Powering Nodes Up and Down

Each Limulus system consists of one login-node and three or seven worker nodes. As shipped, the login-node has the alias name headnode (or limulus), and the worker nodes are named n0, n1, and n2 (three-node systems) or n0 through n6 (seven-node systems). Do not attempt to change the node names; many services rely on this naming scheme.

Power to the three (or seven) worker nodes is controlled by the main login-node (the motherboard to which the monitor, keyboard, and mouse are connected on the back of the case). There is a set of relays underneath the 1 GbE switch inside the case. These relays can be controlled directly by using the relayset program (see below).

Direct use of relayset is not recommended as a method to control the nodes, because it essentially cuts or applies power to the nodes without shutting down the operating system.

Use of the node-poweron and node-poweroff utilities or the power control GUI is strongly recommended. These utilities are described below.

!!!Resetting or rebooting the login-node will shut down all worker nodes!!!

Worker Node Startup on HPC Systems

All nodes are in the powered-down state when the system is initially started. Nodes can be started using either the command line or the GUI tool described below. It is assumed that the user can control which nodes are powered on (or off) as they use the system. If all nodes are to be started when the system starts, add the line /usr/sbin/node-poweron at the end of the Limulus systemd startup file (this file acts like an rc.local file in SysV init systems). The file can be found in the “hidden” Limulus management directory: /etc/warewulf/.Limulus/Limulus-startup.sh. Be careful when modifying this file (make a backup before editing). As noted previously, node power control can be combined with the Slurm resource scheduler to automatically power nodes on and off as they are needed. See Slurm Workload Manager.
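
The edit described above can be sketched as a small POSIX shell helper. This is only an illustration: the function name enable_autostart is hypothetical, while the startup file path and the node-poweron line are the ones given in the text. It makes a backup copy before changing anything and does nothing if the line is already present.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Limulus tools): append a line to a
# startup file, backing the file up first.
# Usage: enable_autostart FILE [LINE]
enable_autostart() {
    file=$1
    line=${2:-/usr/sbin/node-poweron}
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    # Only touch the file if the exact line is not already there.
    if ! grep -qxF -- "$line" "$file"; then
        # Keep a backup copy next to the original before editing.
        cp -p "$file" "$file.bak" || return 1
        # Make sure the new line starts on its own line.
        if [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root):
# enable_autostart /etc/warewulf/.Limulus/Limulus-startup.sh
```

If the startup file ends with an explicit exit statement, a line appended after it would never run, so it is worth inspecting the file after the edit.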

HPC System Shutdown

Shutting down the login-node, either by rebooting or powering down, will gracefully shut down the worker nodes (i.e. they will receive a local poweroff command sent by the login-node).

Worker Node Startup on Data Analytics (Hadoop) Systems

All nodes and all services (e.g. HDFS, YARN) are started when the system is initially powered on. Nodes can be started using either the command line or the GUI tool described below; however, power cycling nodes may disrupt the running service daemons.

Data Analytics (Hadoop) System Shutdown

Proper shutdown of Data Analytics (Hadoop) systems is described in the Using the Apache Ambari Cluster Manager section. Basically, the Hadoop services should be gracefully shut down before the system is powered down. Note that these services are robust and can often recover from a sudden (or unexpected) power loss; however, data can be lost, particularly in HDFS.

Command Line Power Control

node-poweron and node-poweroff are command-line utilities available on the login-node. These commands only work for the root user.

Executing either command with no arguments powers all nodes on (or off). One or more node names can be supplied as arguments.

node-poweron

To turn on all nodes, simply enter:

# node-poweron

To turn on node n2 enter:

# node-poweron n2

The node-poweron command will wait until the node(s) have fully booted (i.e. the operating system is up and running) or, if no operating system can be detected, until it times out. Thus, the command can take up to several minutes to complete. Interrupting this command may put the system in an unstable state. The full option list is given below. Note that the -s option runs the command with no output, which is useful in scripts.

Also note that if a node is already up and running, turning the power on with node-poweron will have no effect.

# node-poweron -h

node-power-on [-h help] [-s silent] [nodes]
No node arguments turns all nodes ON. If a node is already on, nothing will happen.
Node name(s) can be given as argument(s) in the range {n0,...,n6}. For example:
  # node-poweron n0 n2
  # node-poweron -s n1
  # node-poweron -s 
Invalid nodes will be ignored. Default Limulus nodes are {n0,n1,n2}
The script waits until all nodes are started or the process times out.
 -s runs in quiet mode; -h provides this help.
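
Because invalid node names are silently ignored, a script may want to check names itself before calling node-poweron. The sketch below is one possible way to do that; is_valid_node and poweron_checked are hypothetical helper names, and only the n0 to n6 range and the -s option come from the help text above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for names in the documented range n0..n6.
is_valid_node() {
    case $1 in
        n[0-6]) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical wrapper: reject bad names up front, then power on quietly.
# With no arguments it behaves like "node-poweron -s" (all nodes).
poweron_checked() {
    for node in "$@"; do
        is_valid_node "$node" || { echo "invalid node: $node" >&2; return 2; }
    done
    node-poweron -s "$@"
}

# Example (as root):
# poweron_checked n0 n2
```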

node-poweroff

To turn all nodes off gracefully (remove power), simply enter:

# node-poweroff

To turn off just node n0 enter:

# node-poweroff n0

The node-poweroff command will wait until the node(s) have fully shut down (i.e. it cannot contact the node operating systems) or until a timeout occurs. If the timeout occurs AND the node relay indicates the power is applied, power is removed (relay is turned off) regardless of operating system state (i.e. the “plug is pulled” for the node).
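
The wait-then-time-out behavior described above follows a common pattern, sketched here for illustration only. This is not the actual node-poweroff code: wait_until_down and the probe command passed to it are hypothetical, and the probe is assumed to succeed while the node's operating system still responds.

```shell
#!/bin/sh
# Hypothetical sketch of a "wait until the node stops responding" loop.
# Usage: wait_until_down PROBE_COMMAND TIMEOUT_SECONDS
# Returns 0 if PROBE_COMMAND starts failing before the timeout, 1 otherwise.
wait_until_down() {
    probe=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # A failing probe means the operating system no longer answers.
        "$probe" || return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Timed out: the caller would now have to decide whether to cut power.
    return 1
}
```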

Like node-poweron, this command can take up to several minutes to complete. Interrupting this command may put the system in an unstable state. The full option list is given below. Note that the -s option runs the command with no output.

Also note that if a node is already down, turning the power off with node-poweroff will have no effect.

# node-poweroff -h

node-power-on [-h help] [-s silent] [nodes]
No node arguments turns all nodes OFF. If a node is already off, nothing will happen.
Node name(s) can be given as argument(s) in the range {n0,...,n6}. For example:
  # node-poweroff n0 n2
  # node-poweroff -s n1
  # node-poweroff -s
Invalid nodes will be ignored. Default Limulus nodes are {n0,n1,n2}
A delay is included so nodes can properly shutdown before power
is removed. Any node attached drives are placed in stand-by mode.
-s runs in quiet mode; -h provides this help.

Power Control GUI

Power to the nodes can also be controlled using a GUI tool. Using both the command line and GUI tools at the same time may cause system instability.

The GUI power control tool can be started from the command line by entering, as the root user:

# NPstat

There is also a “Node Power Status” entry in the Applications/System Tools menu (only visible to the root user). An example menu is shown below.

The main power control panel shown below is displayed either through the command line or the Applications menus. In addition to power status there are several other status indicators. These indicators are described as follows:

  • Power - indicates if the power relay is “on” or “off.”
  • OS Up - indicates if the node Operating System is up and running (indicated by a “YES”). It is possible to have power applied (relay is on) and the operating system unresponsive (see the FAQ section). Also, if the power is off, this will always be “NO.”
  • Users - indicates the number of users logged into the node. This information is provided so that user activity is not accidentally terminated.
  • Load - an indication of how busy the node is. This information is provided so that background activity is not accidentally terminated.
  • Mem - an indication of how much memory is being used. Similar to the Load and Users status, this information is provided so that background activity is not accidentally terminated.
  • Days Alive - the number of days since the node was started.

Note, the Node Power Control tool is not intended as a monitoring tool. The response times can be slow due to how the information is obtained from the nodes. It is primarily designed to provide information needed for power control of the worker nodes.

There are three buttons at the bottom of the panel.

  1. Refresh - refresh the panel data. The last refresh time is shown at the top of the panel. The panel does not “auto refresh.”
  2. Node Power - open the power control selection window shown below.
  3. Quit - quit the utility.

Selecting the node to control is done with the selection window shown below. Any combination of nodes can be powered on or off using this selection box. If a node is checked it will be powered on. If a node is not checked it will be powered off. The node name and the current status are indicated on the panel. As with the command line tools, setting a node that is already on (or off) to on (or off) will have no effect.
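
The checkbox rule above amounts to a small decision table. The following sketch expresses it as a shell function; plan_action is a hypothetical name used only for illustration and is not part of the GUI tool.

```shell
#!/bin/sh
# Hypothetical illustration of the selection rule.
# Usage: plan_action CURRENT CHECKED
#   CURRENT is the node's present power state: "on" or "off"
#   CHECKED is the checkbox state in the selection window: "yes" or "no"
plan_action() {
    case "$1:$2" in
        on:yes|off:no) echo "no change" ;;
        off:yes)       echo "power on" ;;
        on:no)         echo "power off" ;;
        *)             echo "unknown input" >&2; return 1 ;;
    esac
}

# Example: a node that is off and checked would be powered on.
# plan_action off yes
```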

In the selection window above, node n0 will have no change, node n1 will be turned off, and node n2 will be turned on. Clicking “OK” will display the Confirmation Window shown below.

This window will indicate what will happen for each node. The power control choices can be changed by using the “Back” button. Once the choices are correct, selecting “Yes” will start the power control operations. An indicator of which nodes are powering down will be displayed until they are finished. Like the command line tools, the shutdown will time out and cut power if the operating system's shutdown cannot be confirmed. The “Close” button does not stop the shutdown process.

Next, all nodes slated to power up will be started. In this case, node n2 will start. This process cannot be interrupted and the window will remain until the startup process is complete or the timeout has been reached.

At the end of the power-up or power-down cycle, the main Power Control panel will display the current state of the system.

Using the Relayset Program

The low-level relayset utility is available to administrators; however, it should only be used as a last resort. The node-power{on/off} utilities provide a much more controlled and graceful method of turning systems on and off.

For reference, the options to relayset are provided below.

Important: All Limulus systems use the following convention: n0 is connected to relay-2, n1 is connected to relay-3, and n2 is connected to relay-4. Contact Limulus Computing for larger numbers of nodes.

# relayset 
Not enough or wrong arguments.
  To initialize (do first):  relayset init
  To turn relay on/off:      relayset 1|2|3|4 on|off
  To get status:             relayset 1|2|3|4 status
                             (Returns 1 if on, 0 if off)
  To list the devices (with ID):  relayset list
  To create the relay node map: relayset map 
  For multiple boards add the board ID to the command line (not needed for list and map)
  To print debug messages add "debug" to the command line
  Returns -1 on error, 0 or 1 if successful.
Version: 0.3-05-30-17
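
The node-to-relay convention noted above can be written down as a small lookup, which helps avoid addressing the wrong relay. This is a sketch only: relay_for_node is a hypothetical helper, and it covers just the three documented nodes (n0, n1, n2).

```shell
#!/bin/sh
# Hypothetical helper: print the relay number for a worker node, using the
# documented convention n0 -> relay 2, n1 -> relay 3, n2 -> relay 4.
relay_for_node() {
    case $1 in
        n0) echo 2 ;;
        n1) echo 3 ;;
        n2) echo 4 ;;
        *)  echo "no documented relay for: $1" >&2; return 1 ;;
    esac
}

# Example (as root): query, but do not change, the relay state of n1.
# Per the usage text, "relayset init" is run first.
# relayset init
# relayset "$(relay_for_node n1)" status
```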
powering_up_down_nodes.txt · Last modified: 2021/04/29 17:36 by brandonm