powering_up_down_nodes

Differences

This shows you the differences between two versions of the page.

powering_up_down_nodes [2020/11/10 14:22]
deadline clean up and added to relayset
powering_up_down_nodes [2021/04/29 17:36] (current)
brandonm [Using the Relayset Program] Punctuation, spelling, and word fixes
Line 1: Line 1:
 ===== Powering Nodes Up and Down =====
  
-Each Limulus system consists of one login-node and four or seven worker nodes. As shipped, the login-node has the alias name ''headnode'' (or ''limulus'') and the worker node names are "n0, n1, and n2" and "n0, n1, n2, n3, n4, n5, and n6" Do not attempt to change these the node names because many services rely on this naming scheme.
+Each Limulus system consists of one login-node and three or seven worker nodes. As shipped, the login-node has the alias name ''headnode'' (or ''limulus'') and the worker node names are "n0, n1, and n2" or "n0, n1, n2, n3, n4, n5, and n6". Do not attempt to change the node names; many services rely on this naming scheme.
  
-Power to the three (or seven) worker nodes is controlled by the main login-node (The motherboard on which the monitor and keyboard and mouse are connected on the back of the case). There are a set of relays underneath the 1 GbE switch inside the case. These relays can be controlled directly by using the ''relayset'' program (see below).
+Power to the three (or seven) worker nodes is controlled by the main login-node (the motherboard to which the monitor, keyboard, and mouse are connected on the back of the case). There is a set of relays underneath the 1 GbE switch inside the case. These relays can be controlled directly by using the ''relayset'' program (see below).
  
-Direct use of ''relayset'' is not recommended, as a method to control the nodes because it essential cuts or applies power to the nodes without shutting down the operating system.
+Direct use of ''relayset'' is not recommended as a method to control the nodes, because it essentially cuts or applies power to the nodes without shutting down the operating system.
  
-Use of ''node-poweron'' and ''node-poweroff'' utilities or the power control GUI is strongly recommenced. These utilities are described below.
+Use of ''node-poweron'' and ''node-poweroff'' utilities or the power control GUI is strongly recommended. These utilities are described below.
  
-Also note, node power control can be combined with the Slurm resource scheduler to automatically power-on/off nodes as they are needed.
+**!!!Resetting or rebooting the login-node will shut down all worker nodes!!!**
  
 ===Worker Node Startup on HPC Systems===
  
-All nodes are in the powered down state when the system is initially started. Nodes can be started using either the command line of the GUI tool described below.
+All nodes are in the powered down state when the system is initially started. Nodes can be started using either the command line or the GUI tool described below. It is assumed that the user can control which nodes are powered on (or off) as they use the system. If all nodes are to be started when the system starts, add the line ''/usr/sbin/node-poweron'' **at the end** of the Limulus systemd startup file (this file acts like an ''rc.local'' file in SysV init systems). The file can be found in the "hidden" Limulus management directory: ''/etc/warewulf/.Limulus/Limulus-startup.sh''. Be careful when modifying this file (make a backup before editing). Node power control can also be combined with the Slurm resource scheduler to automatically power-on/off nodes as they are needed. See [[Slurm Workload Manager|Slurm Workload Manager]].
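
As a minimal sketch of the edit described above, the following shell function backs up the startup file and appends the ''/usr/sbin/node-poweron'' line only if it is not already present. The function name ''add_poweron_line'' and the ''.bak'' suffix are illustrative assumptions, not part of the Limulus tools. It also assumes the script does not end with an ''exit'' statement; if it does, the line would have to be placed before that statement by hand.

```shell
#!/bin/sh
# Sketch only: add_poweron_line is a hypothetical helper, not a Limulus utility.
# Usage (as root): add_poweron_line /etc/warewulf/.Limulus/Limulus-startup.sh
add_poweron_line() {
    file="$1"
    line='/usr/sbin/node-poweron'

    # Nothing to do if the exact line is already present.
    if grep -qxF "$line" "$file"; then
        return 0
    fi

    # Make a backup before editing, as the text recommends.
    cp -p "$file" "$file.bak" || return 1

    # Ensure the file ends with a newline so the new line is not fused
    # onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi

    # Append the command at the end of the file.
    printf '%s\n' "$line" >> "$file"
}
```

Running the function a second time leaves the file and the backup unchanged, so it can be re-run safely.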
  
-**Resetting or rebooting the login-node will shutdown all worker nodes.**
+===HPC System Shutdown===
  
-===Worker Node Startup on Data Analytics Systems===
+Shutting down the login-node, either by rebooting or powering down, will gracefully shut down the worker nodes (i.e. they will receive a local ''poweroff'' command sent by the login-node).
  
-All nodes and all services (e.g. HDFS, YARN) are started when the system is initially powered-on. Nodes can be started using either the command line or the GUI tool described below, however, power cycling nodes may disrupt the running service daemons.  
  
-**Resetting or rebooting the login node will shutdown all worker nodes.**
+===Worker Node Startup on Data Analytics (Hadoop) Systems===
  
-===System Shutdown===
+All nodes and all services (e.g. HDFS, YARN) are started when the system is initially powered on. Nodes can be started using either the command line or the GUI tool described below; however, power cycling nodes may disrupt the running service daemons.
  
-Shutting down the login-node either by rebooting or powering down will gracefully shutdown the worker nodes (i.e. they will receive local ''poweroff'' command (sent by the login-node).
+===Data Analytics (Hadoop) System Shutdown===
+
+Proper shutdown of Data Analytics (Hadoop) systems is described in the [[Using the Apache Ambari Cluster Manager|Using the Apache Ambari Cluster Manager]] section. Basically, the Hadoop services should be gracefully shut down before the system is powered down. Note that these services are robust and can often recover from sudden (or unexpected) power loss; however, data can be lost, particularly in HDFS.
  
 ====Command Line Power Control====
Line 43: Line 44:
   # node-poweron n2
      
-The ''node-poweron'' command will wait until the node(s) has fully booted (i.e. the operating system is up and running) or if no operating system can be detected it will "time-out." Thus, **the command can take upto several minutes to complete**. Interrupting this command may put the system in an unstable state. The full option list is given below. Note, there is a ''-s'' option to run the command with no output in a script.
+The ''node-poweron'' command will wait until the node(s) have fully booted (i.e. the operating system is up and running), or if no operating system can be detected it will "time out." Thus, **the command can take up to several minutes to complete**. Interrupting this command may put the system in an unstable state. The full option list is given below. Note, there is a ''-s'' option to run the command with no output in a script.
  
 Also note that if a node is already up and running, turning the power on with ''node-poweron'' will have no effect.
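
Because ''node-poweron'' blocks until each node is up or the timeout is reached, a script can call it and continue once it returns. The sketch below uses the ''-s'' option mentioned above. The wrapper name ''power_on_quietly'', the ''NODE_POWERON'' override variable, the placement of ''-s'' before the node name, and the assumption that the command returns a non-zero exit status on failure are all illustrative and not confirmed by this page.

```shell
#!/bin/sh
# Sketch only: power_on_quietly is a hypothetical wrapper.
# NODE_POWERON lets the command be overridden (e.g. for a dry run);
# by default the real node-poweron utility is used.
power_on_quietly() {
    failed=0
    for node in "$@"; do
        # -s suppresses output; the call blocks until the node has booted
        # or the utility times out. Do not interrupt it.
        if ! "${NODE_POWERON:-node-poweron}" -s "$node"; then
            # Assumption: a non-zero exit status signals a failed power-on.
            echo "power-on of $node may have failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as ''power_on_quietly n0 n1'' (as root) would then start the nodes one after another, each call taking up to several minutes.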
Line 71: Line 72:
   # node-poweroff n0
      
-The ''node-poweroff'' command will wait until the node(s) has fully shutdown (i.e. it cannot contact the node operating systems) or until a timeout occurs. If the timeout occurs AND the node relay indicates the power is applied, power is removed (relay is turned off) regardless of operating system state (i.e. the "plug is pulled" for the node).
+The ''node-poweroff'' command will wait until the node(s) have fully shut down (i.e. it cannot contact the node operating systems) or until a timeout occurs. If the timeout occurs AND the node relay indicates the power is applied, power is removed (relay is turned off) regardless of operating system state (i.e. the "plug is pulled" for the node).
  
-Like ''node-powerup'' **this command can take up to several minutes to complete**. Interrupting this command may put the system in an unstable state. The full option list is given below. Note, there is a ''-s'' option run the command with no output.
+Like ''node-poweron'', **this command can take up to several minutes to complete**. Interrupting this command may put the system in an unstable state. The full option list is given below. Note, there is a ''-s'' option to run the command with no output.
  
 Also note that if a node is already down, turning the power off with ''node-poweroff'' will have no effect.
Line 95: Line 96:
 ====Power Control GUI====
  
-Power to the nodes can also be controlled using a GUI tool. Using both the command line and GUI tools at the same time may case system instability.
+Power to the nodes can also be controlled using a GUI tool. Using both the command line and GUI tools at the same time may cause system instability.
  
-The GUI power control tool can be stared from the command line by entering (it will only allow the root uses to run the command):
+The GUI power control tool can be started from the command line by entering, as the root user:
  
   # NPstat
      
-There is also an "Node Power Status" entry in the Applications/System Tools menu (only visible to the root user).
+There is also a "Node Power Status" entry in the Applications/System Tools menu (only visible to the root user).
 An example menu is shown below.
  
Line 107: Line 108:
 {{ :wiki:applications-menu.png?340 |}}
  
-The main power control panel shown below is displayed either through the command line or the Applications menus. IN addition to power status there are several other status indicators. These indicators are described as follows
+The main power control panel shown below is displayed either through the command line or the Applications menus. In addition to power status, there are several other status indicators. These indicators are described as follows:
  
 {{ :wiki:node-power-control-main.png?400 |}}
  
-  * **Power** - indicates if the power relay is "on" or "off"
-  * **OS Up** - indicates if the node Operating Systems is up and running (indicated by a "YES"). It is possible to have power applied (relay is on) and the operating system unresponsive (see the FAQ section). Also, if the power is off, this will always be "NO"
-  * **Users** - indicated the number of users logged into the node. This information is provided so that user activity is not accidentally terminated.
+  * **Power** - indicates if the power relay is "on" or "off."
+  * **OS Up** - indicates if the node Operating System is up and running (indicated by a "YES"). It is possible to have power applied (relay is on) and the operating system unresponsive (see [[start#Frequently Asked Questions|the FAQ section]]). Also, if the power is off, this will always be "NO."
+  * **Users** - indicates the number of users logged into the node. This information is provided so that user activity is not accidentally terminated.
   * **Load** - an indication of how busy the node is. This information is provided so that background activity is not accidentally terminated.
   * **Mem** - an indication of how much memory is being used. Similar to the Load and User status, this information is provided so that background activity is not accidentally terminated.
   * **Days Alive** - the number of days since the node was started.
  
-**Note**, The Node Power Control tool is not intended as a monitoring tool. The response times can be slow due to how the information is obtained form the nodes. It is primarily designed to provide information needed for power control of the worker nodes.
+**Note**, the Node Power Control tool is not intended as a monitoring tool. The response times can be slow due to how the information is obtained from the nodes. It is primarily designed to provide information needed for power control of the worker nodes.
  
 There are three buttons at the bottom of the panel.
  
-  - **Refresh** - refresh the panel data. The last refresh time is shown at the top of the panel. The panles des not "auto refresh"
+  - **Refresh** - refresh the panel data. The last refresh time is shown at the top of the panel. The panel does not "auto refresh."
   - **Node Power** - open the power control selection window shown below.
-  - **Quit** - quit the utility
+  - **Quit** - quit the utility.
  
-Selecting node to control is done with the selection window shown below. Any combination of nodes can be powered on or off using this selection box. If a node is checked it will be powered on. If a node is not checked it will be powered off. The node name and the current status are indicated on the panel. Similar to the command line tools, if a node is already on (or off) setting it to on (or off) will have no effect.
+Selecting the node to control is done with the selection window shown below. Any combination of nodes can be powered on or off using this selection box. If a node is checked it will be powered on. If a node is not checked it will be powered off. The node name and the current status are indicated on the panel. Similar to the command line tools, if a node is already on (or off), setting it to on (or off) will have no effect.
  
 {{ :wiki:node-power-control-select.png?320 |}}
Line 134: Line 135:
 {{ :wiki:node-power-control-confirm.png?360 |}}
  
-This window will indicate "what will happen" for each node. The power control choices can be changed by using the "Back" button. Once the choice is correct, entering "Yes" will start the power control operations. An indicator of which nodes are powering down will be displayed until they are complete. Like the command line tools, the shutdown will time out and cut power if the operating systems shutdown cannot be confirmed. The "Close" button does not stop the shutdown process.
+This window will indicate what will happen for each node. The power control choices can be changed by using the "Back" button. Once the choice is correct, entering "Yes" will start the power control operations. An indicator of which nodes are powering down will be displayed until they are finished. Like the command line tools, the shutdown will time out and cut power if the operating system's shutdown cannot be confirmed. The "Close" button does not stop the shutdown process.
  
 {{ :wiki:node-power-control-power-down.png?240 |}}
  
-Next all nodes slated to power-up will be started. In this case, node ''n2'' will start. This process cannot be interrupted and the window will remain until the startup process is complete or the timeout has been reached.
+Next, all nodes slated to power up will be started. In this case, node ''n2'' will start. This process cannot be interrupted and the window will remain until the startup process is complete or the timeout has been reached.
  
-At the end of the the power-up or power-down cyce the main Power Control panel will be display with the current state of the system.
+At the end of the power-up or power-down cycle, the main Power Control panel will display the current state of the system.
  
 {{ :wiki:node-power-control-power-up.png?240 |}}
Line 148: Line 149:
 ====Using the Relayset Program====
  
-The low level ''relayset'' utility is available to administrators, however it should only be used as a last resort. The node-power{on/off} utilities provide a much more controled and gracefull method of turning systems on and off.
+The low-level ''relayset'' utility is available to administrators; however, it should only be used as a last resort. The ''node-poweron'' and ''node-poweroff'' utilities provide a much more controlled and graceful method of turning systems on and off.
  
 For reference, the options to ''relayset'' are provided below.
  
-**Important:** All Limulus systems use the following convention, n0 is connected to relay-2, n1 is connected to relay-3, and n2 is connected to relay-4. Contact Limulus Computing for larger number of nodes.
+**Important:** All Limulus systems use the following convention: n0 is connected to relay-2, n1 is connected to relay-3, and n2 is connected to relay-4. Contact Limulus Computing for larger numbers of nodes.
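
The node-to-relay convention can be captured in a small lookup, which may help when checking which relay belongs to a node before resorting to ''relayset''. This is only a sketch: the function name ''relay_for_node'' is hypothetical, and it deliberately does not call ''relayset'' itself.

```shell
#!/bin/sh
# Sketch only: relay_for_node is a hypothetical helper that encodes the
# documented convention n0 -> relay-2, n1 -> relay-3, n2 -> relay-4.
relay_for_node() {
    case "$1" in
        n0) echo 2 ;;
        n1) echo 3 ;;
        n2) echo 4 ;;
        *)
            # Mappings for larger systems are not documented on this page.
            echo "no documented relay for node '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, ''relay_for_node n1'' prints ''3'', while an unlisted node name produces an error instead of a guessed relay number.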
  
 <code> <code>
powering_up_down_nodes.1605018121.txt.gz · Last modified: 2020/11/10 14:22 by deadline