=====Limulus HPC Systems Quick Start=====

The following is basic information about your Limulus system. Please consult the other sections of this manual for further details. A general description of the system can be found in the [[Functional Diagram|Functional Diagram]].

====Adding Users:====

There are two scripts for adding and deleting users:

  AddUser
  DelUser

There is also a GUI tool for adding and deleting users. The GUI can be found in the root ''Applications/System Tools'' menu or started from the command line using:

  $ AddUser -g

See [[Adding Users|Adding Users]].

====The Case:====

The case has a door on each side and a removable front bezel. The front bezel can be "popped" off by opening one (or both) of the doors and pulling the bezel forward. Behind the bezel you will find the three node blades. The blades are placed behind the bezel to provide protection and to allow limited access (under normal conditions you will not need to access the blades). There are also six removable (not hot-swap) 3.5-inch drive bays across the top on the inside. Depending on the model, there may also be user-accessible removable SSD drives (also not hot-swap). The local LAN port and video port are labeled on the back. (Note: On AMD systems, the video port on the motherboard is non-functional.)

====Login Node (Main Node):====

The login node is named ''limulus'' (alias ''headnode''). A unique root password will be provided. The login node is the "outward facing" node. It occupies the traditional motherboard location in the case. Motherboard ports (video, LAN, USB, etc.) are available on the back and front of the case. See [[Logging In|Logging In]].

====Worker Nodes:====

The worker nodes are on blades located behind the bezel. Blades are removed by turning off the blade power (see below), removing the cables, unscrewing the captive bolts (top and bottom), and pulling the blade out using the captive bolts. The blade position is unique (the same blade must always be placed in the same slot, otherwise the nodes will need to be re-registered). Each blade has a number on the small power supply at the back of the board. Node names begin with ''n0'' and continue up to the total number of workers in the system; typically, this is ''n0, n1, n2''. Looking at the front of the system with the bezel off, node ''n0'' is in the rightmost position, node ''n1'' is in the middle, and node ''n2'' is in the leftmost position and can be seen through the side door.

====Networking:====

All systems have a 1 GbE network. All administrative daemons and shared file systems (NFS) are assigned to this network. The login node acts as a gateway node (i.e., it has an external LAN connection and an internal cluster network connection). The LAN port (as labeled on the case) is set to use DHCP. The internal cluster 1 GbE network uses 10.0.0.0/24 addresses (the login node is 10.0.0.1; the worker nodes start at 10.0.0.10 and are sequential, e.g., ''n0'' is 10.0.0.10). The login node is connected to this network by the short blue Ethernet cable on the back of the unit. This cable connects the login node to the internal 1 GbE switch. The internal Ethernet switch (visible inside the case above the power supply) connects to the internal cluster network. In addition to the blue cable connected to the login node, one additional external cluster Ethernet port is available on the back of the unit for expansion (e.g., a cluster NAS or more nodes).

If you have the 10/25 GbE option, nodes can be addressed using the names ''limulusg'', ''n0g'', ''n1g'', ''n2g'' (i.e., addressing nodes with these names will use the 10/25 GbE network instead of the 1 GbE network). The IP addresses mirror the 1 GbE addresses but use the 10.1.0.0 network. Typically, this network is used by MPI applications. If installed, Hadoop systems are configured to use the 10/25 GbE network. The 10/25 GbE network is "switched" by a 4-port 10/25 GbE card plugged into the login node. Like the 1 GbE network, this card has an open port for expansion. Note that the three ports currently in use on the 10/25 GbE card must remain dedicated to the cluster network; the cables do not need to be inserted in a particular order, but the same three ports must be used. On the large eight-motherboard systems, two of these 4-port cards are bridged to provide full 10/25 GbE networking throughout the cluster. See [[Internal Networks|Internal Networks]].
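As a quick way to confirm which network a given name selects, the commands below can be run from the login node. This is a minimal sketch; it assumes the node names are resolvable on the login node (e.g., via ''/etc/hosts'') and follows the addressing scheme above, where ''n0'' maps to a 10.0.0.x address and ''n0g'' to the mirrored 10.1.0.x address.

  # Look up the 1 GbE and 10/25 GbE addresses for node n0
  getent hosts n0     # expect a 10.0.0.x address (10.0.0.10 for n0)
  getent hosts n0g    # expect the mirrored 10.1.0.x address
  # Confirm the node answers on each network
  ping -c 1 n0
  ping -c 1 n0g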
====Power Control:====

Power to the three worker nodes is controlled by the login node. There is a set of relays underneath the 1 GbE switch. These relays can be controlled directly using the ''relayset'' program (''n0'' is connected to relay-2, ''n1'' is connected to relay-3, and ''n2'' is connected to relay-4). However, controlling the nodes directly with ''relayset'' is not recommended. We highly recommend using the ''node-poweron'' and ''node-poweroff'' utilities to control node power. (Running these commands, as root, with no arguments will power on/off all nodes. An individual node can be supplied as an argument.) There is also a GUI tool for controlling node power. See [[Powering Up/Down Nodes|Powering Up/Down Nodes]].

**!!On HPC Limulus systems: All nodes are in the powered-down state when the system boots!!**

To turn the nodes on, enter ''node-poweron''. To turn the nodes off, enter ''node-poweroff'' (the OS on the nodes is shut down gracefully). Use the ''-h'' option to get a full explanation.

====SSH Access:====

The root user has ''ssh'' access to all nodes, e.g. ''ssh n0''. Users do not have ''ssh'' access to the worker nodes. These nodes are used through the Slurm Workload Manager. See [[Submitting Jobs to the Slurm Workload Manager|Submitting Jobs to the Slurm Workload Manager]].
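As a minimal sketch of submitting work through Slurm (the job name, output file, and resource counts below are illustrative; adjust them for your installation), the following batch script runs ''hostname'' on each of the three worker nodes:

  #!/bin/bash
  #SBATCH --job-name=hello        # job name shown by squeue
  #SBATCH --nodes=3               # request all three worker nodes
  #SBATCH --ntasks-per-node=1     # run one task on each node
  #SBATCH --output=hello-%j.out   # output file (%j expands to the job ID)

  # Print the hostname of every allocated node (typically n0, n1, and n2)
  srun hostname

Save the script (for example, as ''hello.sh''), submit it with ''sbatch hello.sh'', and check its status with ''squeue''. The ''sinfo'' command shows the current state of the worker nodes.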
====File System:====

The file system configuration can vary depending on installation options. In general, there is a solid-state NVMe drive on the head node that stores the OS. The ''/opt'' and ''/home'' partitions are NFS-mounted on all nodes. If a RAID option has been included, the RAID drives will be installed in the top drive area of the case and mounted on the head node. Worker nodes use the diskless Warewulf VNFS file system. See [[RAID Storage Management|RAID Storage Management]].

====Software:====

The system has been installed with CentOS 7. Specific Limulus and OpenHPC software has been installed as RPM packages. The ''yum'' utility can be used to add additional software to the login node. See [[Open HPC Components|OpenHPC Components]]. The worker nodes are provisioned by the Warewulf Cluster Toolkit. Adding and removing software from Warewulf-provisioned nodes requires some additional steps. See [[Warewulf Worker Node Images|Warewulf Worker Node Images]].

====Adaptive Cooling and Filters:====

The nodes have adaptive cooling. Two intake fans beneath the worker nodes push air up into the nodes from underneath the case. The speed of the intake fans will increase or decrease depending on the node processor temperatures. Note that all intake locations on the case have magnetic dust filters (including the bottom of the case). In dusty environments, these filters will become dirty and can adversely affect the ability of the fans to move air through the case. It is recommended that you check and clean these filters regularly. They can be easily cleaned with water. See [[Adaptive Cooling|Adaptive Cooling]].

====Monitoring:====

There are two basic ways to monitor the cluster as a whole. The first is the web-based Ganglia interface (open a browser and point it to ''http://localhost/ganglia''). Please note that when the nodes start, it may take 5-10 minutes before they are fully reporting to the Ganglia host interface. See [[Monitoring System Resources|Monitoring System Resources]]. There is also a command-line utility called ''wwtop'', which provides a real-time "top"-like interface for the entire cluster.

====Additional Information:====

See the rest of the manual for additional topics and information.
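Finally, as a quick reference, a typical first session after booting the login node might look like the following. This is a sketch built from the commands described above; run the power command as root, and note that nodes may take several minutes to appear in Slurm and Ganglia after power-on.

  # Power on all worker nodes (they boot in the powered-down state)
  node-poweron
  # Watch the nodes come up with the cluster-wide "top"-like monitor
  wwtop
  # Check that Slurm reports the worker nodes as available
  sinfo
  # Then open a browser to http://localhost/ganglia to view the Ganglia interface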