Limulus HPC Systems Quick Start

The following is basic information about your Limulus system. Please consult the other sections in this manual for further details.

The Case:

The case has two side doors (one on each side) and a removable front bezel. The front bezel can be “popped” off by opening one (or both) of the doors and pulling the bezel forward. Behind the bezel you will find removable (not fully hot-swap) hard drives and the node blades. These items are placed behind the bezel to provide protection and limit access (under normal conditions you will not need to access the nodes or hard drives).

Login Node (Main Node):

The login node is named “limulus.” The root password is “changeme.” The login node is the “outward facing” node and occupies the traditional motherboard location in the case. Motherboard ports (video, LAN, USB, etc.) are available on the back and front of the case.
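Because the factory root password is a known default, you will likely want to change it on the login node. This is a standard Linux step, not a Limulus-specific tool (the worker nodes are provisioned separately; see the Software section):

  # On the login node, as root: set a new root password
  passwd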

Worker Nodes:

The worker nodes are on blades located behind the bezel. A blade is removed by turning off the blade power (see Power Control below), removing the cables, unscrewing the captive bolts (top and bottom), and pulling the blade out using the bolts. Blade positions are unique: the same blade must always be placed in the same slot, otherwise the nodes need to be re-registered. Each blade has a number on the small power supply on the back of the board. Node names begin with “n0” and continue up to the total number of workers in the system; typically this is “n0, n1, n2.” Looking at the front of the system with the bezel off, node n0 is in the right-most position, node n1 is in the middle, and node n2 is in the left-most position and can be seen through the door.

Networking:

All systems have a 1 GbE network. All administrative daemons and shared file systems (NFS) are assigned to this network. The login node acts as a gateway node (i.e., it has an external LAN connection and an internal cluster network connection). The LAN port (as labeled on the case) is set to use DHCP. The internal cluster 1 GbE network uses the 10.0.1.0 address range (the login node is 10.0.0.1; the worker nodes start at 10.0.1.10 for n0 and are assigned sequentially). The login node is connected to this network by the short blue Ethernet cable on the back of the unit, which connects the login node to the internal 1 GbE switch.
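A quick way to confirm the internal cluster network is working is to use standard Linux commands from the login node (interface names vary by system, and the worker node must already be powered on; see Power Control below):

  # List IPv4 addresses to identify the LAN and internal cluster interfaces
  ip -4 addr show

  # Confirm a worker node answers over the 1 GbE cluster network
  ping -c 3 n0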

The internal Ethernet switch (visible inside the case above the power supply) connects the internal cluster network. In addition to the blue cable connected to the login node, one additional external cluster Ethernet port is available on the back of the unit for expansion (e.g., a cluster NAS or more nodes). If you have the 10 GbE option, nodes can be addressed using the names “limulusg, n0g, n1g, n2g” (i.e., addressing nodes with these names will use the 10 GbE network instead of the 1 GbE network). The IP addresses mirror the 1 GbE addresses but use the 10.1.0.0 network. Typically, the 10 GbE network is used by MPI applications. The 10 GbE network is “switched” by a 4-port 10 GbE card plugged into the login node. Like the 1 GbE network, this card has an open port for expansion. Note that the cables for the three in-use ports may be inserted in any order, but the same three ports on the 10 GbE card must always be used for the cluster network.
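If the 10 GbE option is installed, the following sketch shows one way to check that the “g” names resolve to the 10.1.0.0 network and respond. These are generic Linux commands, not Limulus-specific tools:

  # Check that both the 1 GbE and 10 GbE names resolve to their respective networks
  getent hosts n0 n0g

  # Ping a node over the 10 GbE network
  ping -c 3 n0g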

Power Control:

Power to the three worker nodes is controlled by the login node. There is a set of relays underneath the 1 GbE switch. These relays can be controlled directly using the relayset program (n0 is connected to relay-2, n1 to relay-3, and n2 to relay-4). However, controlling the nodes with relayset is not recommended. We highly recommend using the node-poweron and node-poweroff utilities to control node power. (Running these commands, as root, with no arguments will power on/off all nodes. An individual node can be supplied as an argument.)

All nodes are in the powered-down state when the system boots. To turn the nodes on, enter “node-poweron.”
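For example, run as root on the login node:

  # Power on all worker nodes
  node-poweron

  # Power on a single node
  node-poweron n1

  # Power off all worker nodes
  node-poweroff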

SSH Access:

The root user has ssh access to all nodes, e.g., “ssh n0.” Regular users do not have ssh access to the worker nodes; these nodes are used through the Slurm workload manager.
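As a quick sanity check, root can reach a node directly, while regular users run work through Slurm. The commands below are standard ssh and Slurm usage, not Limulus-specific; the node count assumes a three-worker system:

  # As root: run a command directly on a worker node
  ssh n0 uptime

  # As a regular user: check that the worker nodes are visible to Slurm
  sinfo

  # Run a simple command on all three worker nodes through Slurm
  srun -N 3 hostname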

File System:

The file system configuration can vary depending on installation options. In general, there is a solid-state NVMe drive on the head node that stores all system files. This drive also provides a “/scratch” partition for use by applications. The two spinning disks are configured as a RAID1 (mirrored) XFS file system and mounted as /home. The /home file system is mounted on all the nodes using NFS.
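A generic way to inspect this layout uses standard Linux commands; the output will vary with your installation options, and the RAID status check applies only if Linux software RAID (md) is in use:

  # On the login node: check the local /scratch and mirrored /home file systems
  df -h /scratch /home

  # Check software RAID status (only if md RAID is used for /home)
  cat /proc/mdstat

  # As root: confirm /home is NFS-mounted on a worker node
  ssh n0 findmnt /home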

Software:

The system has been installed with CentOS 7. Specific Limulus and OpenHPC software (as RPMs) has been installed. The yum utility can be used to add additional software to the login node. The worker nodes are provisioned by the Warewulf Cluster Toolkit; adding and removing software on Warewulf-provisioned nodes requires some additional steps.
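Installing a package on the login node is a single yum command. For the worker nodes, the usual Warewulf/OpenHPC approach is to install into the node chroot image and then rebuild the VNFS so the change appears at the next node boot. The chroot path and package name below are placeholders that will differ on your system, so treat this as a sketch only:

  # Install a package on the login node
  yum install <package>

  # Install the same package into the Warewulf node image (example path only)
  yum --installroot=/opt/ohpc/admin/images/centos7 install <package>

  # Rebuild the VNFS so provisioned nodes pick up the change on their next boot
  wwvnfs --chroot /opt/ohpc/admin/images/centos7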

Adaptive Cooling and Filters:

The nodes have adaptive cooling. There are two intake fans underneath the worker nodes that push air up into the nodes from underneath the case. The speed of the intake fans will increase or decrease depending on the node processor temperatures. Note that all intake locations on the case have magnetic dust filters (including the bottom of the case). In dusty environments, these filters will become dirty and can adversely affect the ability of the fans to move air through the case. It is recommended that you check and clean these filters regularly. They can be easily cleaned with water.

Monitoring:

There are two basic ways to monitor the cluster as a whole. The first is the web-based Ganglia interface (http://localhost/ganglia). Please note that when the nodes start, it may take 5-10 minutes before they are fully reporting to the Ganglia host interface. There is also a command-line utility called wwtop, which provides a real-time “top”-like interface for the entire cluster.
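For example, from the login node (the Ganglia URL is the one given above; the curl check is a generic way to confirm the web interface is being served):

  # Quick check that the Ganglia web interface responds
  curl -sI http://localhost/ganglia/ | head -n 1

  # Real-time, top-like view of the cluster nodes from the command line
  wwtop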

Additional Information:

We are currently updating our Limulus documentation to a new wiki format. When complete, we will provide an RPM to install on the cluster. You may send questions and comments to support@basement-supercomputing.com.
