Limulus Data Analytics (Hadoop/Spark/Kafka) Systems Quick Start

The following is basic information about your Limulus system. Please consult the other sections in this manual for further details. A general description of the systems can be found by consulting the Functional Diagram.

Adding Users

There are two scripts for adding and deleting users:

AddUser
DelUser  
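
For example, assuming these scripts take the account name as their first argument (the user name jsmith below is purely illustrative; the Adding Users section documents the exact options), a session might look like:

$ AddUser jsmith     # create the account (assumed syntax)
$ DelUser jsmith     # remove the account (assumed syntax)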

There is also a GUI tool for adding and deleting users. The GUI can be found in the root Applications/System Tools menu or started from the command line using:

$ AddUser -g

See Adding Users.

The Case:

The case has a door on each side and a removable front bezel. The front bezel can be “popped” off by opening one (or both) of the doors and pulling the bezel forward. Behind the bezel you will find the three node blades. The blades are placed behind the bezel to provide protection and allow limited access (under normal conditions you will not need to access the blades). There are also six removable (not hot-swap) 3.5-inch drive bays across the top on the inside. Depending on the model, there may also be user-accessible removable SSD drives (also not hot-swap). The local LAN port and video port are labeled on the back. (Note: On AMD systems, the video port on the motherboard is non-functional.)

Login Node (Main Node):

The login node is named limulus (alias headnode). You will be provided a unique root password. The login node is the “outward facing” node and occupies the traditional motherboard location in the case. Motherboard ports (video, LAN, USB, etc.) are available on the back and front of the case. See Logging In.

Worker Nodes:

The worker nodes are on blades located behind the bezel. A blade is removed by turning off the blade power (see below), removing the cables, unscrewing the captive bolts (top and bottom), and pulling the blade out using the captive bolts. The blade positions are unique: the same blade must always be placed in the same slot, due to DHCP booting (HPC systems) or the attached HDFS drives (Hadoop systems). Each blade has a number on the small power supply at the back of the board.

Node names begin with “n0” and continue to the total number of workers in the system; typically, this is n0, n1, n2. Looking at the front of the system with the bezel off, node n0 is in the rightmost position, node n1 is in the middle, and node n2 is in the leftmost position (n2 can be seen when looking through the door).

Networking:

All systems have a 1 GbE network. All administrative daemons and shared file systems (NFS) are assigned to this network. The login node acts as a gateway node (i.e. it has an external LAN connection and an internal cluster network connection). The LAN port (as labeled on the case) is set to use DHCP. The internal cluster 1 GbE network uses 10.0.0.0/24 addresses (the login node is 10.0.0.1; the nodes, such as n0, start at 10.0.0.10 and are sequential). The login node is connected to this network by the short blue Ethernet cable on the back of the unit. This cable connects the login node to the internal 1 GbE switch.

The internal Ethernet switch (visible inside the case above the power supply) connects to the internal cluster network. In addition to the blue cable connected to the login node, one additional external cluster Ethernet port is available on the back of the unit for expansion (e.g., a cluster NAS or more nodes).

If you have the 10/25 GbE option, nodes can be addressed using the names “limulusg, n0g, n1g, n2g” (i.e. addressing nodes with these names will use the 10/25 GbE network instead of the 1 GbE network). The IP addresses mirror the 1 GbE addresses but use the 10.1.0.0 network. These names are typically used by MPI applications. If the 10/25 GbE option is installed, Hadoop systems are configured to use it.
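
As an illustration of the addressing scheme, a three-node system resolves roughly as follows (a sketch only; the actual mappings are configured by the system, and the sketch assumes the 10/25 GbE addresses mirror the 1 GbE host numbers exactly):

10.0.0.1    limulus      # login node, internal 1 GbE cluster network
10.0.0.10   n0
10.0.0.11   n1
10.0.0.12   n2
10.1.0.1    limulusg     # login node, 10/25 GbE network (if installed)
10.1.0.10   n0g
10.1.0.11   n1g
10.1.0.12   n2g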

The 10/25 GbE network is “switched” by a 4-port 10/25 GbE card plugged into the login node. Like the 1 GbE network, this card has an open port for expansion. Note that the three ports currently in use on the 10/25 GbE card must remain dedicated to the cluster network; the cables do not need to be inserted in a particular order, but the same three ports on the card must be used. On the large eight-motherboard systems, two of these 4-port cards are bridged to provide full 10/25 GbE connectivity throughout the cluster. See Internal Networks.

Power Control:

Power to the three worker nodes is controlled by the login node. There is a set of relays underneath the 1 GbE switch. These relays can be controlled directly using the relayset program (n0 is connected to relay-2, n1 to relay-3, and n2 to relay-4). However, controlling the nodes with relayset is not recommended. We highly recommend using the node-poweron and node-poweroff utilities to control node power. (Run as root with no arguments, these commands power on/off all nodes; an individual node can be supplied as an argument.) There is also a GUI tool for controlling node power. See Powering Up/Down Nodes.

!!On HADOOP Limulus systems all nodes are powered on when the system starts!!

If for some reason the nodes are not operating (check the power GUI mentioned above), enter node-poweron or use the GUI tool to try to start the nodes. To turn the nodes off, enter node-poweroff (the OS on each node is shut down gracefully). Use the “-h” option to get a full explanation.
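
For example, run as root (node n1 below is used only as an illustration):

$ node-poweron           # power on all worker nodes
$ node-poweroff n1       # gracefully shut down and power off node n1 only
$ node-poweron -h        # display full usage information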

Hadoop Services Startup

The Auto Start feature is not currently working. To start all the Hadoop services, log into the Ambari interface (http://localhost:8080), click on the three dots next to Services in the vertical menu on the left side, and select Start All. The Ambari login is admin; the password is provided elsewhere.
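
If you prefer the command line, Ambari's standard REST API can also issue a Start All request. The sketch below is only an illustration; substitute your Ambari admin password for PASSWORD and your cluster name for CLUSTER_NAME (both placeholders):

$ curl -u admin:PASSWORD -H 'X-Requested-By: ambari' -X PUT \
    -d '{"RequestInfo":{"context":"Start All Services"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
    http://localhost:8080/api/v1/clusters/CLUSTER_NAME/services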

Please see Using the Apache Ambari Cluster Manager for more information on how to start up and shut down the cluster-wide Hadoop services.

SSH Access:

The root user has ssh access to all nodes, e.g. ssh n0. Users do not have ssh access to the worker nodes; these nodes are used through the Slurm Workflow Scheduler. See Using the Apache Ambari Cluster Manager.
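
For example, as root on the login node:

$ ssh n0 uptime          # run a single command on worker node n0 (uptime is only an example)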

File System:

The file system configuration can vary depending on installation options. In general, there is a solid-state NVMe drive on the head node that stores the OS. The /opt and /home partitions are NFS-mounted to all nodes. If a RAID option has been included, the RAID drives will be installed in the top drive area of the case and mounted on the head node.

On Hadoop systems, each node has an NVMe drive for the OS and Hadoop software. One or two additional SATA drives are also attached to each node; these drives are used for Hadoop HDFS storage.
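
To verify the layout on a running system, standard commands can be used (a sketch only; hdfs dfsadmin typically requires HDFS superuser privileges):

$ df -h /home /opt          # confirm the NFS-mounted partitions
$ hdfs dfsadmin -report     # summarize HDFS storage across the nodes (Hadoop systems)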

Software:

The system has been installed with CentOS 7 as the base distribution. Specific Limulus RPM packages that provide management tools, an updated kernel, power control drivers, etc. have also been installed. The yum utility can be used to add additional software to the login node.
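
For example, as root on the login node (tmux below is only an example package):

$ yum install tmux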

On Hadoop systems, all node software is managed by the Ambari management system (on the head node, open a local browser to http://localhost:8080). See Using the Apache Ambari Cluster Manager.

Adaptive Cooling and Filters:

The nodes have adaptive cooling. Two intake fans beneath the worker nodes push air up into the nodes from underneath the case. The speed of the intake fans increases or decreases depending on the node processor temperatures. Note that all intake locations on the case have magnetic dust filters (including the bottom of the case). In dusty environments, these filters will become dirty and can adversely affect the ability of the fans to move air through the case. It is recommended that you check and clean these filters regularly; they can be easily cleaned with water. See Adaptive Cooling.

Monitoring:

On Hadoop systems, the Apache Ambari Cluster Manager provides complete node (and Hadoop) monitoring.

In addition, a command-line utility called wwtop provides a real-time “top”-like interface for the entire cluster (including node temperatures).
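
To start it from a terminal on the login node:

$ wwtop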

Additional Information:

See the rest of this manual for additional topics and information.
