=====Limulus Data Analytics (Hadoop/Spark/Kafka) Systems Quick Start=====

The following is basic information about your Limulus system. Please consult the other sections in this manual for further details. A general description of the system can be found by consulting the [[Functional Diagram|Functional Diagram]].

====Adding Users====

There are two scripts for adding and deleting users:

  AddUser
  DelUser

There is also a GUI tool for adding and deleting users. The GUI can be found in the root ''Applications/System Tools'' menu or started from the command line using:

  $ AddUser -g

See [[Adding Users|Adding Users]].

====The Case:====

The case has a door on each side and a removable front bezel. The front bezel can be "popped" off by opening one (or both) of the doors and pulling the bezel forward. Behind the bezel you will find the three node blades. The blades are placed behind the bezel to provide protection and allow limited access (under normal conditions you will not need to access the blades). There are also six removable (not hot-swap) 3.5-inch drive bays across the top on the inside. Depending on the model, there may also be user-accessible removable SSD drives (also not hot-swap). The local LAN port and video port are labeled on the back. (Note: On AMD systems, the video port on the motherboard is non-functional.)

====Login Node (Main Node):====

The login node is named ''limulus'' (alias ''headnode''). You will be provided a unique root password. The login node is the "outward facing" node and occupies the traditional motherboard location in the case. Motherboard ports (video, LAN, USB, etc.) are available on the back and front of the case. See [[Logging In|Logging In]].

====Worker Nodes:====

The worker nodes are on blades located behind the bezel. A blade is removed by turning off the blade power (see below), removing the cables, unscrewing the captive bolts (top and bottom), and pulling the blade out using the captive bolts. The blade positions are unique: the same blade must always be placed in the same slot, due to DHCP booting (HPC systems) or the attached HDFS drives (Hadoop systems). Each blade has a number on the small power supply at the back of the board. Node names begin with "n0" and continue to the total number of workers in the system; typically this is ''n0, n1, n2''. Looking at the front of the system with the bezel off, node ''n0'' is in the rightmost position, node ''n1'' is in the middle, and node ''n2'' is in the leftmost position and can be seen when looking through the door.

====Networking:====

All systems have a 1 GbE network. All administrative daemons and shared file systems (NFS) are assigned to this network. The login node acts as a gateway node (i.e., it has an external LAN connection and an internal cluster network connection). The LAN port (as labeled on the case) is set to use DHCP. The internal cluster 1 GbE network uses 10.0.0.0/24 addresses (the login node is 10.0.0.1; the nodes, starting with n0, begin at 10.0.0.10 and are sequential). The login node is connected to this network by the short blue Ethernet cable on the back of the unit, which connects the login node to the internal 1 GbE switch. The internal Ethernet switch (visible inside the case above the power supply) connects to the internal cluster network. In addition to the blue cable connected to the login node, one additional external cluster Ethernet port is available on the back of the unit for expansion (e.g., a cluster NAS or more nodes).
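As a quick sanity check of the internal 1 GbE network, the following commands can be run as root on the login node. This is only a sketch based on the default addressing described above (10.0.0.1 for the login node, 10.0.0.10 for ''n0''); interface names and addresses may differ on your system:

  # list the login node interfaces (the internal cluster interface should carry 10.0.0.1)
  ip addr show
  # reach worker node n0 by its internal IP address, then by host name
  ping -c 2 10.0.0.10
  ping -c 2 n0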
If you have the 10/25 GbE option, nodes can be addressed using "limulusg, n0g, n1g, n2g" (i.e., addressing nodes with these names will use the 10/25 GbE network instead of the 1 GbE network). The IP addresses mirror the 1 GbE addresses but use the 10.1.0.0 network. Typically, these are used by MPI applications. If installed, Hadoop systems are configured to use the 10/25 GbE network. The 10/25 GbE network is "switched" by using a 4-port 10/25 GbE card plugged into the login node. Like the 1 GbE network, this card has an open port for expansion. Note that the cables do not need to be inserted in a particular order; however, the same three ports currently in use on the 10/25 GbE card must always be used for the cluster network. On the large eight-motherboard systems, two of these 4-port cards are bridged to provide full 10/25 GbE networking throughout the cluster. See [[Internal Networks|Internal Networks]].

====Power Control:====

Power to the three worker nodes is controlled by the login node. There is a set of relays underneath the 1 GbE switch. These relays can be controlled directly by using the ''relayset'' program (''n0'' is connected to relay-2, ''n1'' is connected to relay-3, and ''n2'' is connected to relay-4). However, it is not recommended to control the nodes using ''relayset''. We highly recommend using the ''node-poweron'' and ''node-poweroff'' utilities to control the node power. (Using these commands, as root, with no arguments will power on/off all nodes. An individual node can be supplied as an argument.) There is also a GUI tool for controlling node power. See [[Powering Up/Down Nodes|Powering Up/Down Nodes]].

**!!On HADOOP Limulus systems all nodes are powered on when the system starts!!** If for some reason the nodes are not operating (check the power GUI mentioned above), enter ''node-poweron'' or use the GUI tool to try and start the nodes. To turn the nodes off, enter ''node-poweroff'' (the OS on the nodes is shut down gracefully). Use the "-h" option to get a full explanation.

====Hadoop Services Startup====

The Auto Start feature is not currently working. To start all the Hadoop services, log into the Ambari interface (http://localhost:8080), click on the three dots next to ''Services'' in the vertical menu on the left side, and select ''Start All''. The Ambari login is ''admin''; the password is provided elsewhere. Please see [[Using the Apache Ambari Cluster Manager|Using the Apache Ambari Cluster Manager]] for more information on how to start up and shut down the cluster-wide Hadoop services.

====SSH Access:====

The root user has ''ssh'' access to all nodes, e.g. ''ssh n0''. Users do not have ''ssh'' access to the worker nodes. These nodes are used through the Slurm workflow scheduler. See [[Using the Apache Ambari Cluster Manager|Using the Apache Ambari Cluster Manager]].

====File System:====

The file system configuration can vary depending on installation options. In general, there is a solid-state NVMe drive on the head node that stores the OS. The /opt and /home partitions are NFS-mounted to all nodes. If a RAID option has been included, the RAID drives will be installed in the top drive area in the case and mounted on the head node. On Hadoop systems, each node has an NVMe drive for the OS and Hadoop software. One or two additional SATA drives are also attached to each node. These drives are used for Hadoop HDFS storage.
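As a quick check of the shared file systems, the commands below can be run as root on the login node. This is a sketch that assumes the login node (''limulus'') exports /home and /opt over NFS as described above and that the worker nodes are powered on:

  # list the NFS exports offered by the login node (assumes the login node is the NFS server)
  showmount -e limulus
  # confirm /home and /opt are mounted on a worker node
  ssh n0 df -h /home /opt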
====Software:====

The system has been installed with CentOS 7 as the base distribution. Specific Limulus RPM packages that provide management tools, an updated kernel, power control drivers, etc. have also been installed. The ''yum'' utility can be used to add additional software to the login node. On Hadoop systems, all node software is managed by the Ambari management system (on the head node, open a local browser to ''http://localhost:8080''). See [[Using the Apache Ambari Cluster Manager|Using the Apache Ambari Cluster Manager]].

====Adaptive Cooling and Filters:====

The nodes have adaptive cooling. There are two intake fans underneath the worker nodes that push air up into the nodes from underneath the case. The speed of the intake fans will increase or decrease depending on the node processor temperatures. Note that all intake locations on the case have magnetic dust filters (including the bottom of the case). In dusty environments, these filters will become dirty and can adversely affect the ability of the fans to move air through the case. It is recommended that you check and clean these filters regularly. They can be easily cleaned with water. See [[Adaptive Cooling|Adaptive Cooling]].

====Monitoring:====

On Hadoop systems, the [[Using the Apache Ambari Cluster Manager|Apache Ambari Cluster Manager]] provides complete node (and Hadoop) monitoring. In addition, there is a command line utility called ''wwtop''. This utility provides a real-time "top"-like interface for the entire cluster (including node temperatures).

====Additional Information:====

See the rest of this manual for additional topics and information.