This shows you the differences between two versions of the page.
monitoring_system_resources [2020/11/09 21:39] deadline |
monitoring_system_resources [2021/04/30 15:33] (current) brandonm [Data Analytics Cluster Monitoring] Punctuation and word fixes; also fix 404 link |
||
---|---|---|---|
Line 4: | Line 4: | ||
====Ganglia==== | ====Ganglia==== | ||
- | On HPC systems the popular [[http:// | + | On HPC systems the popular [[http:// |
< | < | ||
http:// | http:// | ||
</ | </ | ||
- | In Firefox (or any other browser that is installed on the system) | + | In Firefox (or any other browser that is installed on the system). The default screen is shown below. |
{{ : | {{ : | ||
Line 17: | Line 17: | ||
{{ : | {{ : | ||
- | Note that in addition to a myriad of other metrics it is possible to observe the CPU temperatures by selecting '' | + | Note that in addition to a myriad of other metrics it is possible to observe the CPU temperatures by selecting '' |
{{ : | {{ : | ||
- | More information on using and configuration can be found at the [[http:// | + | More information on usage and configuration can be found at the [[http:// |
====Warewulf Top (wwtop)==== | ====Warewulf Top (wwtop)==== | ||
- | Warewulf Top is a command line tool for monitoring the state of the cluster. Similar to the '' | + | Warewulf Top is a command line tool for monitoring the state of the cluster. Similar to the '' |
$ wwtop | $ wwtop | ||
Line 62: | Line 62: | ||
This tool is part of the Warewulf cluster distribution | This tool is part of the Warewulf cluster distribution | ||
| | ||
- | + | </ | |
- | </ | + | |
In addition to the temperature updates, '' | In addition to the temperature updates, '' | ||
Line 87: | Line 86: | ||
</ | </ | ||
+ | |||
+ | ====Slurm Top (slop)==== |
+ | |||
+ | A real-time text-based Slurm " | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | The above example shows the Slurm batch queue in the top pane with job-ID, partition, user, etc. The bottom pane displays the cluster nodes' metrics, similar to '' |
+ | |||
+ | < | ||
+ | Slop (SLurm tOP) displays node statistics and the batch queue on a cluster. | ||
+ | |||
+ | The top window is the batch queue and the bottom window are the hosts. The | ||
+ | windows update automatically and are scrollable with the arrow keys. A " | ||
+ | indicates that the list will scroll further. Available options: | ||
+ | q - to quit userstat | ||
+ | h - to get this help | ||
+ | b - to make the batch window active | ||
+ | n - to make the nodes window active | ||
+ | spacebar - update windows (automatic update after 20 seconds) | ||
+ | up_arrow - to move though the jobs or nodes window | ||
+ | down_arrow - to move though the jobs or nodes window | ||
+ | Pg Up/Down - move a whole page in the jobs or nodes window | ||
+ | |||
+ | Queue Window Commands: | ||
+ | j - sort on job-ID | ||
+ | u - sort on user name a - redisplay all hosts | ||
+ | p - sort on program name | ||
+ | a - redisplay all jobs | ||
+ | d - delete a job from the queue | ||
+ | return - display only the nodes for that job | ||
+ | (When sorting on multiple parameters all matches are displayed.) | ||
+ | |||
+ | Press ' | ||
+ | </ | ||
+ | |||
+ | A useful feature of '' | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | If more specific node information is needed, a standard '' | ||
+ | on a node in the lower Host pane (switch to the host pane by entering " | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | For more information on '' | ||
======Data Analytics Cluster Monitoring====== | ======Data Analytics Cluster Monitoring====== | ||
+ | |||
+ | Data analytics systems (e.g. Hadoop/ |
+ | |||
+ | {{ : |