All Limulus systems have a mechanism for executing commands on all worker nodes at the same time. The pdsh command allows the same command to be run on any combination of Limulus nodes. By default, and without any arguments, pdsh operates on all active nodes and uses ssh to issue the command. For example:
# pdsh hostname
n2: n2
n1: n1
n0: n0
Piping the results into sort will keep the returned results ordered by node number (the pipe happens on the node that issued the pdsh command):
# pdsh uptime|sort
n0: 08:58:47 up 153 days, 18:29, 0 users, load average: 0.00, 0.02, 0.00
n1: 08:58:47 up 153 days, 18:29, 0 users, load average: 0.09, 0.08, 0.06
n2: 08:58:47 up 153 days, 18:29, 0 users, load average: 0.13, 0.12, 0.05
pdsh also allows commands to be issued on specific nodes using the “[ ]” syntax. For example, to issue a command on a range of nodes (trivial in this case), the following command can be used:
# pdsh -w n[0-1] date
n1: Mon Jan 18 09:00:39 EST 2021
n0: Mon Jan 18 09:00:39 EST 2021
Nodes do not need to be “sequential” and can be separated by a comma.
# pdsh -w n[0,1] date
n0: Mon Jan 18 09:00:59 EST 2021
n1: Mon Jan 18 09:00:59 EST 2021
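Individual nodes can also be excluded with the -x option. A minimal sketch (on a three-node system this issues the command on all active nodes except n2):
# pdsh -x n2 date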
Although not very useful on Limulus systems, you can also use host lists with pdsh. See the full man page (run man pdsh) for details.
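For example, pdsh can read its target list from a file by prefixing the file name with “^”. A minimal sketch (the file name /root/nodes.txt and its contents are assumptions for illustration):
# echo -e "n0\nn1" > /root/nodes.txt
# pdsh -w ^/root/nodes.txt date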
There may be times when a file needs to be updated across the cluster. Be aware that important files are managed by either Warewulf or Ambari and there is no need to manipulate these files “by hand.”
In addition, there are two NFS-mounted directories that appear across the cluster. On all systems, /home is available on all nodes. This configuration is important for HPC systems, although it is not strictly needed on Data Analytics (Hadoop) systems. The second system-wide NFS mount depends on the type of system: on HPC systems, /opt/ohpc is mounted on all nodes; on Data Analytics systems, /opt/cluster is mounted on all nodes.
Under both of these mounts is a private admin/etc path. Files needed on all nodes can be conveniently placed in these system-wide directories, eliminating the need to copy files.
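For example, because the mount is visible on every node, a script kept under the shared path can be run cluster-wide without copying it anywhere (the script name fix-perms.sh is hypothetical):
# pdsh "bash /opt/cluster/admin/etc/fix-perms.sh"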
In the event that copying a file is absolutely necessary, the following procedure is the preferred way to copy a file to the nodes (assume the file name is TEMP-FILE):
1. Copy the file to the admin/etc directory on the headnode (use /opt/ohpc/admin/etc on HPC systems):
# cp TEMP-FILE /opt/cluster/admin/etc
2. Use pdsh to copy the file to the /root directory on all the nodes (surround the command with " or '):
# pdsh "cp /opt/cluster/admin/etc/TEMP-FILE /root"
3. Confirm that the file is on all the nodes:
# pdsh ls /root/TEMP-FILE
n1: /root/TEMP-FILE
n2: /root/TEMP-FILE
n0: /root/TEMP-FILE
Should the file need to be removed from all the nodes, pdsh can be used as well:
# pdsh rm /root/TEMP-FILE
By keeping a copy of the file in the NFS-mounted admin/etc path, a convenient record of file movements and changes can be consulted in the future.
While pdsh is an immensely useful command, it does have some limitations and cautions:
- The pdsh command only works from the headnode (login node). It is not available on the worker nodes.
- pdsh cannot be used for interactive commands (e.g. pdsh top will not work). It should be used with commands that “finish.” You can break out of pdsh using multiple ctrl-c keystrokes.
- If a pipe is placed inside the quotes, it runs on each node rather than on the node that issued the pdsh command, so the combined output is no longer ordered by node number:
# pdsh "uptime|sort"
n0: 09:08:14 up 153 days, 18:38, 0 users, load average: 0.09, 0.08, 0.02
n2: 09:08:14 up 153 days, 18:38, 0 users, load average: 0.04, 0.05, 0.01
n1: 09:08:14 up 153 days, 18:39, 0 users, load average: 0.02, 0.04, 0.04
- Using pdsh to make permanent changes on the nodes is not recommended. On HPC systems, any changes will go away on the next restart of the nodes. The Warewulf Cluster toolkit provides a mechanism to globally manage all node configuration details (see VNFS images).
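As a sketch of the permanent route on HPC systems, a file is instead placed inside the node chroot image and the VNFS is rebuilt so the nodes pick up the change on their next boot. The chroot path below is an assumption for illustration; check your Warewulf configuration for the actual image location:
# cp TEMP-FILE /opt/ohpc/admin/images/centos8/root/
# wwvnfs --chroot /opt/ohpc/admin/images/centos8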