System Wide Commands using pdsh

All Limulus systems have a mechanism for executing commands on all worker nodes at the same time. The pdsh command allows the same command to be run on any combination of Limulus nodes. By default, without any arguments, pdsh works on all active nodes and uses ssh to issue the command. For example:

# pdsh hostname
n2: n2
n1: n1
n0: n0
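
Because pdsh only contacts active nodes by default, a node that is down would presumably just be absent from the output. The following is a minimal sketch of a check for that case. The function name missing_nodes is hypothetical, and the expected node names are passed in by hand since the sketch does not know how Limulus builds its node list.

```shell
# Hypothetical helper: list expected nodes that did not answer.
# Reads "node: output" lines (as printed by pdsh) on stdin;
# the expected node names are given as arguments.
missing_nodes() {
    # Keep only the node name in front of the first colon.
    answered=$(cut -d: -f1 | sort -u)
    for node in "$@"; do
        # -x matches the whole line, -F treats the name as a plain string.
        printf '%s\n' "$answered" | grep -qxF "$node" || printf '%s\n' "$node"
    done
}
```

On the headnode this could be used as pdsh hostname | missing_nodes n0 n1 n2, which prints nothing when every listed node replied.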

Piping the results into sort will keep the returned results ordered by node number (the pipe happens on the node that issued the pdsh command):

# pdsh uptime|sort
n0:  08:58:47 up 153 days, 18:29,  0 users,  load average: 0.00, 0.02, 0.00
n1:  08:58:47 up 153 days, 18:29,  0 users,  load average: 0.09, 0.08, 0.06
n2:  08:58:47 up 153 days, 18:29,  0 users,  load average: 0.13, 0.12, 0.05
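
A plain sort compares the lines as text, so on a system with ten or more nodes n10 would be listed before n2. The sketch below sorts on the number after the leading n instead. The function name sort_by_node is hypothetical, and it assumes node names of the form n followed by a number, as in the examples here.

```shell
# Hypothetical helper: order pdsh output numerically by node number.
# -t:       use the colon after the node name as the field separator
# -k1.2,1n  sort on field 1 starting at its 2nd character (skipping
#           the leading "n") and compare that part as a number
sort_by_node() {
    sort -t: -k1.2,1n
}
```

It is used in place of sort, for example pdsh uptime | sort_by_node. As with the plain sort, the pipe runs on the node that issued the pdsh command.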

pdsh also allows commands to be issued on specific nodes using the “[ ]” syntax. For example, to issue on a range of nodes (in this case trivial), the following command can be used:

# pdsh -w n[0-1] date
n1: Mon Jan 18 09:00:39 EST 2021
n0: Mon Jan 18 09:00:39 EST 2021

Nodes do not need to be “sequential” and can be separated by a comma.

# pdsh -w n[0,1] date
n0: Mon Jan 18 09:00:59 EST 2021
n1: Mon Jan 18 09:00:59 EST 2021
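
One caution with the “[ ]” syntax: the shell itself treats n[0-1] as a filename pattern, so if a file named n0 or n1 exists in the current directory, the shell replaces the pattern with that file name before pdsh sees it. Quoting the node list avoids this. A minimal sketch follows; the wrapper name run_on is hypothetical, and the n prefix is taken from the node names shown in this document.

```shell
# Hypothetical wrapper: run a command on selected nodes by number.
run_on() {
    spec="$1"      # node numbers, e.g. 0-1 or 0,2
    shift
    # The double quotes keep the shell from expanding n[...] as a
    # filename pattern; pdsh receives the brackets unchanged.
    pdsh -w "n[$spec]" "$@"
}
```

For example, run_on 0,2 date issues pdsh -w "n[0,2]" date.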

Although not very useful on Limulus systems, you can also use host lists with pdsh. See the man page for details.

Help with pdsh

There is a full man page for pdsh (run man pdsh).

Using pdsh to Copy Files

There may be times when a file needs to be updated across the cluster. Be aware that important files are managed by either Warewulf or Ambari and there is no need to manipulate these files “by hand.”

In addition, two NFS-mounted directories are shared across the cluster. On all systems, /home is available on every node. This is important on HPC systems but not really needed on Data Analytics (Hadoop) systems. The second system-wide NFS mount depends on the type of system.

  • On HPC systems, /opt/ohpc is mounted on all nodes.
  • On Data Analytics (Hadoop) systems, /opt/cluster is mounted on all nodes.

Under both these mounts is a private admin/etc path. Files needed on all nodes can be conveniently located in these system-wide directories and thus eliminate the need to copy files.

If copying a file is absolutely necessary, the following procedure is the preferred way to copy it to the nodes (assume the file name is TEMP-FILE):

  1. Copy the file to the NFS shared admin/etc directory on the headnode (use /opt/ohpc/admin/etc on HPC systems):
    # cp TEMP-FILE /opt/cluster/admin/etc 
  2. Next, use pdsh to copy the file to the /root directory on all the nodes (surround the command with double or single quotes):
    # pdsh "cp /opt/cluster/admin/etc/TEMP-FILE /root"
  3. Check that the file arrived:
    # pdsh ls /root/TEMP-FILE
    n1: /root/TEMP-FILE
    n2: /root/TEMP-FILE
    n0: /root/TEMP-FILE
  4. Finally, to remove the file on all nodes:
    # pdsh rm /root/TEMP-FILE

Keeping a copy of the file in the NFS-mounted admin/etc path provides a convenient record of file movement and changes that can be consulted in the future.
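
The first three steps of the procedure above can be collected into a small shell function. This is only a sketch under stated assumptions: the function name push_file and the SHARED_ETC variable are hypothetical, and the default path is the Data Analytics location from the text (set SHARED_ETC to /opt/ohpc/admin/etc on HPC systems). Paths containing single quotes are not handled.

```shell
# Hypothetical helper: stage a file in the NFS-shared admin/etc
# directory, then have every node copy it to a local directory.
SHARED_ETC="${SHARED_ETC:-/opt/cluster/admin/etc}"

push_file() {
    src="$1"
    dest="${2:-/root}"
    name=$(basename "$src")

    # Step 1: copy the file to the shared directory on the headnode.
    cp "$src" "$SHARED_ETC/" || return 1

    # Step 2: each node copies from the shared mount to its own disk.
    # The double quotes pass the whole cp command to the nodes.
    pdsh "cp '$SHARED_ETC/$name' '$dest/'"

    # Step 3: list the file on every node, ordered by node name,
    # so a node where the copy failed is easy to spot.
    pdsh "ls '$dest/$name'" | sort
}
```

Run as root on the headnode, push_file TEMP-FILE would leave the staged copy in the shared directory and print one ls line per node.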

Some Important Points about pdsh

While pdsh is an immensely useful command, it does have some limitations and cautions.

  1. The pdsh command only works from the headnode (login node). It is not available on the worker nodes.
  2. pdsh cannot be used for interactive commands (e.g., pdsh top will not work). It should be used with commands that “finish.” You can break out of pdsh by pressing Ctrl-C multiple times.
  3. If you want multiple commands to execute on the node, then the command must be surrounded by single or double quotes. For example, if the above sort command were surrounded by quotes, the sort would take place on the target nodes and not on the issuing node. The following example “sorts” on each node (there is nothing to sort) and the results are returned unordered:
    # pdsh "uptime|sort"
    n0:  09:08:14 up 153 days, 18:38,  0 users,  load average: 0.09, 0.08, 0.02
    n2:  09:08:14 up 153 days, 18:38,  0 users,  load average: 0.04, 0.05, 0.01
    n1:  09:08:14 up 153 days, 18:39,  0 users,  load average: 0.02, 0.04, 0.04
  4. Although tempting, using pdsh to make permanent changes on the nodes is not recommended. On HPC systems, any changes will go away on the next restart of the nodes. The Warewulf Cluster toolkit provides a mechanism to globally manage all node configuration details (see VNFS images).

    On Data Analytics systems (Hadoop), changing nodes by hand may cause “node personalities” to develop (an unmanaged and unique collection of files and directories) and eventually make managing the system confusing or almost impossible. With few exceptions, full management of the Analytics systems (Hadoop) should be possible through the Ambari Cluster Manager.
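
Points 2 and 3 can be combined to get a top-style snapshot without an interactive session. The sketch below assumes the nodes run the Linux procps version of top, whose batch mode (-b) with a single iteration (-n 1) prints once and exits. The function name node_top is hypothetical.

```shell
# Hypothetical helper: one-shot process snapshot from every node.
node_top() {
    lines="${1:-12}"
    # Quoted part: top and head both run on each node, and top exits
    # after one iteration, so the command "finishes."
    # Unquoted part: the sort runs on the headnode. -s keeps each
    # node's lines in their original order while grouping them by
    # node number (the digits after the leading "n").
    pdsh "top -b -n 1 | head -n $lines" | sort -s -t: -k1.2,1n
}
```

For example, node_top 7 would print the first seven lines of top output from each node, grouped by node.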
system-wide_administration_commands_pdsh.txt · Last modified: 2021/05/19 14:59 by meadline