===== System Wide Commands using pdsh =====

All Limulus systems have a mechanism for executing commands on all worker nodes at the same time. The ''pdsh'' command allows the same command to be run on any combination of Limulus nodes. By default, when no target nodes are specified, ''pdsh'' works on **all active nodes** and uses ''ssh'' to issue the command. For example:

  # pdsh hostname
  n2: n2
  n1: n1
  n0: n0

Piping the results into ''sort'' will keep the returned results ordered by node number (the pipe happens on the node that issued the ''pdsh'' command):

  # pdsh uptime|sort
  n0: 08:58:47 up 153 days, 18:29, 0 users, load average: 0.00, 0.02, 0.00
  n1: 08:58:47 up 153 days, 18:29, 0 users, load average: 0.09, 0.08, 0.06
  n2: 08:58:47 up 153 days, 18:29, 0 users, load average: 0.13, 0.12, 0.05

''pdsh'' also allows commands to be issued on specific nodes using the ''-w'' option and the "[ ]" syntax. For example, to run a command on a range of nodes (a trivial range in this case), the following command can be used:

  # pdsh -w n[0-1] date
  n1: Mon Jan 18 09:00:39 EST 2021
  n0: Mon Jan 18 09:00:39 EST 2021

Nodes do not need to be sequential; individual nodes can be listed, separated by commas:

  # pdsh -w n[0,1] date
  n0: Mon Jan 18 09:00:59 EST 2021
  n1: Mon Jan 18 09:00:59 EST 2021

Although not very useful on Limulus systems, you can also use host lists with ''pdsh''. See the man page for details.

==== Help with pdsh ====

There is a full man page for ''pdsh'' (run ''man pdsh'').

==== Using pdsh to Copy Files ====

There may be times when a file needs to be updated across the cluster. Be aware that important files are managed by either [[Warewulf Worker Node Images|Warewulf]] or [[ Using the Apache Ambari Cluster Manager|Ambari ]] and there is no need to manipulate these files "by hand."

In addition, there are two NFS-mounted directories that appear across the cluster. On all systems, ''/home'' is available on all nodes.
This configuration is important for HPC systems but is not strictly needed on Data Analytics systems (Hadoop). The second system-wide NFS mount depends on the type of system.

  * On HPC systems, ''/opt/ohpc'' is mounted on all nodes.
  * On Data Analytics (Hadoop) systems, ''/opt/cluster'' is mounted on all nodes.

Under both these mounts is a private ''admin/etc'' path. Files needed on all nodes can be conveniently located in these system-wide directories, which eliminates the need to copy files.

In the event that copying a file is absolutely necessary, the following procedure is the preferred way to copy a file to the nodes (assume the file name is ''TEMP-FILE''):

  - Copy the file to the NFS-shared ''admin/etc'' directory on the headnode (use ''/opt/ohpc/admin/etc'' on HPC systems): <code>
# cp TEMP-FILE /opt/cluster/admin/etc
</code>
  - Next, use ''pdsh'' to copy the file to the ''/root'' directory on all the nodes (**surround the command with " or '**): <code>
# pdsh "cp /opt/cluster/admin/etc/TEMP-FILE /root"
</code>
  - Check that the file arrived: <code>
# pdsh ls /root/TEMP-FILE
n1: /root/TEMP-FILE
n2: /root/TEMP-FILE
n0: /root/TEMP-FILE
</code>
  - Finally, to remove the file on all nodes: <code>
# pdsh rm /root/TEMP-FILE
</code>

Keeping a copy of the file in the NFS-mounted ''admin/etc'' path provides a convenient record of file movement and changes that can be consulted in the future.

==== Some Important Points about pdsh ====

While ''pdsh'' is an immensely useful command, it does have some limitations and cautions.

  - The ''pdsh'' command only works from the headnode (login node). It is not available on the worker nodes.
  - ''pdsh'' cannot be used for interactive commands (e.g., ''pdsh top'' will not work). It should be used with commands that "finish." You can break out of ''pdsh'' by pressing ''ctrl-c'' multiple times.
  - If you want multiple commands to execute on the node, then the command must be surrounded by single or double quotes. For example, if the earlier ''pdsh uptime|sort'' command were surrounded by quotes, the sort would take place on the target nodes and not on the issuing node. The following example "sorts" on each node (there is nothing to sort) and the results are returned unordered: <code>
# pdsh "uptime|sort"
n0: 09:08:14 up 153 days, 18:38, 0 users, load average: 0.09, 0.08, 0.02
n2: 09:08:14 up 153 days, 18:38, 0 users, load average: 0.04, 0.05, 0.01
n1: 09:08:14 up 153 days, 18:39, 0 users, load average: 0.02, 0.04, 0.04
</code>
  - Although tempting, using ''pdsh'' to make permanent changes on the nodes is **not recommended**. On HPC systems, any changes will go away on the next restart of the nodes. The Warewulf Cluster toolkit provides a mechanism to globally manage all node configuration details (see [[ Warewulf Worker Node Images| VNFS images]]). \\ \\ On Data Analytics systems (Hadoop), changing nodes by hand may cause "node personalities" to develop (an unmanaged and unique collection of files and directories) and eventually make managing the system confusing or almost impossible. With few exceptions, full management of the Analytics systems (Hadoop) should be possible through the [[ Using the Apache Ambari Cluster Manager|Ambari Cluster Manager]].
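
For the rare case where a file really must be copied, the copy procedure described under "Using pdsh to Copy Files" can be sketched as a small shell helper. This is only an illustration under stated assumptions: ''copy_plan'' and ''SHARE'' are hypothetical names (not part of ''pdsh'' or Limulus), file names are assumed to contain no whitespace or shell metacharacters, and the helper only prints the commands so they can be reviewed before anything runs.

```shell
#!/bin/sh
# Sketch only: print the commands for the NFS-based copy procedure.
# copy_plan and SHARE are hypothetical names, not part of pdsh or Limulus.
# Assumes file names without whitespace or shell metacharacters.

# Default to the Data Analytics path; set SHARE=/opt/ohpc/admin/etc on HPC systems.
SHARE="${SHARE:-/opt/cluster/admin/etc}"

copy_plan() {
    file="$1"
    name=$(basename "$file")
    # 1. Stage the file in the NFS-shared directory (runs on the headnode).
    printf 'cp %s %s\n' "$file" "$SHARE"
    # 2. Quoted, so the whole cp command is passed to pdsh and runs on each node.
    printf 'pdsh "cp %s/%s /root"\n' "$SHARE" "$name"
    # 3. Unquoted pipe, so sort runs on the headnode over the combined output.
    printf 'pdsh ls /root/%s | sort\n' "$name"
}
```

Running ''copy_plan TEMP-FILE'' on the headnode prints the three commands. Printing rather than executing is a deliberate choice here, since the quoting in step 2 is easy to get wrong and is worth checking by eye first.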