Submitting Jobs to the Slurm Workload Manager

All Limulus HPC systems use a workflow (user job) scheduler called Slurm. The scheduler distributes user programs across the cluster according to the resources each job requests, because the number of programs users want to run can exceed the resources available. For example, if each program needs one core and the cluster has 32 cores, running more than 32 programs at once will oversubscribe the cluster. To manage this situation, the scheduler places all jobs in a work queue and runs them (empties the queue) using a first-in, first-out (FIFO) scheduling policy. (The scheduling policy can be modified to use priorities, etc.)

Job Submission Tutorial

The following example submits a single job to the Slurm workflow scheduler. It assumes the user is logged into the head node. The first step is to create a simple program called hello-world.c (the classic example program).

Use an editor and create (copy-and-paste) a file with the following simple code.

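#include <stdio.h>

int main(void) {
    /* No trailing newline on purpose: the Slurm submit script below
       appends " from cluster node <name>" to this same output line. */
    printf("Hello, World!");
    return 0;
}
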
Next, compile the program to make a binary called hello-world and test that it works.

$ gcc hello-world.c -o hello-world
$ ./hello-world 
Hello, World!

As a user, you could run this on the head (login) node of the cluster. However, you should avoid doing so when many users or other programs share the head node. (On some clusters, user applications running on the head node are automatically killed after a certain number of minutes.)

To run the hello-world program through Slurm, a “submit script” must be created. Using an editor, create a file called and copy and paste the following into the file.


#!/bin/bash

# Even though these look like comments, they are actually directives read by Slurm

#SBATCH -J first-job              # Job name
#SBATCH -o first-slurm-job.%j.out # Name of unique stdout output file (%j expands to jobId)
#SBATCH -N 1                      # Total number of nodes requested
#SBATCH -n 1                      # Total number of MPI tasks requested
#SBATCH -t 01:30:00               # Run time (hh:mm:ss) - 1.5 hours

# This file is an example Slurm submission script. It resembles a bash script
# and is actually interpreted as such. Thus basic shell commands can be executed
# with the results captured in the output file. 

echo "First Slurm Job"
# run system binary files
date
uptime
# run your binary file, with the use of "./" indicating "this directory"
./hello-world
echo -n " from cluster node "
hostname
# take a short nap
sleep 20

This is a bash script that tells Slurm how to run the program. The #SBATCH lines are not comments; they are directives telling Slurm how to run the application. The directives are self-explanatory except for the output file name. Because each program submitted to Slurm produces its own output, which can be viewed later, the output file name contains %j. Slurm substitutes the job number for this placeholder, creating a unique file. (Slurm assigns every job submitted to the queue a unique job number, starting at 1 and incrementing with each new submission.)

To run the program using Slurm simply enter:

$ sbatch 

Slurm will reply with a job number:

Submitted batch job 661

The work queue can be examined using the squeue command:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               661    normal first-jo  testing  R       0:04      1 headnode

If the job (in this case job number 661) is finished, then it will not be listed in the work queue. As mentioned, the job output will be placed in a file called first-slurm-job.661.out and should look similar to the following.

First Slurm Job
Wed Jul  1 09:55:42 EDT 2020
 09:55:42 up 8 days, 18:42,  4 users,  load average: 0.06, 0.04, 0.00
Hello, World! from cluster node headnode

There are a few points to note about the above process.

First, the hostname command in the script prints the name of the node where the job ran. Limulus nodes have the following names: headnode, n0, n1, n2. (On double-wide units, the node names continue up to n6.) In this case, the job ran on the head node because some of its cores are included in the Slurm resources (this setting is adjustable). When more jobs are submitted at the same time than those head node cores can hold, the “overflow” work is placed on the compute nodes. For example, six jobs submitted at once:

$ sbatch ; sbatch; sbatch; sbatch; sbatch; sbatch
Submitted batch job 663
Submitted batch job 664
Submitted batch job 665
Submitted batch job 666
Submitted batch job 667
Submitted batch job 668

Checking squeue indicates that the first four jobs were assigned to the headnode and the next two were assigned to node n0.

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               663    normal first-jo  testing  R       0:04      1 headnode
               664    normal first-jo  testing  R       0:04      1 headnode
               665    normal first-jo  testing  R       0:04      1 headnode
               666    normal first-jo  testing  R       0:04      1 headnode
               667    normal first-jo  testing  R       0:04      1 n0
               668    normal first-jo  testing  R       0:04      1 n0

If the output file from job 668 is inspected, the line “Hello, World! from cluster node n0” shows that the job ran on n0.

Second, any program errors are sent to the output file. Because the script sets only -o, standard error goes to the same file as standard output. Slurm does not care what you run or whether it works; it simply runs the job when resources are available and reports the results.

Finally, jobs that run under Slurm can request multiple cores on a single node or spread them across multiple nodes. (See the Compiling and Running an MPI Application section.) Slurm can also be configured to require programs to request an amount of memory per node, so that memory contention can be avoided. As shown in the submit script, each job requests a run time, and Slurm kills the job if it runs longer than that. Slurm can also be configured to set priorities based on the user or on the type and amount of resources required.