<code>
#SBATCH -t 01:30:00               # Run time (hh:mm:ss) - 1.5 hours

# This file is an example Slurm submission script. It resembles a bash script
# and is actually interpreted as such. Thus basic shell commands can be executed
# with the results captured in the output file.

echo "First Slurm Job"
# run system binary files
date
uptime
# run your binary file, with the use of "./" indicating "this directory"
./hello-world
echo -n " from cluster node "
hostname
</code>
  
The ''first-slurm-job.sh'' is a bash script file that tells Slurm how to run the program. The ''#SBATCH'' lines
are not comments, but are actually directives telling Slurm how to run the application. The directives are self-explanatory except for the output file name. Since each program submitted to Slurm will produce a unique output that can be viewed at a later time, the output file has a ''%j'' in the name. Slurm will substitute the job number for this variable, creating a unique file. (Slurm assigns every job submitted to the queue a unique job number, starting at 1 and incrementing with each new submission.)
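For reference, the directive block at the top of such a script might look like the following. This is only a sketch; the job name, output file name, and resource values shown here are illustrative and not necessarily those used above.
<code>
#!/bin/bash
#SBATCH -J first-slurm-job          # Job name (illustrative)
#SBATCH -o first-slurm-job.%j.out   # Output file; Slurm replaces %j with the job number
#SBATCH -N 1                        # Number of nodes requested
#SBATCH -n 1                        # Number of tasks (cores) requested
#SBATCH -t 01:30:00                 # Run time (hh:mm:ss) - 1.5 hours
</code>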
  
To run the program using Slurm, simply enter:
<code>
$ sbatch first-slurm-job.sh
</code>
  
The work queue can be examined using the ''squeue'' command:
<code>
$ squeue
</code>
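Typical output for this first job would resemble the listing below. (This line is illustrative only; the job ID and run time shown are hypothetical, but the column format matches the listings later on this page.)
<code>
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                662    normal first-jo  testing  R       0:02      1 headnode
</code>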
There are a few points to note about the above process.
  
First, the ''hostname'' command in the script prints the node name where the job was run. Limulus nodes have the following names: ''headnode'', ''n0'', ''n1'', ''n2''. (The double-wide units go to ''n6''.) In this case, the job was run on the head node because some of its cores are included in the Slurm resources (this setting is adjustable). If six jobs are submitted at the same time, then the "overflow" work will be placed on the nodes. For example:
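A short shell loop is one way to submit six copies of the job. (This is a sketch; the commands actually used to produce the output below may have differed.)
<code>
$ for i in $(seq 1 6); do sbatch first-slurm-job.sh; done
</code>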
  
<code>
Submitted batch job 663
Submitted batch job 664
Submitted batch job 665
Submitted batch job 666
Submitted batch job 667
Submitted batch job 668
</code>
  
Checking ''squeue'' indicates that the first four jobs were assigned to the headnode and the next two were assigned to node ''n0''.
<code>
$ squeue
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                663    normal first-jo  testing  R       0:04      1 headnode
                ...
</code>
  
If the output file from job 668 is inspected, the line ''Hello, World! from cluster node n0'' indicates that the job was run on ''n0''.
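The exact output file name depends on the ''-o'' directive used in the script. As a hypothetical example, if the output file were named ''first-slurm-job.%j.out'', the output of job 668 could be viewed with:
<code>
$ cat first-slurm-job.668.out
</code>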
  
Second, any program errors are sent to the output file. Slurm does not care what you run or whether it works; it simply runs the job when resources are available and reports the results.
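If desired, standard error can be captured in a separate file using the standard ''-e'' (''--error'') directive; the file name below is illustrative.
<code>
#SBATCH -e first-slurm-job.%j.err   # send stderr to its own file (optional)
</code>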
  
Finally, jobs that run on Slurm can request multiple cores on a single node or spread them across multiple nodes. (See the [[compiling_and_running_an_mpi_application|Compiling and Running an MPI Application]] section.) Slurm can also be configured so that programs must request an amount of memory per node, which helps avoid memory contention issues. As indicated above, the amount of run time is requested, after which Slurm will kill your job. Slurm can be configured to set priorities based on the user or the type and amount of resources required.
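As a sketch of such resource requests, the following directives are standard Slurm options; the values are illustrative only and not specific to this cluster. Consult ''man sbatch'' for the full list of directives.
<code>
#SBATCH -N 2                 # request two nodes
#SBATCH -n 8                 # request eight tasks (cores) in total
#SBATCH --mem=4G             # request 4 GB of memory per node
#SBATCH -t 00:30:00          # maximum run time before Slurm kills the job
</code>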
  