SLURM

We are using SLURM (Simple Linux Utility for Resource Management) for our resource manager and job scheduler. Below are several of the basic commands you will need to interact with the cluster.

The commands for launching jobs in the cluster are sbatch, srun and salloc. The preferred and primary means of launching jobs in the cluster is with sbatch.

In a nutshell, sbatch and salloc allocate resources to the job, while srun launches parallel tasks across those resources. When invoked within a job allocation, srun will launch parallel tasks across some or all of the allocated resources. In that case, srun inherits by default the pertinent options of the sbatch or salloc which it runs under. You can then (usually) provide srun different options which will override what it receives by default. Each invocation of srun within a job is known as a job step.

srun can also be invoked outside of a job allocation. In that case, srun requests resources, and when those resources are granted, launches tasks across those resources as a single job and job step.

Source: slurm-devel mailing list post
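
To make the relationship concrete, here is a minimal sketch of a batch script (the program names are purely illustrative): sbatch creates the allocation, and each srun inside it is a job step that inherits the allocation's options by default but can override them.

#!/bin/sh

#SBATCH --ntasks=4

# Job step 1: inherits the allocation, so it launches 4 tasks
srun ./my_program

# Job step 2: overrides the inherited task count for this step only
srun --ntasks=1 ./postprocess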

Useful Commands

Information on jobs

List all current jobs for a user:

squeue -u <username>

List all running jobs for a user:

squeue -u <username> -t RUNNING

List all pending jobs for a user:

squeue -u <username> -t PENDING

List detailed information for a job (useful for troubleshooting):

scontrol show jobid -dd <jobid>

Controlling jobs

To cancel one job:

scancel <jobid>

To cancel all the jobs for a user:

scancel -u <username>

To cancel all the pending jobs for a user:

scancel -t PENDING -u <username>

To cancel one or more jobs by name:

scancel --name myJobName

To hold (pause) a particular job:

scontrol hold <jobid>

To release a held job so it can run:

scontrol release <jobid>

To requeue (cancel and rerun) a particular job:

scontrol requeue <jobid>

Commands

  1. sbatch
    1. Array Jobs
  2. squeue
  3. scancel
  4. srun
  5. salloc
    1. Interactive Jobs

sbatch

sbatch submits a batch script to SLURM. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input. The batch script may contain options preceded with #SBATCH before any executable commands in the script.

sbatch exits immediately after the script is successfully transferred to the SLURM controller and assigned a SLURM job ID. The batch script is not necessarily granted resources immediately; it may sit in the queue of pending jobs for some time before its required resources become available.

When the job allocation is finally granted for the batch script, SLURM runs a single copy of the batch script on the first node in the set of allocated nodes.

$ cat sample.txt 
#!/bin/sh

#SBATCH --mail-type=ALL
#SBATCH --mail-user=uniqname@umich.edu
#SBATCH --ntasks=5
#SBATCH --job-name=test

srun R CMD BATCH script.R

$ sbatch ./sample.txt
Submitted batch job 25618

$ cat slurm-25618.out
# any output to STDOUT would be in this file
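
By default the output lands in a file named slurm-<jobid>.out in the directory where you ran sbatch. If you would rather choose the file name, the --output option can be added to the script; as an illustrative sketch (%j expands to the job ID):

#SBATCH --output=test-%j.out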

Array Jobs

Array jobs provide a mechanism for submitting and managing collections of jobs. Job arrays are only supported for batch jobs. To control the size of the array, pass the --array (or -a) option to sbatch.

An example batch script would look like:

$ cat sample.txt
#!/bin/sh

#SBATCH --mail-type=ALL
#SBATCH --mail-user=uniqname@umich.edu
#SBATCH --time=1-0
#SBATCH --array=1-100

srun R CMD BATCH ./script.R
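
Every task in the array runs the same script; SLURM sets the SLURM_ARRAY_TASK_ID environment variable so that each task can pick out its own piece of the work. As an illustrative sketch (the argument handling inside script.R is assumed), the task ID could be passed through to R:

#!/bin/sh

#SBATCH --array=1-100

# Each array task passes its index to the R script, e.g. to select its input file
srun R CMD BATCH "--args ${SLURM_ARRAY_TASK_ID}" ./script.R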

squeue

This command is used to view information about the SLURM scheduling queue.

To view your jobs in the queue, regardless of their current state:

$ squeue -u $USER
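
If the default columns are not enough, squeue also accepts a format string; for example (the column widths here are arbitrary):

$ squeue -u $USER -o "%.10i %.12P %.20j %.4t %.10M %R"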

Job Status Codes

Typically your job will be in either the RUNNING or PENDING state. However, here is a breakdown of all the states your job could be in.

Code State Description
CA CANCELLED Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.
CD COMPLETED Job has terminated all processes on all nodes.
CF CONFIGURING Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).
CG COMPLETING Job is in the process of completing. Some processes on some nodes may still be active.
F FAILED Job terminated with non-zero exit code or other failure condition.
NF NODE_FAIL Job terminated due to failure of one or more allocated nodes.
PD PENDING Job is awaiting resource allocation.
R RUNNING Job currently has an allocation.
S SUSPENDED Job has an allocation, but execution has been suspended.
TO TIMEOUT Job terminated upon reaching its time limit.

scancel

Used to signal jobs or job steps that are under the control of Slurm.

scancel is used to signal or cancel jobs or job steps. An arbitrary number of jobs or job steps may be signaled using job specification filters or a space separated list of specific job and/or job step IDs. A job or job step can only be signaled by the owner of that job or user root. If an attempt is made by an unauthorized user to signal a job or job step, an error message will be printed and the job will not be signaled.

$ squeue 
JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
29908 biostat-d     bash  schelcj   R       0:05      2 cn[001-002]
$ scancel 29908
$ squeue
JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
$

Here we see that our job ID is 29908, and then we cancel that job with the scancel command.
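
Because scancel can also send arbitrary signals, you can, for instance, ask a job's processes to shut down via SIGTERM or cancel a single job step rather than the whole job. A sketch using placeholder IDs:

$ scancel --signal=TERM <jobid>    # send SIGTERM to the job's running tasks
$ scancel <jobid>.0                # cancel only job step 0 of the job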

srun

Used to launch jobs in the cluster.

The following example runs the /bin/hostname command on two nodes in the cluster. STDOUT/STDERR are returned to your terminal.

$ srun --nodes=2 hostname
cn001
cn002

You can specify the number of CPUs to use per task with the --cpus-per-task option.

$ srun --nodes=2 --cpus-per-task=3 R CMD BATCH script.R

All SAS jobs should be run in the biostat-sas partition to meet license restrictions.

$ srun --partition=biostat-sas some_sas_script.sas

Provided there are available licenses, the job will be launched according to the normal allocation process. If the license pool has been exhausted, the job will remain in the pending state until a license is freed for use.
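
The same restriction applies to batch submissions; a minimal sketch of an sbatch script for a SAS job (the job name is only an example) might look like:

#!/bin/sh

#SBATCH --partition=biostat-sas
#SBATCH --job-name=sas_test

srun some_sas_script.sas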

salloc

Obtain an interactive SLURM job allocation (a set of nodes), execute a command, and then release the allocation when the command is finished.

This just requests an allocation of cluster resources and gives you an interactive shell (when you exit that shell, the allocation is relinquished and any running job steps are cancelled). For example:

$ salloc -N2
salloc: Granted job allocation 29903
$ srun hostname
cn002
cn001
$

Here we have requested that two nodes be allocated, then we run /bin/hostname to see which nodes we were allocated. Any command we run within this allocation runs in parallel across the number of nodes we requested, in this case 2. So, for instance, if you had an R script that generated its own unique dataset for each invocation, you could run that script across two nodes. For example:

$ salloc -N2
salloc: Granted job allocation 12345
$ srun R CMD BATCH script.R &
$ srun R CMD BATCH script.R &
$ srun R CMD BATCH script.R &
$

In this example you have a two-node allocation, with each call to srun using a single core on both machines. Each call to srun launches the same R script on each node, and the & backgrounds the process so that you can issue more srun commands. This example is now running the R script six times across the two nodes.

$ env|grep SLURM
SLURM_NODELIST=cn[001-002]
SLURM_NNODES=2
SLURM_JOBID=29903
SLURM_TASKS_PER_NODE=2(x2)
SLURM_JOB_ID=29903
SLURM_JOB_NODELIST=cn[001-002]
SLURM_JOB_CPUS_PER_NODE=2(x2)
SLURM_JOB_NUM_NODES=2

While within the interactive allocation you have several environment variables available to you that describe the allocation. Having things like the job ID available makes it possible to use the --dependency option to srun.
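
For example, the job ID seen above could be handed to --dependency so that a follow-up submission waits until this job completes successfully; shown here with sbatch, though srun accepts the same option (the script name is only illustrative):

$ sbatch --dependency=afterok:29903 ./postprocess.sh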

Interactive Jobs

For times when you need an interactive shell to debug code, do post-processing, run tests, or do anything else that would be outside the appropriate use of the login nodes, you can submit an interactive job that will give you a shell on a compute node. To submit an interactive job to SLURM, use the salloc command and the srun command in combination. Here is an example:

  $ salloc --time=1:00:00 srun --pty /bin/bash
  salloc: Pending job allocation 12000
  salloc: job 12000 queued and waiting for resources
  salloc: job 12000 has been allocated resources
  salloc: Granted job allocation 12000
  schelcj@cn004:~$ 

This will give you a shell for one hour. Normal time arguments apply. From here you could, for instance, start an R session to debug some of your code.
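
The same pattern accepts the usual resource options; for example (the values here are only illustrative), to request an interactive shell for two hours with four CPUs:

  $ salloc --time=2:00:00 --cpus-per-task=4 srun --pty /bin/bash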