Tips

  1. Job Arrays
  2. Job Dependencies
  3. Job Performance Factors

Job Arrays

Job arrays provide a means of submitting multiple similar jobs. The only difference between the jobs is an array index that is incremented for each job; this index can be used in your batch script or code to provide a different parameter, load different data files, or change the flow of your code.

To submit an array job, use the #SBATCH --array=i-n directive. For example:

#!/bin/sh

#SBATCH --mail-type=ALL
#SBATCH --mail-user=uniqname@umich.edu
#SBATCH --time=1-0
#SBATCH --job-name=array_job_test

#SBATCH --array=1-10

srun Rscript ./script.R

This batch script will submit 10 jobs with names array_job_test[1] … array_job_test[10], incrementing $SLURM_ARRAY_TASK_ID with each job submission. This environment variable can be accessed directly in your code, for example in R like this:

# grab the array id value from the environment variable passed from sbatch
slurm_arrayid <- Sys.getenv('SLURM_ARRAY_TASK_ID')

# coerce the value to an integer
n <- as.integer(slurm_arrayid)

You can also pass the variable as a command line argument to your job, as in this example, also in R (your R script would then need to read the argument, for example with commandArgs()):

#!/bin/sh

#SBATCH --mail-type=ALL
#SBATCH --mail-user=uniqname@umich.edu
#SBATCH --time=1-0
#SBATCH --job-name=array_job_test

#SBATCH --array=1-10

srun Rscript ./script.R --array_id=${SLURM_ARRAY_TASK_ID}

Each job could write its results to separate files based on this index, or possibly on the job ID via the $SLURM_JOB_ID environment variable, accessed in the same way as the array index. Once all jobs are complete, use a separate job to combine these results into your final results.
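
As a minimal sketch of this approach (the log and result file names and the --output argument to the R script are placeholders, not part of the example above), you could give each array task its own Slurm log file using the %A (array master job ID) and %a (array task index) filename patterns, and write each task's results to a file named after its index:

#!/bin/sh

#SBATCH --job-name=array_job_test
#SBATCH --array=1-10
# %A expands to the array's master job ID and %a to the task index,
# so each task gets its own Slurm log file
#SBATCH --output=array_job_test_%A_%a.log

# each task writes its results to its own file (results_1.csv ... results_10.csv);
# how script.R interprets these arguments is up to your code
srun Rscript ./script.R --array_id=${SLURM_ARRAY_TASK_ID} --output=results_${SLURM_ARRAY_TASK_ID}.csv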

Job Dependencies

It is possible to delay the start of a job until a specified dependency has been satisfied. For example, suppose you have multiple jobs running concurrently that each generate separate datasets that need to be aggregated as the final step. You could wait for all the jobs to complete, download the results, and combine them on your local system. Or you could create another job to aggregate your results and make it dependent on all of your other jobs.
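
For example, here is a minimal sketch of that pattern using the afterok dependency type described below (the .sbat script names are placeholders): capture each job's ID at submission time with sbatch --parsable, then submit the aggregation job with a dependency on all of them.

# submit the data-generation jobs and record their job IDs
# (--parsable makes sbatch print just the job ID instead of the usual message)
jobid1=$(sbatch --parsable generate_part1.sbat)
jobid2=$(sbatch --parsable generate_part2.sbat)

# the aggregation job will not start until both jobs have completed successfully
sbatch --dependency=afterok:${jobid1}:${jobid2} aggregate.sbat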

With dependencies you have many options. Here is the relevant portion of the man page for sbatch:

-d, --dependency=<dependency_list>
       Defer the start of this job until the specified dependencies have been satisfied.
       <dependency_list> is of the form <type:job_id[:job_id][,type:job_id[:job_id]]>.
       Many jobs can share the same dependency and these jobs may even belong to different
       users. The value may be changed after job submission using the scontrol command.

       after:job_id[:jobid...]
              This job can begin execution after the specified jobs have begun execution.

       afterany:job_id[:jobid...]
              This job can begin execution after the specified jobs have terminated.

       afternotok:job_id[:jobid...]
              This job can begin execution after the specified jobs  have  terminated  in  some
              failed state (non-zero exit code, node failure, timed out, etc).

       afterok:job_id[:jobid...]
              This  job can begin execution after the specified jobs have successfully executed
              (ran to completion with an exit code of zero).

       expand:job_id
              Resources allocated to this job should be used to expand the specified job.   The
              job  to  expand must share the same QOS (Quality of Service) and partition.  Gang
              scheduling of resources in the partition is also not supported.

       singleton
              This job can begin execution after any previously launched jobs sharing the  same
              job name and user have terminated.

This gives you many options for job dependencies. To continue with the example above, all of your concurrently running jobs, as well as the aggregation job, should be given the same --job-name, allowing you to use the singleton option for the aggregation job. This causes the aggregation job to wait for all jobs with the same name to complete before it starts; a sketch of this is shown below. There are many more dependency options, such as afternotok for handling job failures.
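
A minimal sketch of that workflow (the job name and script names are placeholders): give every worker job and the aggregation job the same --job-name, and submit the aggregation job with the singleton dependency.

# all worker jobs share one job name
sbatch --job-name=my_analysis worker1.sbat
sbatch --job-name=my_analysis worker2.sbat

# the aggregation job has the same name; singleton makes it wait until
# every earlier job with that name (and your user) has terminated
sbatch --job-name=my_analysis --dependency=singleton aggregate.sbat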

Job Performance Factors

For most work, performance differences between similar jobs do not have an impact on the results of the research: jobs are submitted, and results are received when the jobs complete. However, some work is affected by performance differences, such as writing new methods and packages and then comparing their speed to solution against generally available packages. Differences in job performance for the same code are caused mainly by three factors: the computers the job runs on, the geometry of the resources allocated within the computer, and how heavily the computer is utilized. The following suggestions should only be implemented if your work depends on consistent performance, as they can cause longer queue times and lower job throughput.

The computers the job runs on

Jobs on the cluster run on one of many compute nodes, which were added piecewise over the lifetime of the cluster beginning in 2010. Currently there are 10 different hardware configurations in the cluster. As you can imagine, not all of these configurations are equally performant: the older computers tend to have slower processors and lower-clocked memory than the newer ones. When doing performance-sensitive work, such as comparing time to completion between methods, it is critical that you run your jobs on the same hardware configuration. To enable this, each hardware type has a unique "feature" defined in the workload manager, Slurm. When you request a feature, your job will only run on a node with that feature; if all nodes with that feature are busy, your job will wait in the queue until a node with that feature is free, even if other nodes are idle. For this reason, you should not request a feature if your jobs do not need consistent hardware.

The hardware configurations and their corresponding features may be found here.

To request a given feature, add this line to your batch script:
#SBATCH --constraint=<feature>

For example, to run on a PowerEdge R430 with an Intel Xeon E5-2650 v4 @ 2.20GHz, use:
#SBATCH --constraint=E5-2650v4

You may want to consider including the hardware specifications that your jobs ran on in your papers. For example: "This work was performed on a Dell PowerEdge R430 with dual Intel Xeon E5-2650 v4 processors @ 2.20GHz and 256 GB of memory."

Utilization of the Computer

The frequency of a CPU core changes depending on the amount of work being done by the processor. This is due to a feature called Turbo Boost. When a processor is heavily utilized, the individual CPU cores operate at the advertised base frequency. When the processor is not doing much work, the frequency of the individual cores increases, up to a potential maximum called the turbo boost maximum frequency. Turbo Boost is great for getting more work done on a lightly utilized processor, but it makes it difficult to have predictable and consistent performance on a compute node that is being used by multiple jobs. To minimize the effects of Turbo Boost, submit your jobs with the exclusive flag, #SBATCH --exclusive. This guarantees that your job is the only job running on the node. Jobs with the exclusive flag are often in the queue for a longer time, as they must wait for a node to become idle before they can start.
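
As a sketch of how these suggestions fit together (the job name, time limit, and benchmark command are placeholders, and the constraint value is taken from the earlier example), a benchmarking batch script might look like this:

#!/bin/sh

#SBATCH --job-name=timing_run
#SBATCH --time=1-0
# pin the job to a single hardware configuration for comparable timings
#SBATCH --constraint=E5-2650v4
# request the whole node so no other job affects CPU frequency
#SBATCH --exclusive

srun ./my_benchmark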

NUMA

Requesting exclusive jobs also mitigates the effects of Non-Uniform Memory Access (NUMA). The compute nodes are all dual-socket, meaning they have two processors, each with its own attached memory channels. A job running on a CPU core on socket 0 can access the memory attached to both socket 0 and socket 1; however, accessing the memory on socket 1 is slower. With only one job on a node, you will get a consistent NUMA arrangement.
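
If you want to inspect a node's socket and NUMA layout yourself, one option from a shell on the node (assuming the numactl utility is installed; lscpu is generally available) is:

# list sockets, cores, and NUMA nodes
lscpu

# show each NUMA node's CPUs, memory size, and relative access distances
numactl --hardware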

Please only use the exclusive flag when you really need it. Use of this parameter can decrease cluster throughput dramatically.