R
Running R in Batch Mode
The preferred way to run R is in batch mode. Create a text file containing the R commands you wish to run, then pass the name of that file to R when you invoke it.
Put the following R commands in a file called script.R:
library(datasets)
data(iris)
summary(iris)
To run script.R, invoke R this way:
$ R CMD BATCH --no-restore --no-save script.R
This runs R in batch mode. The --no-restore and --no-save options prevent the workspace from being loaded at startup and saved at exit, which helps keep your results reproducible. Unless you provide the name of an output file, R appends "out" to the input file name and writes the output there; in this case, that is script.Rout. To use a different name, specify it at the end of your R command:
$ R CMD BATCH --no-restore --no-save script.R Thisismyoutputfile
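The steps above can be tried end to end with a short shell sketch. It is only an illustration: the temporary directory and the explicit output name script.Rout are choices made for this example, and R is run only if it is found on the PATH.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Write the three R commands from the example into script.R.
cat > "$workdir/script.R" <<'EOF'
library(datasets)
data(iris)
summary(iris)
EOF

# Run the script in batch mode only if R is installed on this machine.
if command -v R >/dev/null 2>&1; then
    (cd "$workdir" && R CMD BATCH --no-restore --no-save script.R script.Rout)
    # The output file holds the echoed commands together with their output.
    cat "$workdir/script.Rout"
else
    echo "R not found; created $workdir/script.R only"
fi
```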
Basic R Job
For most R work, a simple batch script like the one below will suffice:
#!/bin/sh
#SBATCH --job-name=basic_r_job
#SBATCH --time=1:00:00
#SBATCH --mail-user=<your-email-address>
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=1g
#SBATCH --cpus-per-task=1
R CMD BATCH --no-save --no-restore script.R
Array job with R
Array jobs let you submit multiple jobs with the same parameters, each with a different value of the environment variable $SLURM_ARRAY_TASK_ID. You can use this value to change the flow of your code, execute a different R script, or set variables in your code. Arrays are especially useful for running the same simulation multiple times. The following basic batch script uses an array to run multiple R jobs:
#!/bin/sh
#SBATCH --job-name=basic_r_job
#SBATCH --time=1:00:00
#SBATCH --mail-user=<your-email-address>
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mem=1g
#SBATCH --cpus-per-task=1
#SBATCH --array=1-10
R CMD BATCH --no-save --no-restore script.R script_$SLURM_ARRAY_TASK_ID
When using arrays, it is important to include the SLURM_ARRAY_TASK_ID variable in the name of your R output file. If you do not, every array element writes to the same R output file, and you are left with only the results of the last array element to finish running.
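The effect of putting the task ID in the output file name can be previewed without SLURM. The sketch below is a hypothetical dry run: it sets SLURM_ARRAY_TASK_ID by hand in a loop (SLURM normally sets it for each array element) and only prints the command each element would run.

```shell
#!/bin/sh
# Dry run: imitate three array elements by assigning the task ID ourselves.
outputs=""
for SLURM_ARRAY_TASK_ID in 1 2 3; do
    outfile="script_$SLURM_ARRAY_TASK_ID"
    # Print, rather than execute, the command this array element would run.
    echo "R CMD BATCH --no-save --no-restore script.R $outfile"
    outputs="$outputs $outfile"
done
echo "Output files:$outputs"
```

Each element gets its own output file (script_1, script_2, script_3), so no element overwrites another.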
R array job with Sys.getenv()
To access the variable from within R, use the Sys.getenv() function:
#!/usr/bin/env Rscript
# grab the array id value from the environment variable passed from sbatch
slurm_arrayid <- Sys.getenv('SLURM_ARRAY_TASK_ID')
# coerce the value to an integer
n <- as.integer(slurm_arrayid)
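To check this logic before submitting a job, the variable can be set by hand in the shell. This is a minimal sketch, assuming Rscript is on the PATH; the value 3 is arbitrary. Outside SLURM the variable is normally unset, in which case Sys.getenv() returns an empty string and the conversion yields NA.

```shell
#!/bin/sh
# Pretend to be array element 3; SLURM would normally export this variable.
SLURM_ARRAY_TASK_ID=3
export SLURM_ARRAY_TASK_ID

if command -v Rscript >/dev/null 2>&1; then
    # Read the variable inside R and convert it to a number, as in the R code above.
    result=$(Rscript -e 'n <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID")); cat("task id:", n, "\n")')
else
    result="Rscript not found"
fi
echo "$result"
```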