The cluster uses SLURM as the batch scheduler and resource manager.
The most common basic commands are summarized below.
| Command | Purpose |
|---|---|
| sbatch | submit a batch job script |
| srun | run a parallel job |
| squeue | show queue status |
| sinfo | view info about nodes and partitions |
| scancel | cancel a job |
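For example, an overview of partitions and node states can be obtained with sinfo, and its -p option restricts the listing to a single partition. The partition name below is the one used in the example further down and is only illustrative.

```bash
# Overview of all partitions and the state of their nodes
sinfo

# Limit the listing to a single partition
sinfo -p 48mem_192mem
```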
sbatch is used to submit jobs to the queue. A batch submit script usually starts with a block of #SBATCH directives that request resources, followed by the commands to run.
For example, a script may request two nodes from the 48mem_192mem partition, using 48 processors per node for a total of 96 processors, with the memory per CPU core set to 3900 MB RAM. See Partitions & Hardware for details on the available partitions.
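The resource request above can be checked with a little arithmetic: the total number of processes is nodes × processors per node, and the memory reserved on each node is processors per node × memory per core. A minimal bash sketch (the variable names are illustrative, not SLURM settings):

```bash
#!/bin/bash
# Values from the example request (illustrative variable names)
nodes=2
tasks_per_node=48
mem_per_cpu_mb=3900

# Total number of processes across all nodes
total_tasks=$(( nodes * tasks_per_node ))

# Memory reserved per node when every core runs one process
mem_per_node_mb=$(( tasks_per_node * mem_per_cpu_mb ))

echo "total processes: ${total_tasks}"
echo "memory per node: ${mem_per_node_mb} MB"
```

This gives 96 processes and 187200 MB per node. Assuming the nodes in this partition have 192 GB of RAM, as the partition name suggests, the request stays somewhat below the node total, presumably leaving room for the operating system; check Partitions & Hardware for the actual figures.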
Once the SLURM scheduler has allocated the resources, the subsequent lines of the script are executed in order. First a program environment for bash is loaded (see Program Environment), and then a Python script is executed with mpirun.
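A submit script matching this description might look roughly like the sketch below. The #SBATCH options are standard sbatch flags; the environment file path and the Python script name are hypothetical placeholders, and the mpirun line assumes an MPI library with SLURM integration that picks up the allocated cores on its own.

```bash
#!/bin/bash
#SBATCH --partition=48mem_192mem   # partition from the example
#SBATCH --nodes=2                  # two nodes
#SBATCH --ntasks-per-node=48       # 48 processes per node, 96 in total
#SBATCH --mem-per-cpu=3900M        # 3900 MB RAM per CPU core

# Load the program environment (hypothetical path; see Program Environment)
source /path/to/program_environment.sh

# Run the (hypothetical) Python script under MPI on all allocated cores
mpirun python my_script.py
```

Saved as, say, job.sh (hypothetical name), the script is submitted with sbatch job.sh, which reports the job ID of the new job.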
Hyper-threading is enabled by default on the Intel-based CPUs, so it is highly recommended to suppress it in your submit script (or .bashrc) unless your software supports OpenMP and is correctly compiled with it.
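A minimal sketch of two common settings, assuming this is what is meant by suppressing hyper-threading: the sbatch option --hint=nomultithread asks SLURM to use only one hardware thread per physical core, and setting OMP_NUM_THREADS=1 keeps OpenMP-aware libraries from starting extra threads in each process. Which of the two the cluster expects is an assumption here; the export line can also go in .bashrc.

```bash
#!/bin/bash
# In the #SBATCH header block: one hardware thread per physical core
#SBATCH --hint=nomultithread

# In the script body (or ~/.bashrc): one thread per process,
# unless the program is built for OpenMP
export OMP_NUM_THREADS=1
```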
After submitting a job, you can view its current status and job ID with squeue.
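For example, the -u option limits the listing to a single user's jobs; the JOBID column holds the number needed to cancel a job.

```bash
# Show your own queued and running jobs
squeue -u "$USER"
```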
You can cancel a job with scancel, using its JOBID number.
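For example (the job ID below is a made-up placeholder; use the one reported by squeue):

```bash
# Cancel a single job by its JOBID (hypothetical number)
scancel 123456

# Cancel all of your own jobs at once
scancel -u "$USER"
```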
If your job requires a lot of input data or generates a lot of output, it is advisable to use the /scratch/ disk available on the compute nodes. See the next section.