# Submitting Batch Jobs
## SLURM

Elja uses SLURM as the batch scheduler and resource manager. The most common commands are summarized below.
| Command | Description |
|---|---|
| `sbatch` | submit a batch job script |
| `srun` | run a parallel job |
| `squeue` (`-a`, `-u $USER`) | show queue status |
| `sinfo` | view info about nodes and partitions |
| `scancel JOBID` | cancel a job |
## Batch jobs

The command `sbatch` is used to submit jobs to the SLURM queue.

A batch submit script usually starts like this:
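A minimal sketch consistent with the description that follows (the job name, walltime, and output file are illustrative, not prescribed):

```bash
#!/bin/bash
#SBATCH --job-name=my_job          # illustrative job name
#SBATCH --partition=48cpu_192mem   # request the 48cpu_192mem partition
#SBATCH --nodes=2                  # two nodes
#SBATCH --ntasks-per-node=48       # 48 processors per node, 96 in total
#SBATCH --mem-per-cpu=3900         # 3900MB RAM per cpu-core
#SBATCH --time=04:00:00            # illustrative walltime limit (hh:mm:ss)
#SBATCH --output=my_job.%j.out     # illustrative output file; %j expands to the JOBID
```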
Here two nodes from the `48cpu_192mem` partition are requested, using 48 processors per node for a total of 96 processors. The memory per cpu-core is set to 3900MB RAM. See Partitions & Hardware for details on the available partitions.
When the SLURM scheduler has allocated the resources, the subsequent lines of the script are executed in order. First a program environment is loaded (see Program Environment), and then an `mpirun` instance of a Python script is executed.
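A sketch of how the body of such a script might look (the module names and script path are illustrative, not Elja's actual module names):

```bash
# load a program environment providing MPI and Python (illustrative names)
module load openmpi python

# launch the Python script as 96 MPI processes
mpirun -np 96 python3 my_script.py
```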
Hyper-threading of the Intel-based CPUs is on by default, hence it is highly recommended to suppress it in your submit script (or `.bashrc`), unless your software supports OpenMP and is correctly compiled with it.
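In a submit script this can be done with SLURM's standard `--hint` option, sketched here:

```bash
#SBATCH --hint=nomultithread   # use only one thread per physical core
```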
For `.bashrc`:
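A sketch of the equivalent setting through SLURM's input environment variables (`SLURM_HINT` is read as a default for `--hint`):

```bash
# add to ~/.bashrc so jobs suppress hyper-threading by default
export SLURM_HINT=nomultithread
```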
After submitting a job you can view its current status and JOBID like this:
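For example (the script name is illustrative; `sbatch` replies with the assigned JOBID):

```bash
sbatch submit_script.sh   # prints: Submitted batch job <JOBID>
squeue -u $USER           # list your own jobs; the first column is the JOBID
```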
You can cancel a job using the JOBID number, for example:
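A sketch, assuming the JOBID reported by `squeue` was 12345:

```bash
scancel 12345   # cancel job 12345
```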
If your job requires a lot of input data, or if it generates a lot of output, it is advisable to make use of the `/scratch/` disk available on the compute nodes. See the next section.