This page contains all the important information about the batch system Slurm that you will need to run software on the HLRN. It does not cover every feature that Slurm has to offer; for that, please consult the official documentation and the man pages.
Jobs are mainly submitted via the sbatch command using a job script, but interactive jobs and node allocations are also possible using srun or salloc. Resource selection (e.g. number of nodes or cores) is handled via command parameters, or may be specified in the job script.
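As a rough illustration, the three entry points could be wrapped as follows. The function names are hypothetical, the partition names are taken from the partition table on this page, and the node counts and time limits are only illustrative assumptions.

```shell
#!/bin/bash
# Each call is wrapped in a function, so nothing is submitted
# just by sourcing this file.

# Batch job: resources are given on the command line here,
# but could equally be set via #SBATCH lines in the job script.
submit_batch() {
    sbatch -p standard96 -N 2 -t 01:00:00 "$1"
}

# Interactive job: run a shell on one node of a test partition.
interactive_shell() {
    srun -p standard96:test -N 1 -t 00:10:00 --pty bash
}

# Node allocation: reserve one node, then start work inside it with srun.
allocate_node() {
    salloc -p standard96:test -N 1 -t 00:10:00
}
```

Calling submit_batch jobscript.sh would then submit the (hypothetical) file jobscript.sh to the standard96 partition.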
Partitions
Partition | Location | Max. walltime | Nodes | Max. nodes per job | Max jobs per user | Memory physical | Memory allowed | Remark |
---|---|---|---|---|---|---|---|---|
standard96 | Lise | 12:00:00 | 952 | 256 | (var) | 384 GB | 362 GB | normal nodes, default partition |
standard96:test | Lise | 1:00:00 | 32 dedicated +128 on demand | 16 | 1 | 384 GB | 362 GB | normal test nodes with higher priority but lower walltime |
large96 | Lise | 12:00:00 | 28 | 4 | (var) | 768 GB | 747 GB | fat nodes |
large96:test | Lise | 1:00:00 | 2 dedicated +2 on demand | 2 | 1 | 768 GB | 747 GB | fat test nodes with higher priority but lower walltime |
large96:shared | Lise | 48:00:00 | 2 dedicated | 1 | (var) | 768 GB | 747 GB | fat nodes for data pre- and postprocessing |
huge96 | Lise | 24:00:00 | 2 | 1 | (var) | 1536 GB | 1522 GB | very fat nodes for data pre- and postprocessing |
medium40 | Emmy | 12:00:00 | 368 | 128 | unlimited | 384 GB | 362 GB | normal nodes, default partition |
medium40:test | Emmy | 1:00:00 | 16 dedicated +48 on demand | 8 | unlimited | 380 GB | 747 GB | normal test nodes with higher priority but lower walltime |
large40 | Emmy | 12:00:00 | 11 | 4 | unlimited | 768 GB | 747 GB | fat nodes |
large40:test | Emmy | 1:00:00 | 3 | 2 | unlimited | 768 GB | 765 GB | fat test nodes with higher priority but lower walltime |
large40:shared | Emmy | 24:00:00 | 2 | 1 | unlimited | 768 GB | 765 GB | for data pre- and postprocessing |
gpu | Emmy | 12:00:00 | 1 | 1 | unlimited | | | equipped with 4 x NVIDIA Tesla V100 32GB |
If you do not request a partition, your job will be placed in the default partition, which is standard96 in Berlin and medium40 in Göttingen.
The default partitions are suitable for most calculations. The :test partitions are, as the name suggests, intended for shorter and smaller test runs. They have a higher priority and a few dedicated nodes, but are limited in time and number of nodes. The :shared nodes are mainly for postprocessing. Nearly all nodes are exclusive to one job, except for the nodes in these :shared partitions.
Parameters
Parameter | Option | Comment |
---|---|---|
# nodes | -N # | |
# tasks | -n # | |
# tasks per node | --tasks-per-node # | Different defaults between mpirun and srun |
partition | -p <name> | standard96/medium40 |
# CPUs per task | -c # | Default 1, interesting for OpenMP/Hybrid jobs |
Timelimit | -t hh:mm:ss | |
Mail notification | --mail-type=ALL | See sbatch manpage for different types |
Project/Account | -A <project> | Specify project |
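The options from the table can be combined in a job script header. The following is a minimal sketch, assuming a short trial run on standard96:test; the project name myproject and the helper function job_summary are hypothetical, and the resource values are only examples.

```shell
#!/bin/bash
# Partition: a :test partition for a short trial run
#SBATCH -p standard96:test
# Number of nodes and tasks per node
#SBATCH -N 2
#SBATCH --tasks-per-node 96
# CPUs per task (1 is the default)
#SBATCH -c 1
# Time limit as hh:mm:ss
#SBATCH -t 00:30:00
#SBATCH --mail-type=ALL
# Hypothetical project name, replace with your own
#SBATCH -A myproject

# Slurm exports the granted resources as environment variables.
# The "unset" fallbacks only appear when this runs outside a job.
job_summary() {
    echo "partition=${SLURM_JOB_PARTITION:-unset} nodes=${SLURM_JOB_NUM_NODES:-unset} cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
}

job_summary
```

Printing the summary at the start of a job makes it easy to check in the job output that the request was interpreted as intended.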
Job Scripts
A job script can be any script that contains special instructions for Slurm. The most commonly used forms are shell scripts, such as bash or plain sh, but other scripting languages (e.g. Python, Perl, R) are also possible.
    #!/bin/bash
    #SBATCH -p medium40
    #SBATCH -N 16
    #SBATCH -t 06:00:00
    module load impi
    srun mybinary
Job scripts must have a shebang line at the top, followed by the #SBATCH options. These #SBATCH comments have to be at the top, as Slurm stops scanning for them after the first non-comment, non-whitespace line (e.g. an echo or variable declaration).
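To check a script against this rule before submitting, a small helper can list the #SBATCH lines that sit above the first command. This is only a sketch that approximates the rule as described here, not Slurm's actual parser; the function name effective_sbatch is hypothetical.

```shell
#!/bin/bash
# Print the #SBATCH lines that appear before the first line that is
# neither a comment nor blank. Directives after that point are not printed,
# mirroring the rule that Slurm stops scanning there.
effective_sbatch() {
    awk '
        /^#SBATCH/               { print; next }  # directive in the header
        /^#/ || /^[[:space:]]*$/ { next }         # other comments and blank lines
                                 { exit }         # first command: stop scanning
    ' "$1"
}
```

Running effective_sbatch on a script where an echo precedes an #SBATCH -t line would omit that line from the output, which hints that the time limit would not be picked up.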
More examples can be found at Examples and Recipes.
Important Slurm commands
The commands normally used for job control and management are:
- Job submission:
  - sbatch <jobscript>
  - srun <arguments> <command>
- Job status of a specific job:
  - squeue -j jobID for queued/running jobs
  - scontrol show job jobID for full job information (even after the job finished)
- Job cancellation:
  - scancel jobID
  - scancel -i -u $USER to cancel all your jobs (-u $USER), but ask for confirmation for every job (-i)
  - scancel -9 to send SIGKILL instead of SIGTERM
- Job overview:
  - squeue -l --me
- Job start (estimated):
  - squeue --start -j jobID
- Workload overview of the whole system:
  - sinfo (esp. sinfo --format="%25C %A"), squeue -l
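Several of these commands are often chained: submit a job, keep its ID, then query status and estimated start. The following sketch assumes sbatch's --parsable option, which prints only the job ID (optionally followed by a semicolon and the cluster name); the function name submit_and_show is hypothetical.

```shell
#!/bin/bash
# Submit a job script, then show its queue status and estimated start time.
submit_and_show() {
    local jobid
    # --parsable prints "jobid" or "jobid;cluster" instead of the usual message.
    jobid=$(sbatch --parsable "$1") || return 1
    jobid=${jobid%%;*}            # drop the cluster name, if present
    squeue -j "$jobid"
    squeue --start -j "$jobid"
}
```

A call such as submit_and_show jobscript.sh (with a hypothetical jobscript.sh) prints both squeue views for the newly submitted job.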
Using the Shared Nodes
We provide a varying number of nodes from the large40 and large96 partitions as postprocessing nodes in a shared mode, so that multiple jobs can run at once on a single node. You can request CPUs and memory, and should take care not to exceed your limits. For each CPU/hyperthread, there is about 9.6 GB of memory on the large40:shared partition and about 4 GB on the large96:shared partition.
The maximum walltime on the shared partitions is currently 2 days.
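A memory request that stays proportional to the requested CPUs can be estimated from the per-CPU figures above. This is a rough sketch: the function name shared_mem_gb and the script name postprocess.sh are hypothetical, the per-CPU values are the approximate ones quoted in this section, and the result is rounded down to whole GB.

```shell
#!/bin/bash
# Estimate a proportional memory request (in GB) for a shared partition.
# Per-CPU memory is stored in tenths of a GB to stay within integer arithmetic.
shared_mem_gb() {
    local partition=$1 cpus=$2 tenths
    case "$partition" in
        large40:shared) tenths=96 ;;   # about 9.6 GB per CPU/hyperthread
        large96:shared) tenths=40 ;;   # about 4 GB per CPU/hyperthread
        *) echo "unknown shared partition: $partition" >&2; return 1 ;;
    esac
    echo $(( cpus * tenths / 10 ))
}

# Print a matching sbatch call instead of running it,
# so the numbers can be checked before submitting.
echo "sbatch -p large40:shared -c 8 --mem=$(shared_mem_gb large40:shared 8)G postprocess.sh"
```

Requesting less memory than this estimate is of course fine; the point is only to avoid asking for more than the share that corresponds to the requested CPUs.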
Advanced Options
Slurm offers many options for job allocation, process placement, job dependencies, job arrays, and much more. We cannot exhaustively cover all topics here. As mentioned at the top of the page, please consult the official documentation and the man pages for an in-depth description of all parameters.
Job Arrays
If you need to submit a large number of similar jobs, please do not use for loops to submit them, but use job arrays instead (this lessens the burden on the scheduler). Arrays can be defined using the -a <indexes> option (e.g. -a 0-9 for ten jobs). To divide your workload among the different jobs within your job script, there are several environment variables that can be used: