This page contains all important information about the batch system Slurm that you will need to run software on the HLRN. It does not cover every feature that Slurm has to offer; for that, please consult the official documentation and the man pages.

Jobs are mainly submitted with the sbatch command using a job script, but interactive jobs and node allocations are also possible with srun or salloc. Resources (e.g. the number of nodes or cores) are selected via command-line parameters or specified in the job script.
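As a minimal sketch (the partition, node count and time are only example values), a job script carries its resource requests as #SBATCH lines, and the script body can read what Slurm actually granted from environment variables such as SLURM_JOB_ID and SLURM_JOB_NUM_NODES. Outside a job these variables are unset, so the sketch falls back to placeholder values.

```bash
#!/bin/bash
#SBATCH -p standard96
#SBATCH -N 1
#SBATCH -t 01:00:00

# Slurm sets these variables inside a job; use placeholders when they are unset
# (e.g. when the script is run directly on a login node for testing).
job_id="${SLURM_JOB_ID:-none}"
num_nodes="${SLURM_JOB_NUM_NODES:-0}"

echo "job ${job_id} has ${num_nodes} node(s); this script runs on ${HOSTNAME}"
```

Saved under a name such as myjob.sh (a hypothetical file name), the script is submitted with sbatch myjob.sh. Options given on the sbatch command line take precedence over the #SBATCH lines, so sbatch -N 2 -t 02:00:00 myjob.sh requests two nodes for two hours without editing the script. An interactive allocation is requested with the same kind of options via salloc, e.g. salloc -p standard96:test -N 1 -t 00:30:00.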

Partitions

...

very fat nodes with 1536 GB memory,
for data pre- and postprocessing

...

16 dedicated

+48 on demand

...

If you do not request a partition, your job will be placed on the default partition, which is standard96 in Berlin and medium40 in Göttingen.

The default partitions are suitable for most calculations. The :test partitions are, as the name suggests, intended for shorter and smaller test runs. They have a higher priority and a few dedicated nodes, but are limited in run time and number of nodes. The :shared nodes are mainly for postprocessing. Nearly all nodes are exclusive to one job, except for the nodes in the :shared partitions.

Parameters

...


Slurm partitions

To match your job requirements with the hardware, you choose among the Slurm partitions described under Compute partitions. Their charge rates are listed on the page Accounting.

Important Slurm commands

The commands normally used for job control and management are:

Job submission:
sbatch <jobscript>
srun <arguments> <command>
Job status of a specific job:
squeue -j <jobID> for queued/running jobs
scontrol show job <jobID> for full job information (also for a short time after the job has finished)

Job cancellation:
scancel <jobID> cancel a job
scancel -i -u $USER cancel all your jobs (-u $USER), but ask for confirmation for every job (-i)
scancel --signal=KILL <jobID> send SIGKILL instead of SIGTERM

Job overview:
squeue -l --me
Job start (estimated):
squeue --start -j <jobID>
Workload overview of the whole system: sinfo (esp. sinfo --format="%25C %A"), squeue -l
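The commands above all take a job ID. A small sketch of capturing it at submission time follows: sbatch normally confirms a submission with "Submitted batch job <jobID>", and the hypothetical helper extract_jobid takes the last word of that message. The ID 1234567 is made up for illustration. Alternatively, sbatch --parsable prints only the job ID (followed by ";cluster" on multi-cluster setups).

```bash
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's usual confirmation
# message "Submitted batch job <jobID>" (the ID is the last word).
extract_jobid() {
  local message="$1"
  echo "${message##* }"
}

# On the cluster the message would come from: message=$(sbatch myjob.sh)
message="Submitted batch job 1234567"
jobid=$(extract_jobid "$message")
echo "$jobid"   # prints 1234567

# The ID can then be passed to the commands listed above, e.g.:
#   squeue -j "$jobid"
#   squeue --start -j "$jobid"
#   scontrol show job "$jobid"
#   scancel "$jobid"
```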

Job Scripts

A job script can be any script that contains special instructions for Slurm. The most commonly used forms are shell scripts, such as bash or plain sh, but other scripting languages (e.g. Python, Perl, R) are also possible.

Example batch script:

#!/bin/bash

#SBATCH -p standard96:test
#SBATCH -N 16
#SBATCH -t 06:00:00

module load impi
srun mybinary

The job scripts have to have a shebang line at the top, followed by the #SBATCH options. These #SBATCH comments have to be at the top, as Slurm stops scanning for them after the first non-comment non-whitespace line (e.g. an echo or variable declaration).
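To illustrate the scanning rule, the following sketch imitates it in a simplified way; list_sbatch_directives is a hypothetical helper, not Slurm's own code. It prints the #SBATCH lines of a script until the first line that is neither blank nor a comment. In the sample script, the -t line comes after module load impi, so it is not honoured and the job would fall back to the partition's default time limit.

```bash
#!/bin/bash
# Illustration only (not Slurm's code): print the #SBATCH lines that would be
# honoured, stopping at the first line that is neither blank nor a comment.
list_sbatch_directives() {
  local line
  while IFS= read -r line; do
    if [[ "$line" =~ ^[[:space:]]*$ ]]; then
      continue                 # blank line: keep scanning
    elif [[ "$line" == "#SBATCH"* ]]; then
      echo "$line"             # directive before any command: honoured
    elif [[ "$line" == "#"* ]]; then
      continue                 # shebang or other comment: keep scanning
    else
      break                    # first command: later #SBATCH lines are ignored
    fi
  done < "$1"
}

script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH -p standard96:test
#SBATCH -N 16

module load impi
#SBATCH -t 06:00:00
srun mybinary
EOF

directives=$(list_sbatch_directives "$script")
echo "$directives"   # only -p and -N are listed; the -t line is skipped
rm -f "$script"
```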

More examples can be found at Examples and Recipes.

Parameters

Parameter	SBATCH flag	Comment
Number of nodes	-N <#>
Number of tasks	-n <#>
Tasks per node	--tasks-per-node <#>	Hyperthreading is active by default (see below); different defaults between mpirun and srun
Partition	-p <name>	Default: standard96 (Berlin) / medium40 (Göttingen); e.g. standard96, overview: Slurm partitions CPU
CPUs per task	-c <#>	Default 1; interesting for OpenMP/hybrid jobs
Wall time limit	-t hh:mm:ss
Mail	--mail-type=ALL	See sbatch manpage for different types
Project/Account	-A <project>	Specify project for core hour accounting
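The -t option accepts more than hh:mm:ss. According to the sbatch manual page, the accepted formats are "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds", so -t 1-0 (as used in the shared-partition example) means one day. The hypothetical helper walltime_to_seconds is only a sketch that converts these formats to seconds, e.g. to compare a request against a partition limit; it is not part of Slurm.

```bash
#!/bin/bash
# Hypothetical helper: convert a Slurm time string (as accepted by -t) to seconds.
walltime_to_seconds() {
  local t="$1" days=0 h=0 m=0 s=0
  local -a parts
  if [[ "$t" == *-* ]]; then
    # days-hours[:minutes[:seconds]]
    days="${t%%-*}"
    IFS=: read -r h m s <<< "${t#*-}"
  else
    IFS=: read -r -a parts <<< "$t"
    case ${#parts[@]} in
      1) m="${parts[0]}" ;;                                     # minutes
      2) m="${parts[0]}"; s="${parts[1]}" ;;                    # minutes:seconds
      3) h="${parts[0]}"; m="${parts[1]}"; s="${parts[2]}" ;;   # hours:minutes:seconds
    esac
  fi
  # 10# forces base 10, so values with leading zeros such as "08" are accepted.
  echo $(( ((10#${days:-0} * 24 + 10#${h:-0}) * 60 + 10#${m:-0}) * 60 + 10#${s:-0} ))
}

walltime_to_seconds 06:00:00   # 21600
walltime_to_seconds 1-0        # 86400
walltime_to_seconds 1-12:30    # 131400
walltime_to_seconds 30         # 1800
```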

Job

...


Tasks, CPUs and Hyperthreading

By default, hyperthreading is activated. Our nodes have 40 or 96 cores, each with two hardware threads. Slurm does not differentiate between hyperthreads and cores and calls a single hyperthread a CPU, so do not be confused by this nomenclature. If you do not specify anything, 80 or 192 processes will be started per node. To use only one task per physical core instead, set the --tasks-per-node option to 40 or 96. If your software uses shared-memory parallelization (e.g. OpenMP), you only need a single task per node, but more CPUs per task, which is set with -c. Take a look at the examples for more information.
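The arithmetic behind these numbers can be sketched as follows. The helper tasks_per_node is hypothetical; it only assumes, as stated above, that Slurm counts two CPUs (hyperthreads) per physical core on these nodes, and computes how many tasks of a given -c value fit on one node.

```bash
#!/bin/bash
# Hypothetical helper: tasks that fit on one node for a given -c value.
# Slurm counts each hyperthread as a CPU (2 per core on these nodes).
tasks_per_node() {
  local cores="$1" cpus_per_task="$2" use_hyperthreads="${3:-yes}"
  local cpus="$cores"
  if [ "$use_hyperthreads" = yes ]; then
    cpus=$(( cores * 2 ))
  fi
  echo $(( cpus / cpus_per_task ))
}

tasks_per_node 96 1        # 192: the default on a 96-core node
tasks_per_node 96 1 no     # 96: one task per physical core (--tasks-per-node=96)
tasks_per_node 40 10 no    # 4: hybrid job, 4 tasks x 10 threads on a 40-core node
```

In an OpenMP or hybrid job script, the chosen values go into #SBATCH --tasks-per-node and #SBATCH -c, and the thread count is commonly derived from the -c value with export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK (Slurm only sets SLURM_CPUS_PER_TASK when -c is given).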

Getting Information about Jobs

Using the Shared Nodes

Advanced Options

Job Arrays

...

Walltime

The maximum runtime is set per partition and can be viewed either on the system with sinfo or here. There is no minimum walltime (we cannot stop your jobs from finishing, obviously), but a walltime of at least 1 hour is encouraged. A large number of small, short jobs can cause problems with our accounting system. The occasional short job is fine, but if you submit many jobs that finish (or crash) quickly, we might have to intervene and temporarily suspend your account. If you have many small workloads, please consider combining them into a single job that runs for at least 1 hour.
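One way to combine small workloads is sketched below, assuming the individual cases are independent and few enough to share one node at the same time. run_case is a hypothetical stand-in for your own program, and the temporary output directory is only for the sketch. All cases are started in the background and wait blocks until every one of them has finished, so the work of many short jobs runs within a single job. With more cases than CPUs, you would start them in batches instead.

```bash
#!/bin/bash
#SBATCH -p standard96
#SBATCH -N 1
#SBATCH -t 01:00:00

# Hypothetical stand-in for one short workload; replace with your own program.
run_case() {
  echo "case $1 done"
}

# Temporary output directory for this sketch; use your work directory in practice.
outdir=$(mktemp -d)

# Start all cases in the background on the same node ...
for i in {1..8}; do
  run_case "$i" > "$outdir/case_$i.log" &
done
# ... and wait until every background case has finished.
wait

logs=("$outdir"/case_*.log)
echo "finished ${#logs[@]} cases"   # prints "finished 8 cases"
```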

Select the project account

Batch jobs are submitted to the compute system by a user account.

For each job, the user chooses one project that is charged for the job. When a user account is created, its default project is the Test Project.
The user selects the project for a job with the option --account at submit time.
The default project for computing time of a user account can be changed via the link User Data on the Portal NHR@ZIB.

Example: select the account for a job

To charge the account myaccount, add the following line to the job script:
#SBATCH --account=myaccount

After submission, the batch system checks whether the project has enough core hours to cover the job and, if so, authorizes the job for scheduling. Otherwise the job is rejected; please note the error message:

Example: out of core hours

You can check with squeue whether the account of a job is out of core hours:
> squeue
... myaccount ... AccountOutOfNPL ...

Interactive Jobs

See the corresponding section in the Quick Start Guide.

Using the Shared Nodes

We provide a varying number of nodes from the large40 and large96 partitions as postprocessing nodes in shared mode, so that multiple jobs can run at once on a single node. You can request CPUs and memory and should take care not to exceed these limits. For each CPU (hyperthread), there is about 9.6 GB of memory on large40:shared and about 4 GB on large96:shared.

The maximum walltime on the shared partitions is 2 days.
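To estimate how much memory a shared-node job may use, multiply the number of requested CPUs by the memory per CPU given above. The helper mem_budget_mb is a hypothetical sketch; it uses rounded decimal values of 9600 MB and 4000 MB per CPU, so the results are approximate.

```bash
#!/bin/bash
# Hypothetical helper: approximate memory share on a shared node,
# i.e. number of CPUs requested times memory per CPU.
mem_budget_mb() {
  local ncpus="$1" mb_per_cpu="$2"
  echo $(( ncpus * mb_per_cpu ))
}

# Approximate per-CPU memory: about 9.6 GB (large40:shared), about 4 GB (large96:shared).
budget_large96=$(mem_budget_mb 10 4000)
budget_large40=$(mem_budget_mb 10 9600)
echo "10 CPUs on large96:shared: about ${budget_large96} MB"   # about 40000 MB
echo "10 CPUs on large40:shared: about ${budget_large40} MB"   # about 96000 MB
```

Such a value can be requested explicitly with the standard sbatch options --mem (per node, e.g. #SBATCH --mem=40000M) or --mem-per-cpu (e.g. #SBATCH --mem-per-cpu=4000M); the two options are mutually exclusive.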

Example job for the shared partition

This is an example job script using 10 cores. As this is not an MPI job, srun/mpirun is not needed. This job's memory usage should not exceed

Mb

#!/bin/bash
#SBATCH -p large96:shared
# -t 1-0 requests one day (days-hours format)
#SBATCH -t 1-0
#SBATCH -n 10
#SBATCH -N 1

python postprocessing.py
