Slurm usage
Slurm partitions
To match your job requirements with the hardware, you choose among the Slurm partitions of the Compute partitions, which are linked to their charge rates on the page Accounting.
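If you are unsure which partitions are available, sinfo gives a compact per-partition summary directly on the login node. The output below is only a sketch; the actual partition names, limits and node lists are system-specific.

$ sinfo -s
PARTITION  AVAIL  TIMELIMIT  NODES(A/I/O/T)   NODELIST
cpu-clx       up        ...  .../.../.../...  bcn[...]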
Important Slurm commands
The commands normally used for job control and management are:
- Job submission:
  $ sbatch <jobscript>
  $ srun <arguments> <command>
- Status of a specific job:
  $ squeue -j <jobID>
  $ scontrol show job <jobID> (full job information, even after the job has finished)
- Job cancellation:
  $ scancel <jobID>
  $ scancel -i -u $USER (cancel all your jobs, asking for confirmation for each job)
  $ scancel -9 (send SIGKILL instead of SIGTERM)
- Job overview:
  $ squeue
  $ squeue -l --me
- Estimated job start:
  $ squeue --start -j <jobID>
- Workload overview of the whole system:
  $ sinfo
  $ sinfo --format="%25C %A"
  $ squeue -l
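The format string in the sinfo example above prints the CPU counts by state (allocated/idle/other/total) and the node counts by state (allocated/idle); the numbers below are placeholders for illustration only.

$ sinfo --format="%25C %A"
CPUS(A/I/O/T)             NODES(A/I)
.../.../.../...           .../...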
Job scripts
A job script can be any script that contains special instructions for Slurm. The most commonly used forms are shell scripts, such as bash or plain sh, but other scripting languages (e.g. Python, Perl, R) are also possible.
#!/bin/bash
#SBATCH -p cpu-clx:test
#SBATCH -N 4
#SBATCH -t 06:00:00

module load impi
mpirun my-mpi-parallel-binary
Job scripts need to have a shebang line at the top, followed by the #SBATCH options. These #SBATCH comments have to be at the top, as Slurm stops scanning for them after the first non-comment non-whitespace line (e.g., a command or a declaration of a variable).
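Once such a script has been written, it is handed to Slurm with sbatch, which reports back the job ID that squeue and scancel operate on. The file name myjob.slurm and the job ID below are only illustrative.

$ sbatch myjob.slurm
Submitted batch job 123456
$ squeue -j 123456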
More examples can be found at Examples and Recipes.
Parameters
| Parameter | SBATCH flag | Comment |
|---|---|---|
| # nodes | -N <#> | |
| # tasks | -n <#> | |
| # tasks per node | --tasks-per-node <#> | different defaults between mpirun and srun |
| Slurm partition | -p <name> | e.g. cpu-clx, overview: Slurm partition CPU CLX |
| # CPU cores per task | -c <#> | interesting for OpenMP/Hybrid jobs |
| job walltime limit | -t hh:mm:ss | realistic estimates for best scheduling efficiency |
| e-mail notification | --mail-type=ALL | see sbatch manpage for notification types |
| project account | -A <project-ID> | project account to be charged |
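As a sketch of how these parameters combine in a hybrid MPI/OpenMP job, assuming a hypothetical binary my-hybrid-binary, a hypothetical project account myproject, and illustrative task/thread counts (check the partition tables for the actual core counts per node):

#!/bin/bash
#SBATCH -p cpu-clx               # Slurm partition
#SBATCH -N 2                     # number of nodes
#SBATCH --tasks-per-node 4       # MPI tasks per node
#SBATCH -c 24                    # CPU cores (OpenMP threads) per task
#SBATCH -t 12:00:00              # walltime limit hh:mm:ss
#SBATCH -A myproject             # hypothetical project account
#SBATCH --mail-type=ALL          # e-mail notifications

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # hand the -c value to OpenMP

module load impi
mpirun my-hybrid-binary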
Job runtime limits
Maximum runtimes of jobs, also known as walltime limits, are defined for each partition and can be viewed either on the command line using sinfo or here. There is no minimum runtime, but a runtime of at least 1 hour is encouraged. Large numbers of small, short jobs can trigger problems in our accounting system. Occasional short jobs are fine, but if you submit large numbers of jobs that finish (or crash) quickly, we might have to intervene and temporarily suspend your account. If you have lots of smaller workloads, please consider combining them into a single job that runs for at least 1 hour.
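For example, the walltime limit of each partition can be printed with a short sinfo format string (%P is the partition name, %l its time limit):

$ sinfo --format="%P %l"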
Project account selection
Each job runs under the name of the user who submitted it to the system. The user name is not to be confused with the project account that a job gets charged to.
- For users new to the system, the default project account is their Test Project. Its name is identical to the user name.
- Users with membership in one or more compute projects need to decide which project account a job will be charged to.
- The decision is made via the sbatch flag --account=my-project-ID or, in short, -A my-project-ID (see the example below).
If no explicit decision is provided, the default applies, which can be changed by the user in the Portal (section User Data).
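For instance, assuming the hypothetical project account myproject, the account can be selected on the command line:

$ sbatch -A myproject myjob.slurm

or, equivalently, inside the job script itself:

#SBATCH --account=myproject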
After job submission, Slurm checks the balance of the job's project account. When a negative balance is returned by the Slurm database, the job will not be scheduled. In this case, squeue displays a corresponding message (AccountOutOfComputeTime) to the user.
$ squeue --me
JOBID PARTITION     NAME     USER  ACCOUNT    STATE  TIME  NODES NODELIST(REASON)
  ...       ...      ...      ...      ...  PENDING  0:00    ... (AccountOutOfComputeTime)
Interactive jobs
To use compute resources interactively, e.g. to follow the execution of MPI programs, the following steps are required. Note that non-interactive batch jobs submitted via job scripts (see above) are the primary way of using the compute resources.
- A resource allocation for interactive usage has to be requested first with the salloc command, which should also include your resource requirements.
- When salloc has successfully allocated the requested resources, you have to issue an additional srun command if you want to work directly on one of the allocated nodes (see the example below).
- Afterwards, srun or MPI launcher commands (mpirun, mpiexec) can be used to start parallel programs.
blogin1:~ $ salloc -t 00:10:00 -p cpu-clx:test -N2 --tasks-per-node 24
salloc: Granted job allocation [...]
salloc: Waiting for resource configuration
salloc: Nodes bcn[1001,1003] are ready for job
# To get a shell on one of the allocated nodes
blogin1:~ $ srun --pty --interactive --preserve-env ${SHELL}
bcn1001:~ $ srun hostname | sort | uniq -c
24 bcn1001
24 bcn1003
bcn1001:~ $ exit
blogin1:~ $ exit
salloc: Relinquishing job allocation [...]
Using shared nodes
In some of our Slurm partitions, compute nodes are allocated in shared mode, so that multiple jobs requesting only part of a node can run concurrently on the same node. You can explicitly request fewer CPU cores (which also implies less memory) or less memory per node than available; take care that the memory limits are not exceeded.
Please remember that most of Lise's compute nodes belong to Slurm partitions where nodes are allocated exclusively to a job, meaning nodes are not shared between jobs (see Accounting).
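A minimal sketch of such a request, assuming a shared partition (the partition name below is only a placeholder; see the partition overview for the actual shared partitions and their memory limits) and a hypothetical serial binary my-serial-binary:

#!/bin/bash
#SBATCH -p <shared-partition>   # placeholder: pick one of the shared partitions
#SBATCH -n 1                    # a single task
#SBATCH -c 4                    # only 4 CPU cores of the node
#SBATCH --mem=16G               # memory per node; stay within the node's limits
#SBATCH -t 02:00:00

my-serial-binary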