
Using srun to create multiple job steps

You can use srun to start multiple job steps concurrently on a single node, e.g. if your job is not big enough to fill a whole node. There are a few details to follow:

By default, the srun command gets exclusive access to all resources of the job allocation and uses all tasks.
- You therefore need to limit srun to use only part of the allocation.
- This includes implicitly granted resources, i.e. memory and GPUs.
- The --exact flag is needed.
- If running non-MPI programs, use the -c option to set the number of cores each process should have access to.

srun waits for the program to finish, so you need to start concurrent processes in the background.

Good default memory per CPU values (without hyperthreading) are usually:

Partition	--mem-per-cpu
standard96	3770M
large96	7781M
huge96	15854M
medium40	4525M
large40/gpu	19075M
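
The values listed above can be wrapped in a small lookup so that a job script does not hard-code them. This is only a sketch: the function name mem_per_cpu_for is made up for illustration, "large40/gpu" is read as the two partitions large40 and gpu, and sub-partitions such as standard96:test are assumed to share the value of their base partition.

```shell
#!/bin/bash

# Hypothetical helper: print the default --mem-per-cpu value for a partition.
# The values are copied from the list above. Assumption: a sub-partition such
# as "standard96:test" uses the same value as its base partition.
mem_per_cpu_for() {
    local partition="${1%%:*}"   # strip a ":suffix" such as ":test"
    case "$partition" in
        standard96)   echo "3770M"  ;;
        large96)      echo "7781M"  ;;
        huge96)       echo "15854M" ;;
        medium40)     echo "4525M"  ;;
        large40|gpu)  echo "19075M" ;;
        *) echo "unknown partition: $1" >&2; return 1 ;;
    esac
}
```

Inside a job script it could then be used as `srun --exact -n1 -c 10 --mem-per-cpu "$(mem_per_cpu_for standard96)" ./program1 &`.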

`Examples`

Three concurrent programs:

#!/bin/bash
#SBATCH -p standard96
#SBATCH -t 06:00:00
#SBATCH -N 1

srun --exact -n1 -c 10 --mem-per-cpu 3770M  ./program1 &
srun --exact -n1 -c 80 --mem-per-cpu 3770M  ./program2 &
srun --exact -n1 -c 6 --mem-per-cpu 3770M  ./program3 &
wait

...
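
The -c values in the example above add up to 10 + 80 + 6 = 96. Assuming a standard96 node offers 96 cores, as the partition name suggests, the three steps fill the node exactly. A quick check like the following can catch an oversubscribed layout before the job is submitted. The helper name cores_fit is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the per-step core counts fit on the node.
# Usage: cores_fit <cores_on_node> <cores_step1> [<cores_step2> ...]
cores_fit() {
    local available=$1
    shift
    local total=0 c
    for c in "$@"; do
        total=$((total + c))
    done
    echo "requested $total of $available cores"
    [ "$total" -le "$available" ]
}

cores_fit 96 10 80 6    # the three steps from the example above
```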

Run a single GPU program four times concurrently:

#!/bin/bash
#SBATCH -p gpu
#SBATCH -t 12:00:00
#SBATCH -N 1

srun --exact -n1 -c 10 -G1 --mem-per-cpu 19075M  ./single-gpu-program &
srun --exact -n1 -c 10 -G1 --mem-per-cpu 19075M  ./single-gpu-program &
srun --exact -n1 -c 10 -G1 --mem-per-cpu 19075M  ./single-gpu-program &
srun --exact -n1 -c 10 -G1 --mem-per-cpu 19075M  ./single-gpu-program &
wait
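
The examples above end with a bare wait. In bash, wait without arguments returns 0 once all background processes have finished, so a failing job step goes unnoticed. The following sketch waits for each step separately and counts the failures. The function name run_steps is hypothetical, and each argument is assumed to be one complete command line, for example a full srun call.

```shell
#!/bin/bash

# Hypothetical helper: start every command line given as an argument in the
# background, then wait for each one and count the failures.
# Inside a job, each argument would be a full srun call, for example
#   "srun --exact -n1 -c 10 -G1 --mem-per-cpu 19075M ./single-gpu-program"
run_steps() {
    local pids=() failed=0 cmd pid
    for cmd in "$@"; do
        bash -c "$cmd" &        # start the step in the background
        pids+=("$!")            # remember its process ID
    done
    for pid in "${pids[@]}"; do
        # "wait <pid>" returns the exit status of that one process
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed step(s) failed"
    [ "$failed" -eq 0 ]
}
```

Ending the job script with a call to run_steps makes the batch job exit with a non-zero status if any step failed.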

Using the Linux parallel command to run a large number of tasks

If you have to run many nearly identical but small tasks (single-core, little memory) you can try to use the Linux parallel command. To use this approach you first need to write a bash-shell script, e.g. task.sh, which executes a single task. As an example we will use the following script:

task.sh:

#!/bin/bash

# parallel task
TASK_ID=$1
PARAMETER=$((10+RANDOM%10))    # determine some parameter unique for this task
                               # often this will depend on the TASK_ID

echo -n "Task $TASK_ID: sleeping for $PARAMETER seconds ... "
sleep $PARAMETER
echo "done"

This script simply defines a variable PARAMETER, which is then used as the input for the actual command (sleep in this case). The script also takes one input parameter, which is interpreted as the TASK_ID and could also be used to determine PARAMETER. If we make the script executable and run it as follows, we get:

$ chmod u+x task.sh
$ ./task.sh 4
Task 4: sleeping for 11 seconds ... done
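
In task.sh the parameter is random. In practice it will often depend on the TASK_ID. One simple way to do that, sketched here under the assumption that the parameters are stored one per line in a text file, is to read line number TASK_ID from that file. The function name parameter_for_task and the file name parameters.txt are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: print line number TASK_ID of a parameter file,
# so that task 1 gets the first line, task 2 the second, and so on.
# Usage: parameter_for_task <task_id> <parameter_file>
parameter_for_task() {
    local task_id=$1 file=$2
    sed -n "${task_id}p" "$file"
}
```

Inside task.sh this could replace the random value, e.g. `PARAMETER=$(parameter_for_task "$TASK_ID" parameters.txt)`.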

To run this task 100 times with different TASK_IDs, we can write the following job script:

parallel_job.sh:

#!/bin/bash

#SBATCH --partition standard96:test      # adjust partition as needed
#SBATCH --nodes 1                        # more than 1 node can be used
#SBATCH --tasks-per-node 96              # one task per CPU core, adjust for partition

# set memory available per core
MEM_PER_CORE=3770    # in MB, must be set to the value that corresponds with the partition
                     # see https://www.hlrn.de/doc/display/PUB/Multiple+concurrent+programs+on+a+single+node

# Define srun arguments:
srun="srun -n1 -N1 --exclusive --mem-per-cpu $MEM_PER_CORE"
# --exclusive     ensures srun uses distinct CPUs for each job step
# -N1 -n1         allocates a single core to each task

# Define parallel arguments:
parallel="parallel -N 1 --delay .2 -j $SLURM_NTASKS --joblog parallel_job.log"
# -N                number of arguments you want to pass to the task script
# -j                number of parallel tasks (determined from resources provided by Slurm)
# --delay .2        prevents overloading the controlling node on short jobs
# --resume          add if needed to use joblog to continue an interrupted run (job resubmitted)
# --joblog          creates a log-file, required for resuming

# Run the tasks in parallel
$parallel "$srun ./task.sh {1}" ::: {1..100}
# task.sh          executable(!) script with the task to complete, may depend on some input parameter
# ::: {a..b}       range of parameters, alternatively $(seq 100) should also work
# {1}              parameter from range is passed here, multiple parameters can be used with
#                  additional {i}, e.g. {2} {3} (refer to parallel documentation)

The script uses parallel in line 25 to run task.sh 100 times with a parameter taken from the range {1..100}. Because each task is started with srun, a separate job step is created for it, and the options passed to srun (see line 12) restrict each task to a single core. This simple example can be adjusted as needed by modifying the script task.sh and the job script parallel_job.sh. You can also adjust the requested resources, for example by using more than one node. Note that, depending on the number of tasks, you may have to split your job into several jobs to keep the total run time short enough. Once the setup is done, you can simply submit the job:

$ sbatch parallel_job.sh
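
After the job has finished, the file parallel_job.log written by --joblog shows which tasks ran and how they ended. The sketch below lists the sequence numbers of tasks that did not finish cleanly. It assumes the usual tab-separated GNU parallel joblog layout with one header line. The function name failed_tasks is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: list the sequence numbers of tasks that exited with a
# non-zero status or were killed by a signal, based on a GNU parallel joblog.
# Assumes the tab-separated layout with a header line and the columns
# Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command.
failed_tasks() {
    local joblog=$1
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1 }' "$joblog"
}
```

Calling `failed_tasks parallel_job.log` prints one sequence number per line. If the list is not empty, the job can be resubmitted with --resume as described in the job script. The GNU parallel documentation also describes a --resume-failed option for retrying failed tasks.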

Looping over two arrays

You can use parallel to loop over multiple arrays. The --xapply option controls whether the inputs are paired element by element or all combinations are used:

Looping over multiple inputs:

$ parallel --xapply echo {1} {2} ::: 1 2 3 ::: a b c
1 a
2 b
3 c
$ parallel echo {1} {2} ::: 1 2 3 ::: a b c
1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c
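
To make the difference concrete, the two pairings can be written as plain bash loops. This is only an illustration of what parallel does with the inputs, not a replacement for it, since the loops run sequentially. The array and function names are made up for this sketch.

```shell
#!/bin/bash

numbers=(1 2 3)
letters=(a b c)

# Linked pairs, like "parallel --xapply": element i of one list
# is combined with element i of the other list.
linked_pairs() {
    local i
    for i in "${!numbers[@]}"; do
        echo "${numbers[i]} ${letters[i]}"
    done
}

# All combinations, like plain "parallel": every number with every letter.
all_combinations() {
    local n l
    for n in "${numbers[@]}"; do
        for l in "${letters[@]}"; do
            echo "$n $l"
        done
    done
}
```

linked_pairs prints the three lines of the first parallel call above, all_combinations the nine lines of the second.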

Version	Old version 1	New version 12
Changes made by	Marcus Boden	Scientific Consultant01
Saved on	May 07, 2021	June 03, 2024
