Slurm partition GPU A100

The GPU A100 partition shares the Slurm batch system with all other partitions of the Lise system. The following Slurm partitions are specific to the GPU A100 nodes.

| Slurm partition | Number of nodes | CPU | Main memory (GB) | GPUs per node | GPU hardware | Walltime (hh:mm:ss) | Description |
|---|---|---|---|---|---|---|---|
| gpu-a100 | 34 | Ice Lake 8360Y | 1000 | 4 | NVIDIA A100 80GB | 24:00:00 | full node exclusive |
| gpu-a100:shared | 5 | Ice Lake 8360Y | 1000 | 4 | NVIDIA A100 80GB | 24:00:00 | shared node access, exclusive use of the requested GPUs |
| gpu-a100:shared:mig | 1 | Ice Lake 8360Y | 1000 | 28 (4 x 7) | 1 to 28 1g.10gb A100 MIG slices | 24:00:00 | shared node access, shared GPU devices via Multi-Instance GPU (MIG). Each of the four GPUs is logically split into seven usable slices, each with 10 GB of GPU memory |
| gpu-a100:test | 2 | Ice Lake 8360Y | 1000 | 4 | NVIDIA A100 80GB | 01:00:00 | nodes reserved for short test jobs before scheduling longer jobs with more resources |
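The values in the table can change over time, so it can help to query the current settings directly. The following is a minimal sketch, assuming the standard `sinfo` command is available on the login nodes; the helper name `show_a100_partitions` is hypothetical.

```shell
#!/bin/bash
# Sketch: list the current settings of the GPU A100 partitions with sinfo.
# show_a100_partitions is a hypothetical helper name, not part of Slurm.
show_a100_partitions() {
    # %P partition name, %D number of nodes, %m memory per node (MB),
    # %G generic resources (GPUs), %l maximum walltime
    sinfo --partition="gpu-a100,gpu-a100:shared,gpu-a100:shared:mig,gpu-a100:test" \
          --format="%P %D %m %G %l"
}

# Usage on a login node:
#   show_a100_partitions
```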

See Slurm usage for how to run workloads beyond the 24 h walltime limit using job dependencies.
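As a rough illustration of the job dependency approach, the sketch below submits the same job script several times so that each job starts only after the previous one has finished successfully. It assumes that the application can resume from its own checkpoint files; the helper name `submit_chain` and the number of links are hypothetical, while `--parsable` and `--dependency=afterok` are standard `sbatch` options.

```shell
#!/bin/bash
# Sketch: submit one job script several times as a dependency chain.
# submit_chain and its arguments are hypothetical names for illustration.
submit_chain() {
    local script links jobid i
    script=$1
    links=$2
    jobid=""
    i=1
    while [ "$i" -le "$links" ]; do
        if [ -z "$jobid" ]; then
            # First link: starts as soon as resources are available.
            jobid=$(sbatch --parsable "$script") || return 1
        else
            # Later links: start only after the previous job exited with code 0.
            jobid=$(sbatch --parsable --dependency=afterok:"$jobid" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
        jobid=${jobid%%;*}
        echo "$jobid"
        i=$((i + 1))
    done
}

# Usage on a login node: three links, each limited by the partition walltime.
#   submit_chain example.slurm 3
```

Each link is still subject to the walltime limit of its partition, so the job script itself has to write and pick up the restart data.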

Charge rates

The charge rates for the Slurm partitions are listed in Accounting.

Examples

Assuming a job script 

Job script example.slurm
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --gres=gpu:4

module load openmpi/gcc.11/4.1.4
mpirun ./mycode.bin

you can submit it to the Slurm batch system with the following command:

Job submission
bgnlogin2 $ sbatch example.slurm
Submitted batch job 7748544
bgnlogin2 $ squeue -u myaccount
...
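For scripted workflows it can be handy to keep the job ID that `sbatch` reports. The sketch below extracts it from the "Submitted batch job" message shown above; the helper name `jobid_from_message` is hypothetical, and the commands in the usage comments are standard Slurm tools.

```shell
#!/bin/bash
# Sketch: extract the job ID from sbatch's "Submitted batch job <id>" message.
# jobid_from_message is a hypothetical helper name, not part of Slurm.
jobid_from_message() {
    # The job ID is the fourth whitespace-separated field of the message.
    awk '/^Submitted batch job/ { print $4 }'
}

# Usage on a login node:
#   jobid=$(sbatch example.slurm | jobid_from_message)
#   squeue -j "$jobid"           # queue state of this job only
#   scontrol show job "$jobid"   # full job record
#   scancel "$jobid"             # cancel the job if needed
```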

 

Example: Exclusive usage of two nodes with 4 GPUs each
$ srun --nodes=2 --gres=gpu:4 --partition=gpu-a100 example_cmd

 

Example: Request two GPUs within the shared partition
# Note: The two GPUs may be located on different nodes.
$ srun --gpus=2 --partition=gpu-a100:shared example_cmd
# Note: Two GPUs on the same node.
$ srun --nodes=1 --gres=gpu:2 --partition=gpu-a100:shared example_cmd
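The same request can also be written as a batch script. The following is a minimal sketch, assuming that Slurm exports the allocated GPU indices in `CUDA_VISIBLE_DEVICES`, as it commonly does for GPU resources; the helper name `count_visible_gpus` is hypothetical.

```shell
#!/bin/bash
#SBATCH --partition=gpu-a100:shared
#SBATCH --nodes=1
#SBATCH --gres=gpu:2

# Sketch: report how many GPUs the job can see.
# count_visible_gpus is a hypothetical helper name for illustration.
count_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # Variable unset or empty: no GPU was assigned.
        echo 0
        return
    fi
    # CUDA_VISIBLE_DEVICES is a comma-separated list; count its fields.
    echo "$CUDA_VISIBLE_DEVICES" | awk -F',' '{ print NF }'
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

With the request above, the job should report two GPUs; other jobs may use the remaining GPUs of the same node at the same time.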

 

Example: Request a single Multi-Instance GPU (MIG) slice on the corresponding Slurm partition
$ srun --gpus=1 --partition=gpu-a100:shared:mig example_cmd
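To check which MIG slices a job received, the device list printed by `nvidia-smi -L` can be inspected from inside the job. The sketch below counts the MIG entries in that list. It assumes the usual `nvidia-smi -L` layout, in which each MIG device appears on its own indented line starting with `MIG`; the helper name `count_mig_devices` is hypothetical.

```shell
#!/bin/bash
# Sketch: count MIG devices in `nvidia-smi -L` style output read from stdin.
# count_mig_devices is a hypothetical helper name for illustration.
count_mig_devices() {
    # MIG entries are listed on indented lines that begin with "MIG ".
    grep -c '^[[:space:]]*MIG '
}

# Usage inside a job on the MIG partition:
#   srun --gpus=1 --partition=gpu-a100:shared:mig nvidia-smi -L | count_mig_devices
```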