Comment: updated partition chart to reflect new test nodes

The compute nodes of Lise in Berlin (blogin.hlrn.de) and Emmy in Göttingen (glogin.hlrn.de) are organized in the following Slurm partitions:

Lise (Berlin)

...

Partition (the number in the name is the number of cores per node)

...

Max. jobs (running / queued) per user

...

Usable memory MB per node

...

CPU

...

Charged core-hours per node

...

16 / 500

...

747 000

...

1 522 000

...

very fat memory nodes for data pre- and postprocessing

Are 12 hours too short? See here for how to exceed the 12 h walltime limit by using job dependencies.
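A chain of dependent jobs can be submitted with `sbatch --parsable` and `--dependency=afterok:<jobid>`, so that each job starts only after its predecessor has completed successfully. This is a minimal sketch, assuming that the job script (called here with the hypothetical name `job.slurm`) resumes from a checkpoint written by the previous run; the function name `chain_jobs` is hypothetical, and the checkpointing itself is application-specific.

```shell
#!/bin/bash
# Hypothetical helper: submit the same job script COUNT times, each job
# depending on the successful completion (afterok) of the previous one.
# Assumes the script restarts from its own checkpoint on every run.
chain_jobs() {
  local script=$1 count=$2
  local prev jobid i
  # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
  prev=$(sbatch --parsable "$script") || return 1
  prev=${prev%%;*}
  echo "$prev"
  for ((i = 2; i <= count; i++)); do
    jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
    jobid=${jobid%%;*}
    echo "$jobid"
    prev=$jobid
  done
}
```

Calling `chain_jobs job.slurm 3` submits three jobs and prints their IDs. If one job fails, its successors can never satisfy `afterok`; depending on the site configuration they may stay pending and then have to be cancelled with `scancel`.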

Emmy (Göttingen)

...

Partition (the number in the name is the number of cores per node)

...

Max. walltime

...

Usable memory MB per node

...

CPU, GPU type

...

gcn#

...

2 dedicated

+6 on demand

...

747 000

...

1 522 000

...

very fat memory nodes for data pre- and postprocessing

...

8 dedicated

+64 on demand

...

181 000

...

764 000

...

2 dedicated

+2 on demand

...

764 000

...

2 dedicated

+6 on demand

...

500 000 MB per node

(40GB HBM per GPU)

...

see GPU Usage

...

Skylake 6148 + 4 Nvidia V100 32GB,

Zen3 EPYC 7513 + 4 Nvidia A100 40GB,

and Zen2 EPYC 7662 + 8 Nvidia A100 80GB

...

764 000 MB (32 GB per GPU)

or 500 000 MB (10GB or 20GB HBM per MIG slice)

...

Skylake 6148 + 4 Nvidia V100 32GB,

Zen3 EPYC 7513 + 4 Nvidia A100 40GB, split into 2g.10gb and 3g.20gb slices

...

150 per GPU (V100)

or 47 per MIG slice (A100)

see GPU Usage

A100 GPUs are split into slices via MIG (3 slices per GPU)

...

764 000 MB (32 GB per GPU)

or 500 000 MB (10GB or 20GB HBM per MIG slice)

...

Skylake 6148 + 4 Nvidia V100 32GB,

Zen3 EPYC 7513 + 4 Nvidia A100 40GB, split into 2g.10gb and 3g.20gb slices

...

150 per GPU (V100)

or 47 per MIG slice (A100)

* 600 for the nodes with 4 GPUs, and 1200 for the nodes with 8 GPUs

Which partition to choose?

If you do not request a partition, your job will be placed in the default partition, which is standard96.

The default partitions are suitable for most calculations. The :test partitions are, as the name suggests, intended for shorter and smaller test runs. They have a higher priority and a few dedicated nodes, but are limited in walltime and number of nodes. Shared nodes are suitable for pre- and postprocessing. A job running on a shared node is charged only for its core fraction (cores of the job / all cores of the node). All non-shared nodes are used exclusively by one job, so the full NPL of each allocated node are charged.
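The accounting rule above can be sketched as a small shell calculation. This is a minimal sketch, assuming the charge scales linearly with the core fraction and with the walltime in hours; the per-node-hour rate `node_rate` and the function names `shared_charge` and `exclusive_charge` are hypothetical placeholders, and the actual rates are listed in Accounting.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the accounting rule; node_rate is a
# placeholder for the per-node-hour charge listed in Accounting.

# Shared node: charge = (job cores / cores per node) * node_rate * hours
shared_charge() {
  local job_cores=$1 node_cores=$2 node_rate=$3 hours=$4
  # awk is used because bash arithmetic is integer-only
  awk -v c="$job_cores" -v n="$node_cores" -v r="$node_rate" -v h="$hours" \
    'BEGIN { printf "%g\n", (c / n) * r * h }'
}

# Exclusive (non-shared) node: every allocated node is charged in full,
# regardless of how many of its cores the job actually uses
exclusive_charge() {
  local nodes=$1 node_rate=$2 hours=$3
  awk -v k="$nodes" -v r="$node_rate" -v h="$hours" \
    'BEGIN { printf "%g\n", k * r * h }'
}
```

For example, with a hypothetical rate of 96 per node-hour, `shared_charge 24 96 96 2` prints 48 (a quarter of the node for 2 hours), while `exclusive_charge 1 96 2` prints 192 for a full node over the same time.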

Details about the CPU/GPU types can be found below.
The network topology is described here.

The available home/local-ssd/work/perm storage systems are discussed in File Systems.

An overview of all partitions and node states is provided by: sinfo -r
To see detailed information about a node's type: scontrol show node <nodename>
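For a quick summary of free capacity, the output of `sinfo` can be aggregated per partition. This is a minimal sketch using the `sinfo` options `-r` (responding nodes only), `-h` (no header) and `-o` with the format fields `%P` (partition), `%t` (compact node state) and `%D` (node count); the function name `idle_nodes` is hypothetical. Note that `%P` marks the default partition with a trailing `*`.

```shell
#!/bin/bash
# Hypothetical helper: sum the number of idle nodes per partition.
idle_nodes() {
  # One line per partition/state combination: "<partition> <state> <count>".
  # Only the exact state "idle" is counted; suffixed states such as
  # "idle~" (powered down) are not included.
  sinfo -r -h -o "%P %t %D" |
    awk '$2 == "idle" { sum[$1] += $3 } END { for (p in sum) print p, sum[p] }' |
    sort
}
```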

List of CPUs and GPUs at HLRN

...

Cores per unit

...

Clock speed [GHz]

...

640/5120*

...

432/6912*

...

The GPU A100 partition shares the same Slurm batch system with all other partitions of the Lise system. The following Slurm partitions are specific to the GPU A100 partition.

Slurm partition | Node number | CPU | Main memory (GB) | GPUs per node | GPU hardware | Walltime (hh:mm:ss) | Description
gpu-a100 | 34 | Ice Lake 8360Y | 1000 | 4 | NVIDIA Tesla A100 80GB | 24:00:00 | full node exclusive
gpu-a100:shared | 5 | | | 4 | NVIDIA Tesla A100 80GB | | shared node access, exclusive use of the requested GPUs
gpu-a100:shared:mig | 1 | | | 28 (4 x 7) | 1 to 28 1g.10gb A100 MIG slices | | shared node access, shared GPU devices via Multi-Instance GPU (MIG); each of the four GPUs is logically split into seven usable slices, each with 10 GB of GPU memory
gpu-a100:test | 2 | | | 4 | NVIDIA Tesla A100 80GB | 01:00:00 | nodes reserved for short test jobs before scheduling longer jobs with more resources

See Slurm usage for how to exceed the 24 h walltime limit by using job dependencies.

Charge rates

The charge rates for the Slurm partitions are listed in Accounting.

Examples

Assuming the following job script, example.slurm:

```bash
#!/bin/bash
# Request two exclusive gpu-a100 nodes with all 4 GPUs on each node
# and 8 MPI tasks in total across both nodes
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --gres=gpu:4

# Load the MPI environment and start the tasks
module load openmpi/gcc.11/4.1.4
mpirun ./mycode.bin
```

you can submit it to the Slurm batch system and check its status with:

```text
bgnlogin2 $ sbatch example.slurm
Submitted batch job 7748544
bgnlogin2 $ squeue -u myaccount
...
```


Example: Exclusive use of two nodes with 4 GPUs each

```text
$ srun --nodes=2 --gres=gpu:4 --partition=gpu-a100 example_cmd
```


Example: Request two GPUs within the shared partition

```text
# Note: The two GPUs may be located on different nodes.
$ srun --gpus=2 --partition=gpu-a100:shared example_cmd

# Note: Two GPUs on the same node.
$ srun --nodes=1 --gres=gpu:2 --partition=gpu-a100:shared example_cmd
```


Example: Request a single Multi-Instance GPU slice on the corresponding Slurm partition

```text
$ srun --gpus=1 --partition=gpu-a100:shared:mig example_cmd
```