Comment: updated partition chart to reflect new test nodes

The following Slurm partitions are available on the GPU A100 partition of system Lise.

...

GPU A100 shares the same Slurm batch system with all partitions of system Lise. The following Slurm partitions are specific to the GPU A100 partition.

...

Slurm partition | Node number | CPU | Main memory (GB) | GPUs per node | GPU hardware | Walltime (hh:mm:ss) | Description
gpu-a100 | 34 | Ice Lake 8360Y | 1000 | 4 | NVIDIA Tesla A100 80GB | 24:00:00 | full node exclusive
gpu-a100:shared | 5 | Ice Lake 8360Y | 1000 | 4 | NVIDIA Tesla A100 80GB | 24:00:00 | shared node access, exclusive use of the requested GPUs
gpu-a100:shared:mig | 1 | Ice Lake 8360Y | 1000 | 28 (4 x 7) | 1 to 28 1g.10gb A100 MIG slices | 24:00:00 | shared node access, shared GPU devices via Multi Instance GPU (MIG)
gpu-a100:test | 2 | Ice Lake 8360Y | 1000 | 4 | NVIDIA Tesla A100 80GB | 01:00:00 | nodes reserved for short test jobs before scheduling longer jobs with more resources

On gpu-a100:shared:mig, each of the four GPUs is logically split into seven usable slices, each with 10 GB of GPU memory.

Cost: 150 core hours per GPU, or 21.43 core hours per MIG slice.

Lise (Berlin)

...

Partition (number holds cores per node) | ... | Max jobs (running/queued) per user | ... | Usable memory (MB per node) | ... | CPU | ... | Charged core-hours per node
... | 1 000 000 | ... | 4 A100 GPUs | ...
... | 1 000 000 | ... | 4 A100 GPUs | ...
... | 1 000 000 | ... | 4 A100 GPUs with 7 1g.10gb MIG slices per GPU

See Slurm usage for how to run beyond the 24 h walltime limit by using job dependencies.
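One common way to run past a walltime limit is to split the work into segments and chain them with Slurm job dependencies. The following is a minimal sketch, not the procedure from the Slurm usage page: the helper name submit_chain is hypothetical, and it assumes that example.slurm can resume from its own checkpoint each time it starts. It relies only on the standard sbatch options --parsable and --dependency=afterok.

```bash
#!/bin/bash
# Hypothetical helper: submit SCRIPT as a chain of SEGMENTS jobs, each one
# starting only after the previous one has finished successfully.
submit_chain() {
    local script=$1 segments=$2
    local jobid dep="" i
    for ((i = 1; i <= segments; i++)); do
        # --parsable prints only the job ID (optionally followed by ";cluster").
        # $dep is intentionally unquoted so it vanishes for the first segment.
        jobid=$(sbatch --parsable $dep "$script") || return 1
        jobid=${jobid%%;*}
        echo "segment $i: job $jobid"
        dep="--dependency=afterok:$jobid"
    done
}
```

Calling `submit_chain example.slurm 3` on a login node would queue three segments of up to 24 h each. With afterok, a failed segment leaves the later ones unstarted instead of running them on incomplete data.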

Charge rates

The charge rates for the Slurm partitions are listed in Accounting.
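As a rough illustration of the rates quoted with the partition table (150 core hours per GPU, 21.43 per MIG slice, which is 150 divided by the 7 slices of one GPU), the sketch below estimates the charge for a job. It assumes that the rate applies per allocated unit and per hour of wall-clock time; the function name estimate_charge is hypothetical, and Accounting remains the authoritative source.

```bash
#!/bin/bash
# Rates quoted with the partition table (core hours).
RATE_PER_GPU=150
RATE_PER_MIG_SLICE=21.43

# estimate_charge RATE UNITS HOURS
# ASSUMPTION: the charge is rate x allocated units x wall-clock hours.
estimate_charge() {
    LC_ALL=C awk -v rate="$1" -v units="$2" -v hours="$3" \
        'BEGIN { printf "%.2f\n", rate * units * hours }'
}

estimate_charge "$RATE_PER_GPU" 4 24        # full node (4 GPUs) for 24 h
estimate_charge "$RATE_PER_MIG_SLICE" 1 2   # one MIG slice for 2 h
```

Under that assumption, the two calls print 14400.00 and 42.86 core hours.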

Examples

Assuming a job script example.slurm:

```bash
#!/bin/bash
#SBATCH --partition=gpu-a100   # full-node exclusive A100 partition
#SBATCH --nodes=2
#SBATCH --ntasks=8             # 8 MPI tasks in total
#SBATCH --gres=gpu:4           # 4 GPUs per node

module load openmpi/gcc.11/4.1.4
mpirun ./mycode.bin
```

you can submit the job to the Slurm batch system with:

```
bgnlogin2 $ sbatch example.slurm
Submitted batch job 7748544
bgnlogin2 $ squeue -u myaccount
...
```


Example: Exclusive usage of two nodes with 4 GPUs each

```
$ srun --nodes=2 --gres=gpu:4 --partition=gpu-a100 example_cmd
```

...

Example: Request a single Multi Instance GPU slice on the corresponding Slurm partition

```
$ srun --gpus=1 --partition=gpu-a100:shared:mig example_cmd
```
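The same request can also be placed in a batch script. The following is a minimal sketch for a single MIG slice, reusing the options from the srun example; the 10-minute time limit and the helper report_gpus are placeholders introduced here. It assumes that Slurm exports CUDA_VISIBLE_DEVICES for the allocated device, which is common but depends on the site configuration.

```bash
#!/bin/bash
#SBATCH --partition=gpu-a100:shared:mig
#SBATCH --gpus=1
#SBATCH --time=00:10:00

# Hypothetical helper: show which GPU device(s) the job can see.
report_gpus() {
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
    # nvidia-smi -L lists the visible GPUs (and MIG devices, if any).
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L || echo "nvidia-smi could not query the devices"
    fi
}

report_gpus
```

Running this as a first short job is a cheap way to confirm that the allocation matches the request before submitting longer work.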

Hardware configuration

NHR@ZIB offers access to compute nodes equipped with Nvidia A100 GPUs. The GPU A100 partition consists of two login nodes and 42 compute nodes with the following properties for a single node:

  • 2x Intel Xeon "Ice Lake" Platinum 8360Y (36 cores per socket, 2.4 GHz, 250 W)
  • 1 TB RAM (DDR4-3200)
  • 4x Nvidia A100 (80 GB HBM2, SXM), two attached to each CPU socket
  • 7.68 TB NVMe local SSD
  • 200 Gbit/s InfiniBand adapter (Mellanox MT28908)
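The per-node figures above translate into simple per-GPU ratios, which can help when sizing a request on a shared node. The sketch below only does the arithmetic; treating 1 TB as 1000 GB and using an even split per GPU are assumptions made here, not limits enforced by the scheduler.

```bash
#!/bin/bash
# Per-node figures taken from the hardware list above.
sockets=2
cores_per_socket=36
gpus_per_node=4
mem_gb=1000              # ASSUMPTION: 1 TB RAM counted as 1000 GB
mig_slices_per_gpu=7     # seven 1g.10gb slices per GPU on the MIG node

cores_per_node=$((sockets * cores_per_socket))
echo "cores per node:      $cores_per_node"
echo "cores per GPU:       $((cores_per_node / gpus_per_node))"
echo "memory per GPU (GB): $((mem_gb / gpus_per_node))"
echo "MIG slices per node: $((gpus_per_node * mig_slices_per_gpu))"
```

This yields 72 cores per node, 18 cores and 250 GB per GPU, and 28 MIG slices on the MIG node.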

...