General information on system usage can be found in Quickstart, especially for the topics
- login via ssh,
- file systems.
Software and environment modules
Login and compute nodes of the A100 GPU partition are running under Rocky Linux (currently version 8.6).
Software for the A100 GPU partition provided by NHR@ZIB can be found using the module command, see Quickstart.
To build and execute code on the A100 GPU partition, please log in to
- a GPU A100 login node, like bgnlogin.nhr.zib.de,
- see also GPU A100 partition.
Code build
For code generation we recommend the software package NVIDIA hpcx, which combines a compiler with powerful libraries such as MPI.
Codeblock |
---|
language | text |
---|
title | Example: Show the currently available software and access compilers |
---|
|
bgnlogin1 ~ $ module avail
...
bgnlogin1 ~ $ module load gcc
bgnlogin1 ~ $ module load nvhpc-hpcx/23.1
bgnlogin1 ~ $ module list
Currently Loaded Modulefiles:
 1) HLRNenv   2) sw.a100   3) slurm   4) gcc/11.3.0(default)   5) nvhpc-hpcx/23.1 |
Please note the presence of the sw.a100 environment module. When it is loaded, the environment modules shown are those for software installed for the NVIDIA A100 GPU partition. This is the default setting on the A100 GPU login and compute nodes.
When compiling applications for the A100 GPU partition, we recommend using the A100 GPU login nodes or, for particularly demanding compilations and/or when the CUDA drivers need to be present, an A100 GPU compute node via an interactive Slurm job session.
Codeblock |
---|
language | text |
---|
title | Plain OpenMP for GPU |
---|
|
bgnlogin1 $ module load nvhpc-hpcx/23.1
bgnlogin1 $ nvc -mp -target=gpu openmp_gpu.c -o openmp_gpu.bin |
Codeblock |
---|
language | text |
---|
title | MPI + OpenMP for GPU |
---|
|
bgnlogin1 $ module load nvhpc-hpcx/23.1
bgnlogin1 $ mpicc -mp -target=gpu mpi_openmp_gpu.c -o mpi_openmp_gpu.bin |
Using the batch system
The GPU nodes are available via partitions of the batch system Slurm. Lise's CPU-only partition and the A100 GPU partition share the same Slurm batch system. The main Slurm partition for the A100 GPU partition is named "gpu-a100". An example job script is shown below.
Codeblock |
---|
title | GPU job script |
---|
linenumbers | true |
---|
|
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --gres=gpu:4
module load openmpi/gcc.11/4.1.4
mpirun ./mycode.bin |
Code execution
All available Slurm partitions for the A100 GPU partition are listed on Slurm partitions GPU A100.
Codeblock |
---|
language | text |
---|
title | Job script for plain OpenMP |
---|
|
#!/bin/bash
#SBATCH --partition=gpu-a100:shared
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=72
./openmp_gpu.bin |
Codeblock |
---|
language | text |
---|
title | Job script for MPI + OpenMP |
---|
|
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=72
#SBATCH --gres=gpu:4
module load nvhpc-hpcx/23.1
mpirun --np 8 --map-by ppr:2:socket:pe=1 ./mpi_openmp_gpu.bin |
GPU-aware MPI
For efficient use of MPI-distributed GPU codes, a GPU/CUDA-aware MPI installation of Open MPI is available in the openmpi/gcc.11/4.1.4 environment module. Open MPI respects the resource requests made to Slurm, so no special arguments to mpiexec/mpirun are required. Nevertheless, please check that your application is bound correctly to CPU cores and GPUs; the --report-bindings option of mpiexec/mpirun can be used for this.
Container
Apptainer is provided as a module and can be used to download, build and run e.g. Nvidia containers:
Codeblock |
---|
language | bash |
---|
title | Apptainer example |
---|
|
bgnlogin1 ~ $ module load apptainer
Module for Apptainer 1.1.6 loaded.
#pulling a tensorflow image from nvcr.io - needs to be compatible to local driver
bgnlogin1 ~ $ apptainer pull tensorflow-22.01-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
...
#example: single node run calling python from the container in interactive job using 4 GPUs
bgnlogin1 ~ $ srun -p gpu-a100 --gres=gpu:4 --pty --interactive --preserve-env ${SHELL}
...
bgn1003 ~ $ apptainer run --nv tensorflow-22.01-tf2-py3.sif python
...
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices("GPU")
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
#optional: cleanup apptainer cache
bgnlogin1 ~ $ apptainer cache list
...
bgnlogin1 ~ $ apptainer cache clean |