Login

The A100 GPU partition is accessed through dedicated login nodes, reachable via SSH at bgnlogin.nhr.zib.de:

Example: login

```
$ ssh -i $HOME/.ssh/id_rsa_zib zib_username@bgnlogin.nhr.zib.de
Enter passphrase for key '/<home_directory>/.ssh/id_rsa_zib':
bgnlogin1$
```

File systems

The HOME and WORK file systems of the GPU system are the same as on the CPU system (see Quickstart). Node-local SSD space on the compute nodes is accessible through the environment variable LOCAL_TMPDIR, which is set during a Slurm session (batch or interactive job).
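A minimal sketch of staging data through the node-local SSD, written as a bash function that can be pasted into a job script. The function name run_on_local_ssd, its argument order, and the directory layout are hypothetical; only LOCAL_TMPDIR comes from the system, and the sketch assumes it points to an existing, writable directory.

```shell
# run_on_local_ssd INPUT_DIR RESULTS_DIR CMD [ARGS...]
# Hypothetical helper, not provided by the system: copies the contents of
# INPUT_DIR to the node-local SSD, runs CMD there, and copies the results back.
run_on_local_ssd() {
  local input_dir=$1 results_dir=$2
  shift 2
  # LOCAL_TMPDIR is only available inside a Slurm job; abort the script otherwise.
  local scratch=${LOCAL_TMPDIR:?LOCAL_TMPDIR is unset, run this inside a Slurm job}

  # Stage in: copy the contents of INPUT_DIR to the local SSD.
  cp -r -- "$input_dir"/. "$scratch/"

  # Run the command with the local SSD as working directory; the subshell
  # keeps the caller's working directory unchanged. Stop on failure.
  ( cd "$scratch" && "$@" ) || return

  # Stage out: copy everything in the scratch directory (inputs included) back.
  mkdir -p -- "$results_dir"
  cp -r -- "$scratch"/. "$results_dir/"
}
```

In a job script, a call could look like `run_on_local_ssd "$WORK/myproject/input" "$WORK/myproject/results" "$SLURM_SUBMIT_DIR/mycode.bin"`, assuming WORK is available as an environment variable and the project paths are adapted. The program is given as an absolute path because the helper runs it from inside LOCAL_TMPDIR; SLURM_SUBMIT_DIR is the directory from which sbatch was called.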

Software and environment modules

Login and compute nodes of the A100 GPU partition run Rocky Linux (currently version 8.6).

...

When compiling applications for the A100 GPU partition, we recommend using the A100 GPU login nodes. For very demanding compilations, or when the CUDA driver must be present, use an A100 GPU compute node through an interactive Slurm job session.
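One way to get such an interactive session is srun with --pty. The sketch below wraps it in a small bash function; the name gpu_shell, the request for a single GPU, and the default time limit of one hour are assumptions, so adjust them to the actual limits of the partition.

```shell
# gpu_shell [TIME_LIMIT]
# Hypothetical helper (e.g. for ~/.bashrc on the login node): starts an
# interactive login shell on an A100 compute node.
# Assumptions: gpu-a100 accepts a single-GPU request, and one hour is an
# acceptable default time limit; pass another limit as the first argument.
gpu_shell() {
  local time_limit=${1:-01:00:00}
  srun --partition=gpu-a100 --nodes=1 --gres=gpu:1 \
       --time="$time_limit" --pty bash -l
}
```

Calling `gpu_shell 00:30:00` on the login node should open a shell on a compute node once the job starts; there, `nvidia-smi` lists the GPU visible to the session, and compilations started in this shell see the CUDA driver.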

Using the batch system

The GPU nodes are accessible through partitions of the Slurm batch system.

Lise's CPU-only partition and the A100 GPU partition share the same Slurm batch system. The main Slurm partition for the A100 GPU nodes is named "gpu-a100". An example job script is shown below.

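Example: GPU job script

```shell
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks=8
# --gres is counted per node: 4 GPUs on each of the 2 nodes
#SBATCH --gres=gpu:4

# CUDA-aware Open MPI build
module load openmpi/gcc.11/4.1.4
# mpirun takes the task count and node list from the Slurm allocation
mpirun ./mycode.bin
```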

GPU-aware MPI

For efficient MPI-parallel GPU codes, a GPU/CUDA-aware installation of Open MPI is available in the openmpi/gcc.11/4.1.4 environment module. Open MPI honors the resources requested from Slurm, so mpiexec/mpirun needs no special arguments. Nevertheless, check that the processes of your application are bound correctly to CPU cores and GPUs. The --report-bindings option of mpiexec/mpirun prints the CPU-core binding of each rank.
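If each MPI rank should use exactly one GPU, a common approach is a small wrapper that restricts CUDA_VISIBLE_DEVICES per rank before starting the program. This is a sketch only: the file name select_gpu.sh and the GPUS_PER_NODE variable are hypothetical, and the index arithmetic assumes that the job holds all GPUs of each node (4 per node with --gres=gpu:4, as in the example job script above), so that indices 0 to GPUS_PER_NODE-1 all refer to GPUs allocated to the job.

```shell
#!/bin/bash
# select_gpu.sh (hypothetical wrapper, not provided by the system):
# pins each MPI rank to one GPU of its node, then runs the real program.

# Print the GPU index for a node-local rank: ranks are spread round-robin
# over the GPUs of the node.
gpu_for_rank() {
  local local_rank=$1 gpus=$2
  echo $(( local_rank % gpus ))
}

# Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK for every rank it starts;
# SLURM_LOCALID is the equivalent when ranks are launched by srun.
local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}
# Assumption: 4 GPUs per node unless GPUS_PER_NODE says otherwise.
CUDA_VISIBLE_DEVICES=$(gpu_for_rank "$local_rank" "${GPUS_PER_NODE:-4}")
export CUDA_VISIBLE_DEVICES

# Replace this shell with the program, if one was given.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```

After `chmod +x select_gpu.sh`, the last line of the job script would become `mpirun --report-bindings ./select_gpu.sh ./mycode.bin`. Whether hiding the other GPUs from each rank is the best choice depends on the application and on how the MPI library moves data between GPUs; applications that already select their device with cudaSetDevice do not need such a wrapper.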

Container

Apptainer is provided as an environment module and can be used to download, build, and run containers, for example NVIDIA containers:

...