General information that applies to all Lise partitions is covered in the general Lise documentation.

Hardware

The GPU A100 partition offers access to two login nodes and 42 compute nodes equipped with NVIDIA A100 GPUs. Each compute node has the following properties.

Login nodes

The hardware of the login nodes is similar to that of the compute nodes. Notable differences from the compute nodes are

Login authentication is possible via SSH keys only. See the Usage Guide for details.

Generic login name: bgnlogin.nhr.zib.de

List of login nodes: bgnlogin1.nhr.zib.de, bgnlogin2.nhr.zib.de
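
A minimal sketch of a key-based login through the generic login name, for illustration only. The account name `myaccount` and the key file `~/.ssh/id_ed25519` are placeholders assumed here, not values prescribed by this page. The command is assembled and printed first so it can be checked before use.

```shell
# Placeholders (assumptions): replace with your own account name and private key file.
LISE_ACCOUNT="myaccount"
LISE_KEY="$HOME/.ssh/id_ed25519"
# Generic login name of the GPU A100 partition.
LISE_HOST="bgnlogin.nhr.zib.de"

# Assemble the command and print it for inspection.
LOGIN_CMD="ssh -i $LISE_KEY $LISE_ACCOUNT@$LISE_HOST"
echo "$LOGIN_CMD"

# To actually connect, run the printed command in your terminal.
```

To reach a specific login node, set `LISE_HOST` to `bgnlogin1.nhr.zib.de` or `bgnlogin2.nhr.zib.de` instead.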

Software and environment modules

bgnlogin1 $ module avail
...
bgnlogin1 $ module load gcc
...
bgnlogin1 $ module list
Currently Loaded Modulefiles:
 1) HLRNenv   2) sw.a100   3) slurm   4) gcc/11.3.0(default)
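
In build scripts it can help to confirm that a module is really loaded before compiling. The sketch below checks text in the format printed by `module list` in the session above. `has_module` is a hypothetical helper name introduced here, not part of the module system.

```shell
# has_module NAME TEXT: succeed if NAME appears as a loaded module in TEXT,
# where TEXT is formatted like the output of `module list`.
has_module() {
  printf '%s\n' "$2" | grep -Eq '(^|[[:space:]])[0-9]+\) '"$1"'(/|\(|[[:space:]]|$)'
}

# Sample text copied from the session above. On a login node it could be captured
# with: loaded=$(module list 2>&1)   (some versions print the list to stderr)
loaded=' 1) HLRNenv   2) sw.a100   3) slurm   4) gcc/11.3.0(default)'

if has_module gcc "$loaded"; then
  echo "gcc is loaded"
else
  echo "gcc is missing, run: module load gcc"
fi
```

The pattern matches only whole module names, so a query for `sw` does not match `sw.a100`.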

Program build and execution

Job monitoring

A running job can be monitored interactively on the compute nodes it occupies. Once you know the names of the job's nodes, you can log in to them and monitor the host CPU as well as the GPUs.

bgnlogin1 $ squeue -u myaccount
  JOBID PARTITION     NAME      USER ST TIME  NODES NODELIST(REASON)
7748370  gpu-a100 a100_mpi myaccount  R 1:23      2 bgn[1007,1017]
bgnlogin1 $ ssh bgn1007
bgn1007 $ top
bgn1007 $ nvidia-smi
bgn1007 $ module load nvtop
bgn1007 $ nvtop
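
The same GPU check can also be run on every job node without opening an interactive session on each one. This is a minimal sketch, assuming SSH from the login node to the job nodes works as in the session above. `gpu_status` and the `DRY_RUN` switch are hypothetical names introduced here. With `DRY_RUN=1` the commands are only printed, not executed.

```shell
# gpu_status NODE...: print one CSV line per GPU on each node
# (GPU index, GPU utilization, used memory).
# If DRY_RUN is set to a non-empty value, only print the ssh commands.
gpu_status() {
  local node
  for node in "$@"; do
    ${DRY_RUN:+echo} ssh "$node" nvidia-smi \
      --query-gpu=index,utilization.gpu,memory.used --format=csv
  done
}

# Node names taken from the squeue output above.
DRY_RUN=1 gpu_status bgn1007 bgn1017
```

On the cluster, `scontrol show hostnames 'bgn[1007,1017]'` expands the compressed node list shown by `squeue` into one host name per line, which can be passed to the function.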

Using the slurm batch system

The GPU A100 partition shares the Slurm batch system with all other partitions of the Lise system.

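#!/bin/bash
# Run in the GPU A100 partition on two nodes.
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
# Eight tasks in total; --gres requests four GPUs on each node.
#SBATCH --ntasks=8
#SBATCH --gres=gpu:4

# Load the MPI environment, then start the MPI program.
module load openmpi/gcc.11/4.1.4
mpirun ./mycode.bin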
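
The script above requests 8 tasks on 2 nodes with 4 GPUs each, which suggests one MPI task per GPU, although the page does not state this explicitly. Under that assumption, the task count for other job sizes can be derived as sketched below. The file name `job.slurm` is a placeholder for the batch script. Options given on the `sbatch` command line take precedence over the `#SBATCH` lines in the script.

```shell
# Assumption: one MPI task per GPU, as the example script suggests.
NODES=2
GPUS_PER_NODE=4
NTASKS=$((NODES * GPUS_PER_NODE))

# Print the submit command instead of running it, so it can be reviewed first.
# job.slurm is a placeholder name for the batch script.
SUBMIT_CMD="sbatch --nodes=$NODES --ntasks=$NTASKS --gres=gpu:$GPUS_PER_NODE job.slurm"
echo "$SUBMIT_CMD"
```

Changing `NODES` keeps the task count consistent with the number of requested GPUs.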