For questions, please contact the support crew at support@nhr.zib.de.

Login

Login authentication is possible via SSH Login.

...

Codeblock: Example CPU partition
office $ ssh -i $HOME/.ssh/id_rsa_nhr nhr_username@blogin.nhr.zib.de
Enter passphrase for key '...':
blogin1 $
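
The key used in the example above can be created beforehand with ssh-keygen. This is a minimal sketch; the key type and file name only mirror the example and may differ from the site's current recommendation, see the SSH Login documentation.

Codeblock: Creating an SSH key (sketch)
# Generate the key pair on your local machine; the file name matches the example above
office $ ssh-keygen -t rsa -f $HOME/.ssh/id_rsa_nhr
# Register the public key $HOME/.ssh/id_rsa_nhr.pub as described on the SSH Login page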

File systems

Each complex has the following file systems available. More information about quota, usage, and best practices is available on Fixing Quota Issues. Hints for data transfer are given here.

  • Home file system with 340 TiByte capacity containing $HOME directories /home/${USER}/
  • Lustre parallel file system with 8.1 PiByte capacity containing
    • $WORK directories /scratch/usr/${USER}/
    • $TMPDIR directories /scratch/tmp/${USER}/
    • project data directories /scratch/projects/<projectID>/ (not yet available)
  • Tape archive with 120 TiByte capacity (accessible on the login nodes only)
Info
Best practices for using WORK as a Lustre file system: https://www.nas.nasa.gov/hecc/support/kb/lustre-best-practices_226.html
Info
Hints for fair usage of the shared WORK resource: Metadata Usage on WORK
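
Assuming the variables listed above ($HOME, $WORK, $TMPDIR) are set at login, they can be used directly in the shell; a minimal sketch:

Codeblock: Using the file system environment variables (sketch)
blogin1 $ echo $HOME       # /home/$USER on the home file system
blogin1 $ echo $WORK       # /scratch/usr/$USER on the Lustre parallel file system
blogin1 $ echo $TMPDIR     # /scratch/tmp/$USER for temporary data
blogin1 $ cd $WORK         # run I/O-intensive workloads here, not in $HOME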

Partitions on system Lise

The compute system Lise at NHR@ZIB provides different compute partitions for CPUs and GPUs. Your choice of partition affects ...

Login nodes

To log in to system Lise, please ...

Software and environment modules

...

To avoid conflicts between different compilers and compiler versions, builds of the most important libraries are provided for all compilers and major release numbers.
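
A minimal sketch of how to inspect and load environment modules with the standard module commands (the exact module names depend on the installation):

Codeblock: Listing and loading environment modules (sketch)
module avail          # list all provided compilers and libraries
module load intel     # load the Intel compiler (as used in the examples below)
module list           # show the currently loaded modules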

Program build

Here, only a brief introduction to program building with the Intel compiler is given. For more detailed instructions, including important compiler flags and special libraries, refer to our webpage Compilation CPU CLX.

Examples for building a program on the Atos system

To build executables for the Atos system, call the standard compiler executables (icc, ifort, gcc, gfortran) directly.

Codeblock: Serial Code
module load intel
icc -o hello.bin hello.c
Codeblock: Parallel Code with MPI
module load intel
module load impi
mpiicc -o hello.bin hello.c
Codeblock: Parallel Code with OpenMP
module load intel
icc -qopenmp -o hello.bin hello.c

MPI, Communication Libraries, OpenMP

We provide several communication libraries:

  • Intel MPI
  • OpenMPI

As Intel MPI is the communication library recommended by the system vendor, currently only documentation for Intel MPI is provided, except for application-specific documentation.

OpenMP support is available with the compilers from Intel and GNU.
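
For the GNU compiler, OpenMP is enabled with the -fopenmp flag; a minimal sketch (the gcc module name is an assumption and may differ on the system):

Codeblock: OpenMP with the GNU compiler (sketch)
module load gcc                     # assumption: the exact module name may differ
gcc -fopenmp -o hello.bin hello.c   # -fopenmp enables OpenMP for GNU compilers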

Using the batch system

To run your applications on the systems, you need to go through our batch system/scheduler: Slurm. The scheduler uses meta information about the job (requested node and core count, wall time, etc.) and then runs your program on the compute nodes, once the resources are available and your job is next in line. For a more in-depth introduction, visit our Slurm documentation.

We distinguish two kinds of jobs:

  • Interactive job execution
  • Job script execution

Resource specification

To request resources, several flags can be used when submitting a job.

...

-p <name>

...
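
A minimal sketch combining the most common resource flags, with values taken from the examples further below (partition, node count, tasks per node, wall time):

Codeblock: Typical resource specification (sketch)
# Adjust partition, node count, tasks, and wall time to your job
sbatch -p standard96 -N 4 --tasks-per-node 96 -t 00:10:00 jobscript.sh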

For using compute resources interactively, e.g. to follow the execution of MPI programs, the following steps are required. Note that non-interactive batch jobs via job scripts (see below) are the primary way of using the compute resources.

  1. A resource allocation for interactive usage has to be requested first with the salloc command, which should also include your resource requirements.
  2. When salloc has successfully allocated the requested resources, you have to issue an additional srun command (see the example below) if you want to work on one of the allocated compute nodes.
  3. Afterwards, srun or MPI launch commands, like mpirun or mpiexec, can be used to start parallel programs (see the corresponding user guides).
Codeblock
blogin1 ~ $ salloc -t 00:10:00 -p standard96:test -N2 --tasks-per-node 24
salloc: Granted job allocation [...]
salloc: Waiting for resource configuration
salloc: Nodes bcn[1001,1003] are ready for job
# To get a shell on one of the allocated nodes
blogin1 ~ $ srun --pty --interactive --preserve-env ${SHELL}
bcn1001 ~ $ srun hostname | sort | uniq -c
     24 bcn1001
     24 bcn1003
bcn1001 ~ $ exit
# Exit a second time for Berlin/Lise 
blogin1 ~ $ exit
salloc: Relinquishing job allocation [...]

Job scripts

Please go to our webpage CPU partition "Lise" for more details about job scripts. As an introduction, standard batch system jobs are executed by applying the following steps:

  1. Provide (write) a batch job script, see the examples below.
  2. Submit the job script with the command sbatch (sbatch jobscript.sh)
  3. Monitor and control the job execution, e.g. with the commands squeue and scancel (cancel the job).

A job script is a script (written in bash, ksh or csh syntax) containing Slurm keywords which are used as arguments for the command sbatch.
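
A minimal sketch of this workflow; the job ID printed by sbatch is used for monitoring and cancelling (<jobid> is a placeholder):

Codeblock: Submitting and monitoring a job (sketch)
sbatch jobscript.sh      # submit the job; sbatch prints the job ID
squeue -u $USER          # monitor your queued and running jobs
scancel <jobid>          # cancel a job, using the ID printed by sbatch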

...

Intel MPI Job Script

Requesting 4 nodes in the standard96 partition with 96 cores each (no hyperthreading) for 10 minutes, using Intel MPI.

Codeblock
#!/bin/bash
#SBATCH -t 00:10:00
#SBATCH -N 4
#SBATCH --tasks-per-node 96
#SBATCH -p standard96

module load impi
export SLURM_CPU_BIND=none  # important when using "mpirun" from Intel-MPI!
							# Do NOT use this with srun!
export I_MPI_HYDRA_TOPOLIB=ipl
export I_MPI_HYDRA_BRANCH_COUNT=-1

mpirun hello_world > hello.output

...

OpenMP Job

Requesting 1 large node with 96 CPUs (physical cores) for 20 minutes, and then using 192 hyperthreads.

Codeblock
#!/bin/bash
#SBATCH -t 00:20:00
#SBATCH -N 1
#SBATCH --cpus-per-task=96
#SBATCH -p large96:test

# This binds each thread to one core
export OMP_PROC_BIND=TRUE
# Number of threads as given by -c / --cpus-per-task
export OMP_NUM_THREADS=$(($SLURM_CPUS_PER_TASK * 2))
export KMP_AFFINITY=verbose,scatter

hello_world > hello.output

Job Accounting

The webpage Accounting gives you more information about job accounting.
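
As a minimal sketch, the standard Slurm accounting command sacct can be used to inspect finished jobs; see the Accounting page for the site-specific details (<jobid> is a placeholder):

Codeblock: Inspecting a finished job (sketch)
sacct -j <jobid> --format=JobID,JobName,Partition,Elapsed,State   # standard Slurm accounting query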

...
