Partitions on system Lise

Compute system Lise at NHR@ZIB contains different Compute partitions for CPUs and GPUs. Please choose your partition which affects specific configurations of

Login nodes

To login to system Lise, please

Example CPU partition
office $ ssh -i $HOME/.ssh/id_rsa_nhr nhr_username@blogin.nhr.zib.de
Enter passphrase for key '...':
blogin1 $

File systems

Each complex has the following file systems available. More information about quota, usage, and best practices is available on the page Fixing Quota Issues. Hints for data transfer are given here.

  • Home file system with 340 TiByte capacity containing $HOME directories /home/${USER}/
  • Lustre parallel file system with 8.1 PiByte capacity containing
    • $WORK directories /scratch/usr/${USER}/
    • $TMPDIR directories /scratch/tmp/${USER}/
    • project data directories /scratch/projects/<projectID>/ (not yet available)
  • Tape archive with 120 TiByte capacity (accessible on the login nodes, only)
Best practices for using WORK as a Lustre file system: https://www.nas.nasa.gov/hecc/support/kb/lustre-best-practices_226.html
Hints for fair usage of the shared WORK resource: Metadata Usage on WORK
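
As a minimal sketch, the per-user directories listed above can be derived from the user name and checked for existence. The helper name lise_dirs is hypothetical and introduced only for this illustration; the paths are the ones given in the list above.

```shell
# Print the per-user directories named on this page.
# lise_dirs is a hypothetical helper, not a command provided on the system.
lise_dirs() {
    user="${1:-${USER:-$(id -un)}}"
    printf '%s\n' "/home/${user}/" "/scratch/usr/${user}/" "/scratch/tmp/${user}/"
}

# Report which of the directories exist on the current machine.
for dir in $(lise_dirs); do
    if [ -d "$dir" ]; then
        echo "found:   $dir"
    else
        echo "missing: $dir"
    fi
done
```

On a login node all three directories should be reported as found; on any other machine the loop simply reports them as missing.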

Software and environment modules

The webpage Software gives you information about available software on the NHR systems.

NHR provides a number of compilers and software packages for parallel computing and (serial) pre- and postprocessing:

  • Compilers: Intel, GNU
  • Libraries: NetCDF, LAPACK, ScaLAPACK, BLAS, FFTW, ...
  • Debuggers: Allinea DDT, Roguewave TotalView...
  • Tools: octave, python, R ...
  • Visualisation: mostly tools to investigate gridded data sets from earth-system modelling
  • Application software: mostly for engineering and chemistry (molecular dynamics)

Environment Modules are used to manage access to software and libraries. The module command offers the following functionality:

  1. Show lists of available software
  2. Enable access to software in different versions


Example: Show the currently available software and access the Intel compilers
blogin1:~ $ module avail
...
blogin1:~ $ module load intel
Module for Intel Parallel Studio XE Composer Edition (version 2019 Update 5) loaded.
blogin1:~ $ module list
Currently Loaded Modulefiles:
 1) sw.skl   2) slurm   3) HLRNenv   4) intel/19.0.5(default)

To avoid conflicts between different compilers and compiler versions, builds of the most important libraries are provided for all compilers and major release numbers.
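
To select a specific version rather than the default, the version can be appended to the module name. The following sketch assumes that the module command is available in the current shell, as it is on the login nodes; the version intel/19.0.5 is taken from the listing above and may differ on the live system.

```shell
# Assumes the Environment Modules "module" command is defined in this shell.
# Elsewhere the block only prints a notice.
if command -v module >/dev/null 2>&1; then
    module avail intel          # list the available Intel compiler versions
    module load intel/19.0.5    # load one explicit version (from the listing above)
    module list                 # confirm which modules are loaded
    modules_checked=yes
else
    echo "module command not found; run this on a login node"
    modules_checked=no
fi
```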

Program build

Please visit the specific workflow pages of our Compute partitions.

Using the Slurm batch system

To run your applications on the systems, you need to go through our batch system/scheduler: Slurm. The scheduler uses meta information about the job (requested node and core count, wall time, etc.) and then runs your program on the compute nodes once the resources are available and your job is next in line. For a more in-depth introduction, visit our Slurm documentation.

We distinguish two kinds of jobs:

  • Interactive job execution
  • Job script execution

Resource specification

To request resources, several flags can be used when submitting the job.


Parameter          Flag                  Default Value
# tasks            -n #                  1
# nodes            -N #                  1
# tasks per node   --tasks-per-node #
partition          -p <name>             standard96
Time limit         -t hh:mm:ss           12:00:00
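
The flags can be combined on one command line. The sketch below only assembles and prints such a command; the values are illustrative and mirror the interactive example on this page, and jobscript.sh stands for your own job script.

```shell
# Illustrative values; adjust them to your job.
nodes=2
tasks_per_node=24
partition="cpu-clx:test"
timelimit="00:10:00"

# Total number of tasks that this request starts.
total_tasks=$((nodes * tasks_per_node))

# Assemble the submit command from the flags in the table.
submit_cmd="sbatch -N ${nodes} --tasks-per-node ${tasks_per_node} -p ${partition} -t ${timelimit} jobscript.sh"
echo "Requesting ${total_tasks} tasks with: ${submit_cmd}"
```

Any flag that is left out falls back to the default value given in the table.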


Interactive jobs

For using compute resources interactively, e.g. to follow the execution of MPI programs, the following steps are required. Note that non-interactive batch jobs via job scripts (see below) are the primary way of using the compute resources.

  1. A resource allocation for interactive usage has to be requested first with the salloc --interactive command which should also include your resource requirements.
  2. Once salloc has successfully allocated the requested resources, you have to issue an additional srun command to get a shell on one of the allocated nodes (see example below) if you want to work on the compute node.
  3. Afterwards, srun or MPI launch commands, like mpirun or mpiexec, can be used to start parallel programs (see the corresponding user guides).
blogin1 ~ $ salloc -t 00:10:00 -p cpu-clx:test -N2 --tasks-per-node 24
salloc: Granted job allocation [...]
salloc: Waiting for resource configuration
salloc: Nodes bcn[1001,1003] are ready for job
# To get a shell on one of the allocated nodes
blogin1 ~ $ srun --pty --interactive --preserve-env ${SHELL}
bcn1001 ~ $ srun hostname | sort | uniq -c
     24 bcn1001
     24 bcn1003
bcn1001 ~ $ exit
# Exit a second time for Berlin/Lise 
blogin1:~ > exit
salloc: Relinquishing job allocation [...]

Job scripts

Please go to our webpage CPU CLX partition for more details about job scripts. As an introduction, standard batch system jobs are executed in the following steps:

  1. Provide (write) a batch job script, see the examples on the CPU CLX partition page.
  2. Submit the job script with the command sbatch (sbatch jobscript.sh)
  3. Monitor and control the job execution, e.g. with the commands squeue and scancel (cancel the job).
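
The three steps can be sketched as follows. This is a minimal illustration, not a recommended configuration: the resource values and the flag spelling mirror the interactive example on this page, and the file name jobscript.sh is only a placeholder. The submit step runs only where Slurm is installed.

```shell
# Step 1: write the job script. The #SBATCH lines carry the resource request.
cat > jobscript.sh <<'EOF'
#!/bin/bash
#SBATCH -t 00:10:00
#SBATCH -p cpu-clx:test
#SBATCH -N 2
#SBATCH --tasks-per-node 24

# Print how many tasks ran on each allocated node.
srun hostname | sort | uniq -c
EOF

# Steps 2 and 3: submit the script and monitor the job.
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(sbatch --parsable jobscript.sh)   # --parsable prints only the job ID
    jobid=${jobid%%;*}                        # strip an optional ";cluster" suffix
    squeue -j "$jobid"                        # show the state of the job
    # scancel "$jobid"                        # uncomment to cancel the job
else
    echo "sbatch not found; submit jobscript.sh from a login node"
fi
```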

Job Accounting

The page Accounting gives you more information about job accounting.

Every batch job is accounted. The account (project) which is debited for a batch job can be specified using the sbatch parameter --account <account>. If a batch job does not state an account (project), a default is taken from the account database. It defaults to the personal project of the user, which has the same name as the user. Users may modify their default project by visiting the Portal NHR@ZIB.
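
As a small sketch, the account can be passed on the sbatch command line. The helper submit_cmd_for and the project name my_project are hypothetical and serve only as an illustration; the block prints the commands instead of submitting anything.

```shell
# Build an sbatch command that debits the given account (project).
# submit_cmd_for is a hypothetical helper for this illustration.
submit_cmd_for() {
    echo "sbatch --account $1 jobscript.sh"
}

# Debit a specific project (my_project is a placeholder name).
submit_cmd_for my_project

# Name the personal project explicitly; it has the same name as the user.
submit_cmd_for "${USER:-$(id -un)}"
```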

