...

VASP is an MPI-parallel application. We recommend using mpirun as the job starter for VASP. The environment module providing the mpirun command associated with a particular VASP installation needs to be loaded before the VASP environment module.

VASP Version              | User Group | VASP Modulefile | MPI Requirement  | CPU/GPU (Lise) | CPU/GPU (Emmy)
5.4.4 with patch 16052018 | vasp5_2    | vasp/5.4.4.p1   | impi/2019.5      | yes / no       | yes / yes
6.4.1                     | vasp6      | vasp/6.4.1      | impi/2021.7.1    | yes / no       | yes / no
6.4.1                     | vasp6      | vasp/6.4.1      | nvhpc-hpcx/23.1  | no / yes       | yes / no
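
For example, for the VASP 6.4.1 CPU build listed above, the modules are loaded in this order (as also shown in the job scripts below):

module load impi/2021.7.1   # MPI module first
module load vasp/6.4.1      # VASP module second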

...

N.B.: The VTST script collection is not available from the vasp environment modules. Instead, it is provided by the vtstscripts environment module(s).
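
A minimal sketch of making the scripts available (the exact module name and version on the system may differ; check with module avail first):

module avail vtstscripts   # list the available vtstscripts versions
module load vtstscripts    # load the default version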

N.B.: VASP version 6.4.1 has been compiled with support for OpenMP, HDF5, and Wannier90. The CPU version additionally supports Libxc.

Example Jobscripts

For Intel Skylake CPU compute nodes (Phase 1, Göttingen only):
#!/bin/bash
#SBATCH --time 12:00:00
#SBATCH --nodes 2
#SBATCH --tasks-per-node 40

# Let the MPI library handle process placement instead of Slurm
export SLURM_CPU_BIND=none

module load impi/2019.5
module load vasp/5.4.4.p1

mpirun vasp_std
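
Assuming the script is saved as, e.g., vasp_cpu.slurm (a hypothetical file name), it is submitted as a batch job with:

sbatch vasp_cpu.slurm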

...

For Intel Cascade Lake CPU compute nodes (Phase 2, Berlin):
#!/bin/bash
#SBATCH --time 12:00:00
#SBATCH --nodes 2
#SBATCH --tasks-per-node 96

# Let the MPI library handle process placement instead of Slurm
export SLURM_CPU_BIND=none

module load impi/2019.5
module load vasp/5.4.4.p1

mpirun vasp_std


The following job script exemplifies how to run VASP 6.4.1 making use of OpenMP threads. Here, we have 2 OpenMP threads and 48 MPI tasks per node (the product of these two numbers should ideally be equal to the number of CPU cores per node).

In many cases, running VASP with parallelization over MPI ranks alone already yields good performance. However, certain application cases can benefit from hybrid parallelization over MPI and OpenMP. A detailed discussion is found here. If you opt for hybrid parallelization, please pay attention to process pinning, as shown in the example below.

For Intel Cascade Lake CPU compute nodes (Berlin):
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=48
#SBATCH --cpus-per-task=2
#SBATCH --partition=standard96
#SBATCH -A your_project_account

export SLURM_CPU_BIND=none

# Set the number of OpenMP threads as given by the SLURM parameter "cpus-per-task"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Adjust the maximum stack size of OpenMP threads
export OMP_STACKSIZE=512m

# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core

module load impi/2021.7.1
module load vasp/6.4.1  

mpirun vasp_std
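
To verify that the resulting pinning is as intended, you can temporarily raise Intel MPI's debug output (an optional addition, not part of the script above):

export I_MPI_DEBUG=4   # print the rank-to-core pinning at startup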

...

In the following example, we show a job script that will run on the Nvidia A100 GPU nodes (Berlin). By default, VASP will use one GPU per MPI task. If you plan to use 4 GPUs per node, you need to set 4 MPI tasks per node. Then, set the number of OpenMP threads to 18 (because 4x18=72, which is the number of CPU cores on these nodes) to speed up your calculation. This, however, also requires proper process pinning.

For the Nvidia A100 GPU compute nodes (Berlin):
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=gpu-a100
#SBATCH -A your_project_account

export SLURM_CPU_BIND=none  

# Set the number of OpenMP threads as given by the SLURM parameter "cpus-per-task"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Avoid hcoll as MPI collective algorithm
export OMPI_MCA_coll="^hcoll"

# You may need to adjust this limit, depending on the case
export OMP_STACKSIZE=512m 

module load nvhpc-hpcx/23.1
module load vasp/6.4.1  

# Carefully adjust ppr:2, if you don't use 4 MPI processes per node
mpirun --bind-to core --map-by ppr:2:socket:PE=${SLURM_CPUS_PER_TASK} vasp_std
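
As a hypothetical variant (a sketch, assuming the nodes have two CPU sockets with 36 cores each, consistent with the mapping above), a run using only 2 MPI tasks per node, i.e. 2 GPUs per node, would change the relevant lines to:

#SBATCH --tasks-per-node=2
#SBATCH --cpus-per-task=36

# one MPI process per socket instead of two
mpirun --bind-to core --map-by ppr:1:socket:PE=${SLURM_CPUS_PER_TASK} vasp_std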

...