...
VASP is an MPI-parallel application. We recommend using mpirun as the job starter for VASP. The environment module providing the mpirun command associated with a particular VASP installation must be loaded before the environment module for VASP; see the example after the table below.
VASP Version | User Group | VASP Modulefile | Compute Partitions | MPI Requirement | Supported Features
---|---|---|---|---|---
5.4.4 with patch 16052018 | vasp5_2 | vasp/5.4.4.p1 | CentOS 7 | impi/2019.5 |
6.4.1 | vasp6 | vasp/6.4.1 | CentOS 7 | impi/2021.7.1 | OpenMP, HDF5, Wannier90, Libxc
6.4.2 | vasp6 | vasp/6.4.2 | CentOS 7 | impi/2021.7.1 | OpenMP, HDF5, Wannier90, Libxc, DFTD4 van-der-Waals functional
6.4.3 | vasp6 | vasp/6.4.3 | Rocky Linux 9 | impi/2021.13 | OpenMP, HDF5, Wannier90, Libxc, DFTD4 van-der-Waals functional, libbeef
6.4.1 | vasp6 | vasp/6.4.1 | GPU A100 | nvhpc-hpcx/23.1 | OpenMP, HDF5, Wannier90
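For example, to use vasp/6.4.3 from the table above, the matching MPI module is loaded first (a minimal sketch; pick the module pair from the table that matches your target installation):

```bash
# Load the MPI module associated with the VASP installation first,
# then the VASP module itself
module load impi/2021.13
module load vasp/6.4.3
```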
...
The following example shows a job script that will run on the Nvidia A100 GPU nodes (Berlin). By default, VASP uses one GPU per MPI task. If you plan to use all 4 GPUs of a node, you need to set 4 MPI tasks per node. Then, set the number of OpenMP threads to 18 (because 4 x 18 = 72, which is the number of CPU cores per node on the GPU A100 partition) to speed up your calculation. This, however, also requires proper process pinning.
```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=gpu-a100

export SLURM_CPU_BIND=none

# Set the number of OpenMP threads as given by the SLURM parameter "cpus-per-task"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Avoid hcoll as MPI collective algorithm
export OMPI_MCA_coll="^hcoll"

# You may need to adjust this limit, depending on the case
export OMP_STACKSIZE=512m

module load nvhpc-hpcx/23.1
module load vasp/6.4.1

# Carefully adjust ppr:2, if you don't use 4 MPI processes per node
mpirun --bind-to core --map-by ppr:2:socket:PE=${SLURM_CPUS_PER_TASK} vasp_std
```
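When adapting this script to other task/thread splits, the product of --tasks-per-node and --cpus-per-task should still match the number of CPU cores per node. A minimal, purely illustrative check (the 72 cores per gpu-a100 node are taken from the text above; the variable names are only for this sketch):

```bash
# Illustrative check: MPI tasks per node x OpenMP threads per task
# should equal the CPU cores per node (4 x 18 = 72 on gpu-a100)
TASKS_PER_NODE=4
CPUS_PER_TASK=18
CORES_PER_NODE=72
if [ $(( TASKS_PER_NODE * CPUS_PER_TASK )) -ne ${CORES_PER_NODE} ]; then
    echo "Warning: tasks x threads does not match the core count" >&2
fi
```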
The following job script exemplifies how to run VASP 6.4.1 making use of OpenMP threads. Here, we have 2 OpenMP threads and 48 MPI tasks per node (the product of these two numbers should ideally equal the number of CPU cores per node).
...
```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=48
#SBATCH --cpus-per-task=2
#SBATCH --partition=standard96

export SLURM_CPU_BIND=none

# Set the number of OpenMP threads as given by the SLURM parameter "cpus-per-task"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Adjust the maximum stack size of OpenMP threads
export OMP_STACKSIZE=512m

# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core

module load impi/2021.7.1
module load vasp/6.4.1

mpirun vasp_std
```
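If you want to verify that the Intel MPI pinning configured above actually takes effect, you can raise the Intel MPI debug level before the mpirun line; levels of 4 and higher print the pinning of each rank to stdout. This is optional and safe to remove for production runs:

```bash
# Optional: make Intel MPI report the CPU pinning of each rank at startup
export I_MPI_DEBUG=4
```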
The last example demonstrates how to run a job with VASP 5.4.4.p1 on nodes with CentOS 7.
```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=96
#SBATCH --partition=standard96

export SLURM_CPU_BIND=none

module load impi/2019.5
module load vasp/5.4.4.p1

mpirun vasp_std
```
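In all three cases the job script is submitted with sbatch. Assuming the script was saved as vasp_job.sh (the file name here is arbitrary, chosen only for this example):

```bash
sbatch vasp_job.sh
```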