...
Codeblock (bash): Lise, for compute nodes with Rocky Linux 9 (using srun)

#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --partition=cpu-clx
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k
export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core
# Our tests have shown that CP2K performs better with psm2 as the libfabric provider.
# Check whether this also applies to your system.
# To stick to the default provider, comment out the following line
export FI_PROVIDER=psm2
module load intel/2021.2 impi/2021.13 cp2k/2024.1
srun cp2k.psmp input > output
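The libfabric provider selected above can be cross-checked against what the node actually offers. A minimal check using the fi_info utility, which ships with libfabric; its availability on the system is an assumption here:

Codeblock (bash): Listing available libfabric providers (illustration)

# List the provider names libfabric can use on this node; psm2 should
# appear in the output if the FI_PROVIDER setting above is applicable.
fi_info -l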
Codeblock (bash): Lise, for compute nodes with CentOS 7 (using mpirun)

#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k
export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core
module load intel/2021.2 impi/2021.7.1 cp2k/2023.2
mpirun cp2k.psmp input > output
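Any of these scripts is submitted to Slurm in the usual way. A minimal sketch, assuming the script was saved as cp2k.job (the file name is chosen here purely for illustration):

Codeblock (bash): Submitting the job script (illustration)

sbatch cp2k.job    # submit the batch script (hypothetical file name)
squeue -u $USER    # check the state of your jobs in the queue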
...
Codeblock (bash): Lise, for compute nodes with CentOS 7 (using srun)

#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
module load intel/2021.2 impi/2021.7.1 cp2k/2023.2
srun cp2k.psmp input > output
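This variant leaves the pinning to Slurm's defaults. To inspect which CPUs each task actually receives, a short probe can be run in place of CP2K inside the same job script; this check is illustrative only and not part of the original recipe:

Codeblock (bash): Inspecting task-to-CPU binding (illustration)

# Print the CPU affinity of every task; each rank should report
# a distinct set of cores.
srun bash -c 'echo "task ${SLURM_PROCID}: $(grep Cpus_allowed_list /proc/self/status)"'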
Codeblock (bash): For Nvidia A100 GPU nodes

#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --job-name=cp2k
export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=close
module load gcc/11.3.0 openmpi/gcc.11/4.1.4 cuda/11.8 cp2k/2023.2
# gpu_bind.sh (see the following script) should be placed inside the same directory where cp2k will be executed
# Don't forget to make gpu_bind.sh executable by running: chmod +x gpu_bind.sh
mpirun --bind-to core --map-by numa:PE=${SLURM_CPUS_PER_TASK} ./gpu_bind.sh cp2k.psmp input > output
...
Codeblock (bash): gpu_bind.sh

#!/bin/bash
# Bind each MPI rank to its own GPU: Open MPI exports the node-local rank,
# which is used here to select the matching CUDA device.
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
exec "$@"
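To see the effect of the wrapper, a trivial command can be run through it instead of cp2k.psmp. A quick check, for illustration only, to be run inside an allocation on a GPU node:

Codeblock (bash): Checking the rank-to-GPU mapping (illustration)

# Each of the 4 local ranks should report a different device id (0..3).
mpirun -np 4 ./gpu_bind.sh bash -c 'echo "local rank ${OMPI_COMM_WORLD_LOCAL_RANK}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"'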
Depending on the problem size, the code may stop with a segmentation fault caused by insufficient stack size or by threads exceeding their stack space. To circumvent this, we recommend inserting the following into the job script:
Codeblock (bash)

export OMP_STACKSIZE=512M
ulimit -s unlimited
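For context: OMP_STACKSIZE sets the stack size of each additional OpenMP thread, while ulimit -s lifts the stack limit of the process itself (and thus of the initial thread). A sketch of where the two lines fit in the job scripts above, with the placement shown for illustration:

Codeblock (bash): Placement in the job script (illustration)

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Raise the stack limits before launching CP2K
export OMP_STACKSIZE=512M
ulimit -s unlimited
srun cp2k.psmp input > output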
...