Intel MPI on CPU CLX

Code Compilation

For code compilation you can choose one of two compilers: Intel oneAPI or GNU. Both can be combined with the Intel MPI library.
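The compile lines below refer to source files hello.c, hello.f90, and hello.cpp. One possible hello.c for such a test could look like this minimal sketch (an illustrative example, not part of the module environment):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* every rank reports itself and the node it runs on */
    printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}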

Intel oneAPI compiler

Without OpenMP:

module load intel
module load impi
mpiicx -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpiifx -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpiicpx -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp

With OpenMP:

module load intel
module load impi
mpiicx -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpiifx -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpiicpx -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp

GNU compiler

Without OpenMP:

module load gcc
module load impi
mpigcc -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpif90 -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpigxx -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp

With OpenMP:

module load gcc
module load impi
mpigcc -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpif90 -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpigxx -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp

Slurm job scripts

To start the MPI-parallelized code on the system, you can choose between

  • using mpirun and

  • using srun.

Using mpirun

Using mpirun, the pinning is controlled by the Intel MPI library. Pinning by Slurm must be switched off by adding export SLURM_CPU_BIND=none to the job script.
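Whether the pinning behaves as intended can be checked with a small MPI program that reports the CPU each rank runs on. The following sketch assumes a Linux system (sched_getcpu() is a glibc extension); compile it like hello.c above:

#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    /* sched_getcpu() returns the logical CPU the calling rank is on */
    printf("rank %d runs on CPU %d of %s\n", rank, sched_getcpu(), host);

    MPI_Finalize();
    return 0;
}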

MPI only

One process per physical core (96 per node):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

module load impi/2019.5
export SLURM_CPU_BIND=none

mpirun -ppn 96 ./hello.bin

Half occupation, processes pinned one per core and spread (scatter) across the node:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

module load impi/2019.5
export SLURM_CPU_BIND=none
export I_MPI_PIN_DOMAIN=core
export I_MPI_PIN_ORDER=scatter

mpirun -ppn 48 ./hello.bin

Using hyperthreading (192 processes per node):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

module load impi/2019.5
export SLURM_CPU_BIND=none

mpirun -ppn 192 ./hello.bin

MPI, OpenMP

You can run a code compiled with both MPI and OpenMP (a minimal hybrid source is sketched after the first job script below). The first example covers the setup

  • 2 nodes,

  • 4 processes per node, 24 threads per process.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

module load impi/2019.5
export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=24

mpirun -ppn 4 ./hello.bin
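A minimal hybrid source matching these job scripts could look like the following sketch (compile it with the -fopenmp variants above; printing the CPU id via sched_getcpu() is again a Linux/glibc assumption):

#define _GNU_SOURCE
#include <mpi.h>
#include <omp.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;

    /* MPI_THREAD_FUNNELED suffices here: only the main thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each OpenMP thread of each rank reports its placement */
    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d, on CPU %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}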

The next example covers the setup

  • 2 nodes,

  • 4 processes per node, 12 threads per process.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

module load impi/2019.5
export SLURM_CPU_BIND=none
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=12

mpirun -ppn 4 ./hello.bin

The third example covers the setup

  • 2 nodes,

  • 4 processes per node using hyperthreading,

  • 48 threads per process.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

module load impi/2019.5
export SLURM_CPU_BIND=none
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=48

mpirun -ppn 4 ./hello.bin

Using srun

Using srun, the pinning is controlled by Slurm. The number of processes and threads is specified via the srun options --ntasks-per-node and --cpus-per-task.

MPI only

96 tasks per node:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

srun --ntasks-per-node=96 ./hello.bin

48 tasks per node:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

srun --ntasks-per-node=48 ./hello.bin

MPI, OpenMP

You can run a code compiled with both MPI and OpenMP. The first example covers the setup

  • 2 nodes,

  • 4 processes per node, 24 threads per process.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=24

srun --ntasks-per-node=4 --cpus-per-task=48 ./hello.bin

The next example covers the setup

  • 2 nodes,

  • 4 processes per node, 12 threads per process.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=12

srun --ntasks-per-node=4 --cpus-per-task=24 ./hello.bin

The third example covers the setup

  • 2 nodes,

  • 4 processes per node using hyperthreading,

  • 48 threads per process.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test

export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=48

srun --ntasks-per-node=4 --cpus-per-task=48 ./hello.bin