Floating point exception with Intel MPI 2019.x using one task per node
Problem
When Intel MPI 2019 (impi/2019.x) is used to start MPI jobs that run one task per node, the jobs abort immediately with an error message:
srun: error: gcnXXXX: task 0: Floating point exception
srun: error: gcnYYYY: task 1: Floating point exception
[...]
Solution
This is due to a problem with Intel's Hydra process manager. A workaround is to set the environment variable I_MPI_HYDRA_TOPOLIB to the value ipl (the internal default value is hwloc) in the job script:
#!/bin/bash
#SBATCH [...]
export I_MPI_HYDRA_TOPOLIB=ipl
[...]
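If the same job script is shared between several Intel MPI versions, the workaround could be applied only when a 2019 module is loaded. The following is a minimal sketch, not a site-provided recipe: it assumes the module system records loaded modules in the colon-separated LOADEDMODULES variable (as Environment Modules and Lmod do) and that the module is named impi/2019.x as above. The function name apply_topolib_workaround is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: export the workaround only if an impi/2019.x
# module appears in LOADEDMODULES (assumed to be a colon-separated list).
apply_topolib_workaround() {
    case ":${LOADEDMODULES:-}:" in
        *:impi/2019*)
            # Switch Hydra's topology library from the default hwloc to ipl.
            export I_MPI_HYDRA_TOPOLIB=ipl
            ;;
    esac
}

apply_topolib_workaround

# Report the resulting setting so it shows up in the job output.
echo "I_MPI_HYDRA_TOPOLIB=${I_MPI_HYDRA_TOPOLIB:-<unset>}"
```

Printing the value at the start of the job makes it easy to confirm from the job output whether the workaround was active for a given run.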