Preface
The operating system “CentOS 7” has reached its end of life. For this reason, the operating system (OS) of Lise’s CPU partition has been updated to “Rocky Linux 9”. This affects all login and compute nodes equipped with Intel Xeon Cascade Lake processors (“clx” for short), in particular those of the CPU partition. Lise’s GPU partitions (GPU-A100 partition and GPU-PVC partition) are not affected.
Rocky Linux 9 introduces new versions of various system tools and libraries. Some codes compiled under CentOS 7 may no longer work under Rocky Linux 9. For the same reason, legacy versions of environment modules offered under CentOS 7 were either not transferred to the new OS environment or have been replaced by more recent versions. To adapt to the new OS environment quickly, it is important for users to read this page and to follow the action items specified below.
The OS migration is organised in three consecutive phases. It is expected to be complete by the end of September.
The first phase starts with 2 login nodes and 112 compute nodes already migrated to Rocky Linux 9 for testing. The other nodes remain available under CentOS 7 for continued production.
After the test phase, a major fraction of nodes will be switched to Rocky Linux 9 to allow for general job production under the new OS.
During the last phase, only a few nodes remain under CentOS 7. At the very end, they will be migrated to Rocky Linux 9, too.
During the migration phase, the use of Rocky Linux 9 “clx” compute nodes was free of charge.
Current migration state: complete
nodes | CentOS 7 | Rocky Linux 9 |
---|---|---|
login | - | blogin[1-8] |
compute (384 GB RAM) | - | 948 |
compute (768 GB RAM) | - | 320 |
compute (1536 GB RAM) | - | 20 |
Latest news
date | subject |
---|---|
2024-09-30 | migration of remaining nodes from CentOS 7 to Rocky Linux 9 |
2024-09-16 | generic login name “blogin” resolves to blogin[3-6] |
2024-08-14 | migration of blogin[3-6] |
2024-07-30 | migration of another 576 standard compute nodes to Rocky Linux 9 |
2024-07-03 | official start of the migration phase with 2 login and 112 compute nodes running Rocky Linux 9 |
...
SLURM partitions
CentOS 7 (old partition name) | Rocky Linux 9 (new partition name) | current job limits |
---|---|---|
● | ● | 512 nodes, 12 h wall time |
● | ● | 16 nodes, 1 h wall time |
● | ● | 50 nodes, 12 h wall time |
● | ● | 32 nodes, 48 h wall time |
● | ||
● | ||
● | ● | 1 node, 48 h wall time |
( ● available ● closed/not available yet )
Jobs submitted without a partition name are placed in the default partition. The old default was standard96; the new default is cpu-clx.
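Job scripts that request the old default partition explicitly must be adapted. The following is a minimal sketch, assuming job scripts select their partition with an #SBATCH -p or #SBATCH --partition line. migrate_partition is a hypothetical helper, not a tool provided on Lise, and it only handles the standard96 to cpu-clx mapping stated above.

```shell
#!/bin/bash
# Hypothetical helper (not provided on Lise): rewrite the old default partition
# "standard96" to the new default "cpu-clx" in a Slurm job script.
# Only an exact "standard96" value is replaced; names such as "standard96:xyz"
# are left untouched because their mapping is not covered by this sketch.
# The original script is kept as <script>.bak (GNU sed -i.bak).
migrate_partition() {
    local script=$1
    sed -E -i.bak \
        's/^(#SBATCH[[:space:]]+(-p|--partition)[[:space:]=]*)standard96([[:space:]].*)?$/\1cpu-clx\3/' \
        "$script"
}
```

For example, migrate_partition job.sh turns "#SBATCH --partition=standard96" into "#SBATCH --partition=cpu-clx" and keeps any trailing comment on that line. Afterwards, grep '^#SBATCH' job.sh shows the resulting directives.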
Software and environment modules
item | CentOS 7 | Rocky Linux 9 |
---|---|---|
OS components | glibc 2.17, Python 3.6, GCC 4.8, bash 4.2 | glibc 2.34, Python 3.9, GCC 11.4, bash 5.1 |
check disk quota | | |
Environment modules version | 4.8 (Tmod) | 5.4 (Tmod) |
Modules loaded initially | | |
compiler modules | | |
MPI modules | | |
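Because the two environments ship different C library versions (glibc 2.17 vs. 2.34 in the table above), a script can tell which one it is running in. This is a minimal sketch, assuming GNU coreutils (sort -V) and a glibc-based system where getconf GNU_LIBC_VERSION is available. version_ge is a hypothetical helper name, and mapping glibc ≥ 2.34 to “Rocky Linux 9” only holds on Lise.

```shell
#!/bin/bash
# version_ge A B: succeeds if version A >= version B (version-aware comparison,
# so 11.4 > 4.8, unlike a plain string comparison).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# On glibc systems this prints e.g. "glibc 2.34"; keep only the number.
glibc=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')

if [ -z "$glibc" ]; then
    echo "could not determine the glibc version"
elif version_ge "$glibc" 2.34; then
    echo "glibc $glibc: Rocky Linux 9 environment (on Lise)"
else
    echo "glibc $glibc: older environment, e.g. CentOS 7"
fi
```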
...
CentOS 7 | Rocky Linux 9 | ||
---|---|---|---|
| (undefined, local | ||
|
| ||
(undefined) |
| ||
(undefined | , default is
| (undefined) |
|
| |||
|
...
node hardware and node names
communication network (Intel Omnipath)
file systems (HOME, WORK, PERM) and disk quotas
environment modules system (still based on Tcl, a.k.a. “Tmod”)
access credentials (user IDs, SSH keys) and project IDs
charge rates and CPU time accounting (early migrators' jobs were free of charge)
Lise’s Nvidia-A100 and Intel-PVC partitions
...
For users of SLURM’s srun job launcher:
Open MPI 5.x has dropped support for the PMI-2 API; it relies solely on PMIx to bootstrap MPI processes. For this reason, the environment setting was changed from SLURM_MPI_TYPE=pmi2 to SLURM_MPI_TYPE=pmix, so binaries linked against Open MPI can be started as usual “out of the box” using srun mybinary. For a binary linked against Intel MPI, this works too, provided a recent version (≥ 2021.11) of Intel MPI was used. If an older version of Intel MPI was used and relinking/recompiling is not possible, one can follow the workaround for PMI-2 with srun as described in the Q&A section below. Switching from srun to mpirun instead should also be considered.
Using more processes per node than available physical cores (PPN > 96; hyperthreads) with the OPX provider, i.e. when defining FI_PROVIDER=opx:
The OPX provider currently does not support using hyperthreads/PPN > 96 on the clx partitions. Doing so may result in segmentation faults in libfabric during process startup. If a high number of PPN is really required, the libfabric provider has to be changed back to PSM2 by re-defining FI_PROVIDER=psm2 (which is the default setting). Note that the usage of hyperthreads may not be advisable. We encourage users to test performance before using more threads than available physical cores.
Note that Open MPI’s mpirun/mpiexec defaults to using all hyperthreads if a Slurm job/allocation is used that does not explicitly set --ntasks-per-node (or similar options).
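The settings described above can be combined in a batch script. This is a minimal sketch, assuming you want the OPX provider wherever it is supported; the node count, wall time and ./mybinary are placeholders, and select_fi_provider is a hypothetical helper. SLURM_NTASKS_PER_NODE is set by Slurm only when --ntasks-per-node is given, so the sketch falls back to 96.

```shell
#!/bin/bash
#SBATCH --partition=cpu-clx
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96
#SBATCH --time=01:00:00
# Sketch of a Lise batch job; node count, wall time and ./mybinary are placeholders.

# Open MPI 5.x needs PMIx under srun. On Rocky Linux 9 this is already the
# default, so the export only makes the choice explicit.
export SLURM_MPI_TYPE=pmix

# Hypothetical helper: pick the libfabric provider for a given number of
# processes per node. OPX does not support PPN > 96 (hyperthreads) on the clx
# partitions, so fall back to PSM2 (the default provider) in that case.
select_fi_provider() {
    local ppn=$1
    if (( ppn > 96 )); then
        echo psm2
    else
        echo opx
    fi
}

# Only set by Slurm when --ntasks-per-node is requested.
ppn=${SLURM_NTASKS_PER_NODE:-96}
FI_PROVIDER=$(select_fi_provider "$ppn")
export FI_PROVIDER

if command -v srun >/dev/null 2>&1; then
    srun ./mybinary
else
    echo "srun not found: submit this script with sbatch on Lise" >&2
fi
```

Leaving FI_PROVIDER unset keeps the PSM2 default; the helper only matters if you opt in to OPX.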
Action items for users
All users of Lise are recommended to
log in to an already migrated login node (see the current state table), for example to blogin7.nhr.zib.de (fully qualified domain name for external ssh connections to Lise) or simply blogin7 (from within Lise)
get familiar with the new environment
check self-compiled software for continued operability
relink/recompile software as needed
adapt and test job scripts and workflows
submit test jobs to the new "cpu-clx:test" SLURM partition
read the Q&A section and ask for support in case of further questions, problems, or software requests (support@nhr.zib.de)
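For the action item “check self-compiled software for continued operability”, one quick test is whether the dynamic linker still finds every shared library a binary needs, since libraries from CentOS 7 modules that were not transferred will show up as “not found”. This is a minimal sketch using the standard ldd tool; check_binary is a hypothetical helper name, and the paths to check are your own executables passed as arguments.

```shell
#!/bin/bash
# Hypothetical helper: report shared libraries that ldd cannot resolve for a binary.
check_binary() {
    local bin=$1
    local missing
    missing=$(ldd "$bin" 2>&1 | grep 'not found')
    if [ -n "$missing" ]; then
        echo "$bin: unresolved shared libraries:"
        echo "$missing"
        return 1
    fi
    echo "$bin: all shared libraries resolved"
}

# Check every binary given on the command line, e.g. ./check.sh ~/bin/mycode
for bin in "$@"; do
    check_binary "$bin" || echo "-> relink/recompile $bin or load a matching module"
done
```

A clean ldd result does not prove the code runs correctly, so a short test job is still advisable.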
...
Yes. Simply say
This is because we need to define
Starting with the 2022.2 release of Intel’s oneAPI toolkits, remarks such as “icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.” were generated when using
No, your ssh key remains valid. We have seen this kind of problem for Windows users with an outdated version of PuTTY. Updating to a more recent PuTTY (≥ 0.81) solved this problem. The same holds for WinSCP, which needs to be up to date, too.
This behaviour is observed for jobs submitted to the old CentOS 7 partitions (see the table above). Please make sure you submit such jobs on