
...

The migration to the new OS is organised in three consecutive phases. It is expected to be complete by the end of September.

  1. The first phase starts with 2 login nodes and 112 compute nodes already migrated to Rocky Linux 9 for testing. The other nodes remain available under CentOS 7 for continued production.

  2. After the test phase, a major fraction of nodes will be switched to Rocky Linux 9 to allow for general job production under the new OS.

  3. During the last phase, only a few nodes still remain under CentOS 7. At the very end, they will be migrated to Rocky Linux 9, too.

During the migration phase, the use of Rocky Linux 9 "clx" compute nodes was free of charge.

Current migration state: complete

nodes | CentOS 7 | Rocky Linux 9
login | - | blogin[1-8]
compute (384 GB RAM) | - | 948
compute (768 GB RAM) | - | 320
compute (1536 GB RAM) | - | 2

Latest news

date | subject
2024-09-30 | migration of remaining nodes from CentOS 7 to Rocky Linux 9
2024-09-16 | generic login name “blogin” resolves to blogin[3-6]
2024-08-14 | migration of blogin[3-6]
2024-07-30 | migration of another 576 standard compute nodes to Rocky Linux 9
2024-07-03 | official start of the migration phase with 2 login and 112 compute nodes running Rocky Linux 9

...

old partition name (CentOS 7) | new partition name (Rocky Linux 9) | current job limits
standard96 | cpu-clx | 512 nodes, 12 h wall time
standard96:test | cpu-clx:test | 16 nodes, 1 h wall time
standard96:ssd | cpu-clx:ssd | 50 nodes, 12 h wall time
large96 | cpu-clx:large | 32 nodes, 48 h wall time
large96:test | - | closed/not available yet
large96:shared | - | closed/not available yet
huge96 | cpu-clx:huge | 1 node, 48 h wall time

Jobs submitted without a partition name are placed in the default partition. The old default was standard96; the new default is cpu-clx.
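To address the new partitions explicitly, a job script can name them in its header; a minimal sketch (script contents and resource values are illustrative, not site defaults):

```shell
#!/bin/bash
#SBATCH --partition=cpu-clx      # new partition name (old name: standard96)
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96     # 96 physical cores per clx node
#SBATCH --time=02:00:00          # must stay within the 12 h partition limit

srun ./mybinary
```

Omitting the `--partition` line now places the job in cpu-clx instead of standard96.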

Software and environment modules

...

  • node hardware and node names

  • communication network (Intel Omnipath)

  • file systems (HOME, WORK, PERM) and disk quotas

  • environment modules system (still based on Tcl, a.k.a. “Tmod”)

  • access credentials (user IDs, SSH keys) and project IDs

  • charge rates and CPU time accounting (early migrators' jobs were free of charge)

  • Lise’s Nvidia-A100 and Intel-PVC partitions

...

  • For users of SLURM’s srun job launcher:
    Open MPI 5.x has dropped support for the PMI-2 API; it relies solely on PMIx to bootstrap MPI processes. For this reason the environment setting was changed from SLURM_MPI_TYPE=pmi2 to SLURM_MPI_TYPE=pmix, so binaries linked against Open MPI can be started as usual “out of the box” using srun mybinary. This also works for binaries linked against Intel MPI, provided a recent version (≥ 2021.11) of Intel MPI was used. If an older version of Intel MPI was used and relinking/recompiling is not possible, one can follow the workaround for PMI-2 with srun as described in the Q&A section below. Switching from srun to mpirun should also be considered.
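    In practice, Open MPI binaries need no extra options under the new setting; a sketch of the relevant commands (binary name illustrative):

    ```shell
    # Cluster-wide default after the change:
    echo $SLURM_MPI_TYPE          # pmix

    # Open MPI (and Intel MPI >= 2021.11) binaries start out of the box:
    srun ./mybinary

    # Workaround for binaries linked against an older Intel MPI, if
    # relinking is not possible (requires Slurm's PMI-2 plugin):
    srun --mpi=pmi2 ./mybinary
    ```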

  • Using more processes per node than available physical cores (PPN > 96; hyperthreads) when defining FI_PROVIDER=opx:
    The OPX provider currently does not support hyperthreads/PPN > 96 on the clx partitions. Doing so may result in segmentation faults in libfabric during process startup. If a high number of PPN is really required, the libfabric provider has to be changed back to PSM2 by re-defining FI_PROVIDER=psm2. Note that the usage of hyperthreads may not be advisable; we encourage users to test performance before using more threads than available physical cores.
    Note that Open MPI’s mpirun/mpiexec defaults to using all hyperthreads if a Slurm job/allocation is used that does not explicitly set --ntasks-per-node (or similar options).
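A sketch of such a high-PPN job with the provider switched back to PSM2 (resource values illustrative; test performance before relying on hyperthreads):

```shell
#!/bin/bash
#SBATCH --partition=cpu-clx
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=192    # PPN > 96, i.e. using hyperthreads

# The OPX provider segfaults with PPN > 96, so fall back to PSM2:
export FI_PROVIDER=psm2

srun ./mybinary
```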

...

Q: I have loaded the “intel/2024.2” environment module, but still neither the icc nor the icpc compiler is found. Why is that?

Starting with the 2022.2 release of Intel’s oneAPI toolkits, the icc and icpc “classic” C/C++ compilers have been marked as deprecated. Corresponding user warnings

icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.

were generated when using icc or icpc. The 2024.x releases of the Intel oneAPI toolkits no longer contain icc and icpc. Users need to switch to Intel’s “next generation” icx and icpx compilers, respectively; they accept almost all of the “classic compiler” switches. More information is available in Intel’s porting guide.
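A build that previously used the classic compilers typically only needs the compiler names swapped; a minimal sketch (module version, flags, and file names are illustrative):

```shell
module load intel/2024.2

# before (classic compilers, removed in the 2024.x releases):
#   icc  -O2 -xHost -o app main.c
#   icpc -O2 -xHost -o app main.cpp

# after (next-generation compilers, mostly flag-compatible):
icx  -O2 -xHost -o app main.c
icpx -O2 -xHost -o app main.cpp
```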

Q: I cannot establish an ssh connection to the login nodes running Rocky Linux 9. Do I need to generate a new ssh key?

No, your ssh key remains valid. We have seen this kind of problem for Windows users with an outdated version of PuTTY; updating to a more recent PuTTY (≥ 0.81) solved it. The same holds for WinSCP, which needs to be up to date, too.

Q: My jobs generate no output, they seem to hang. A few days ago they were working fine. What happened?

This behaviour is observed for jobs submitted to the old CentOS 7 partitions (see the table above). Please make sure you submit such jobs on blogin1 or blogin2, which currently still run CentOS 7. The generic node name “blogin” resolves to login nodes already running Rocky Linux 9; these should not be used for job submissions to the old CentOS 7 partitions.