Next-Gen Technology Pool

The NHR@ZIB Next-Generation Technology Pool is a collection of systems for exploring and evaluating new technologies for HPC and AI workloads. NHR@ZIB maintains strong partnerships with various vendors, with the common goal of giving experienced users insight into and hands-on experience with future technologies.

Systems

 

NextSilicon Maverick-2

 

Hostnames: maverick[1-2]

One dataflow engine hardware accelerator card per server (accelerator cards provided by ParTec).

Server Board Layout

 

Hardware Configuration

CPU: 2x AMD EPYC 9124 (16 cores, 3.0 GHz)

Memory: 256 GB (DDR5-4800 RDIMM)

Storage: 1.6 TB NVMe SSD

Network: 1 Gb/s Ethernet

Software

  • OS: Rocky Linux 9

 

Next Profiler GUI

After starting nextsystemd, the web interface is available on port 3001 of the Maverick node. Use ssh port forwarding to make it reachable from your laptop, then open http://127.0.0.1:3001/ in your local browser.

$ ssh -L3001:127.0.0.1:3001 -J login-ngt maverick1
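If local port 3001 is already in use on your laptop, any other free local port can be forwarded to the remote port 3001 instead (the local port 4001 below is an arbitrary choice for illustration):

$ ssh -L4001:127.0.0.1:3001 -J login-ngt maverick1

Then open http://127.0.0.1:4001/ in your local browser.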

 

 

NVIDIA Grace CPU Superchip Servers

Hostnames: grace[1-2]

Hardware

CPU: 2x NVIDIA Grace CPU Superchip (144 cores total)

RAM: 480 GB LPDDR5X-4800 MHz ECC

Disk: 2 TB

Interconnect: InfiniBand NDR link between grace1 and grace2

Hosts with Intel Optane Memory

Hostnames: apass{1,2}

The two systems apass1 and apass2 are equipped with Intel Optane memory components (first-generation Apache Pass): Storage Class Memory modules (SCM/NVRAM) and SSDs. The main difference between the two hosts is the memory capacity.

The Optane memory of each system can be configured in one of two (strictly speaking, three) modes:

  • Memory Mode: The Optane memory is exposed as RAM to the OS. Although the hardware technology differs from DRAM, this mode allows transparent use of the SCM. The main benefits are the added memory capacity and ease of use (no modification of applications required). The DRAM acts as a cache for the Optane memory. In this mode, the SCM is effectively not persistent.

  • AppDirect Mode: The Optane memory is exposed as one or more block devices to the OS (usually as /dev/pmemX, depending on the actual configuration). A file system created on such a block device should support the direct access (DAX) option. This allows data on the persistent Optane memory to be mapped into an application's virtual address space while bypassing the OS page cache, so the SCM can be accessed directly with load/store operations. Note that data in the Optane memory is effectively cleared when the mode is changed; data on /dev/pmemX should therefore be considered ephemeral.

  • (Mixed/Hybrid Mode: The system can also be configured to provide one portion of the Optane capacity in Memory Mode and another portion in AppDirect Mode.)

By default, apass1 is configured in Memory Mode and apass2 in AppDirect Mode. If you need a different configuration, contact S. Christgau. The mount points for the persistent memory are usually /mnt/pmemX, where X often matches the NUMA domain of the socket/processor the memory is attached to; to be sure, run lstopo from the hwloc environment module. When a system is in AppDirect Mode, not every pmem device might be mounted or accessible, because other software (e.g. DAOS) may grab a device exclusively. Check the output of mount to find the mount points of /dev/pmemX.
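For example, the following commands show where the persistent memory is attached and which pmem devices are currently mounted (a minimal sketch using the tools mentioned above; the exact output depends on the current configuration):

$ module load hwloc
$ lstopo
$ mount | grep pmem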

The login message (message of the day) displays the mode in which the system is currently running. You can also check the CurrentVolatileMode property in the file /var/run/optane/state. As a further simple check, run free -h: if the total memory capacity is around 3 TB or larger, the system is in Memory Mode. Conversely, if /dev/pmem[01] exists, AppDirect (or Mixed/Hybrid) Mode is in effect.
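These checks can be combined in a few shell commands (a sketch only; the reported capacity and device list differ between apass1 and apass2 and with the configured mode):

$ grep CurrentVolatileMode /var/run/optane/state
$ free -h | grep Mem:
$ ls /dev/pmem* 2>/dev/null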

Hardware

CPU: 2x Intel Xeon Platinum 8260L (24 cores, 2.4 GHz) Cascade Lake SP

System: Inspur NF5280M5

Memory:

apass1:

  • 384 GB DDR4 (12 x 32 GB Micron 36ASF4G72PZ-2G9E2 PC4-2933 DIMMs, configured to 2666 MT/s)

  • 3 TB Optane/Apache Pass NVRAM (12 x 258496 MB Intel NMA1XXD512GPSU4 DIMMs, 2666 MT/s)

apass2:

  • 768 GB DDR4 (12 x 64 GB Samsung M393A8G40MB2-CVF PC4-2933 DIMMs, configured to 2666 MT/s)

  • 6 TB Optane/Apache Pass NVRAM (12 x 514624 MB Intel NMA1XXD512GPSU4 DIMMs, 2666 MT/s)

All DIMM slots are fully populated with Optane/DRAM pairs (2:2:2 configuration). The Optane DIMMs are interleaved, and a single region spans them per socket.

Storage:

apass1:

  • 240 GB Intel SSDSC2KB24 SATA SSD, for OS/home

  • 1x 8 TB Intel SSDPE2KX080T8 NVMe SSD, scratch (ephemeral, might be wiped/unavailable at any time)

apass2:

  • 240 GB Intel SSDSC2KB24 SATA SSD, for OS/home

  • 2x 8 TB Intel SSDPE2KX080T8 NVMe SSD, scratch (ephemeral, might be wiped/unavailable at any time)

Network: Single-port Omni-Path HFI Adapter 100 Series (back-to-back connected via copper cable)

Software

  • OS: Rocky Linux 8

  • More recent software (compilers, libraries, utilities) is available via environment modules (module avail).

 

 


Pic 1: Server Board Layout

NEC SX-Aurora TSUBASA A300-8

Hostname: aurora

Hardware Configuration

CPU: 2x Intel Xeon Gold 6126 (12 cores, 2.6 GHz) Skylake

Memory: 192 GB (DDR4-2666 ESS RDIMM)

Accelerators: 8x NEC Vector Engine 1.0 (VE) Model B (4x per PCI root complex)

VE Configuration, per VE:

  • 8 cores, 1.4 GHz

  • 48 GB HBM, 1600 MHz, 1.20 TB/s

  • peak performance: 2.15 TFLOPS

Network: 2x 100 Gb/s InfiniBand between the two PCI root complexes

Software

  • OS: CentOS 7.9

  • VE OS: 2.4.3

  • NEC Compiler Suite: NCC 3.3.1

Documentation

 

 

 

 


Pic 2: Aurora Server with 8 VEs

3rd Gen Intel Xeon Cooper Lake

Hostname: cpl

Cooper Lake is Intel's codename for the third generation of its Xeon Scalable processors, developed as the successor to Cascade Lake.

Improvements:

  • New bfloat16 instruction

  • Support for up to 12 DIMMs of DDR4 memory per CPU socket

 

Hardware Configuration

CPU: 4x Intel Xeon Platinum 8353H (18 cores, 2.5 GHz) Cooper Lake

Memory: 384 GB (DDR4-3200 RDIMM)

Storage: 18 TB NVMe RAID local scratch (/local)

Network: 2x 10 Gb/s Ethernet

Software

  • OS: CentOS Stream 8

 

 

Intel Xeon Ice Lake

Hostname: icl

Hardware Configuration

CPU: 2x Intel Xeon Platinum 8360Y (36 cores, 2.4 GHz) Ice Lake

Memory: 512 GB (DDR4-3200 RDIMM)

Storage: 18 TB NVMe RAID local scratch (/local)

Network: 2x 10 Gb/s Ethernet

Software

  • OS: CentOS Stream 8

 

Pic 3: Server Board Layout

 

 

 

Access

To gain access to the Next-Generation Technology Pool, contact support@nhr.zib.de. Please include a short description of your intended use and the system(s) you would like to work with.

 

Use Slurm on the NGT login node to access individual NGT systems.

 

The NGT login node is "login-ngt", reachable using ssh via our public login nodes "blogin.nhr.zib.de" (replace USERNAME with your NHR@ZIB account name):

$ ssh -J USERNAME@blogin.nhr.zib.de USERNAME@login-ngt

 

Make use of the ssh-agent to avoid repeated passphrase prompts (all keys used to access NHR@ZIB systems must have a passphrase). Start the agent with ssh-agent, then load your key with ssh-add. For example, if your ssh key is in ~/.ssh/id_rsa_nhr, run:

$ ssh-add ~/.ssh/id_rsa_nhr
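If no agent is running in your session yet, it can be started first (a minimal sketch; many desktop environments already start an agent automatically):

$ eval "$(ssh-agent)"
$ ssh-add ~/.ssh/id_rsa_nhr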

 

With a suitable ssh config, you can jump to the NGT login node using one simple command:

$ ssh login-ngt

 

The corresponding entry in ~/.ssh/config looks like this (replace USERNAME with your NHR@ZIB account name):

Host login-ngt
    ProxyJump %r@blogin.nhr.zib.de
    Hostname login-ngt
    User USERNAME
    IdentityFile ~/.ssh/id_rsa_nhr

 

Unused compute nodes are shut down. Slurm starts nodes when they are needed; depending on the node, this takes 2 to 5 minutes.

Use sinfo to query the node status. In the following example, "icl" is up and running, while "cpl" is powered down to save energy (indicated by the "~" suffix in the STATE column):

login$ sinfo -N
NODELIST NODES PARTITION STATE
icl          1 icl       idle
cpl          1 cpl       idle~

 

To start an interactive session on a compute node, use srun.

login$ srun --pty -picl bash -ls
icl$

 

Alternatively, you can use "salloc" to start and allocate a node:

login$ salloc -picl
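Within the allocation created by salloc, commands can then be launched on the allocated node with srun (a minimal sketch; depending on the Slurm configuration, salloc may also open a shell directly on the node):

login$ salloc -picl
login$ srun hostname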

 

When a node is up, direct ssh access is still possible, but it requires login-ngt as a jump host. An example ssh config entry (for node "cpl") is:

Host cpl-ngt
    ProxyJump %r@blogin.nhr.zib.de,%r@login-ngt
    Hostname cpl.ngt.nhr.zib.de
    User USERNAME
    IdentityFile ~/.ssh/id_rsa_nhr
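With this entry in place and the node running, the compute node can then be reached directly; the host alias cpl-ngt is simply the name chosen in the example above:

$ ssh cpl-ngt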

Software

For some of the systems, e.g. the NVIDIA Grace nodes, additional software is available via environment modules, which first have to be made available by sourcing a provided shell script:

$ source /net/aws/zzz_ngt_aws_modules.sh
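After sourcing the script, the additional modules appear in the module system and can be listed and loaded as usual (the module name below is a placeholder, not an actual module name on these systems):

$ module avail
$ module load <modulename>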