...
Each node of the GPU A100 system combines a host CPU with its four attached device GPUs. A wide range of software supports this hardware; we restrict our presentation to examples. For that, please visit our manual on
Job monitoring
A running job can be monitored interactively, directly on each of its compute nodes. Once you know the names of the job's nodes, you can log in to them and monitor the host CPU as well as the GPUs.
```
bgnlogin1 $ squeue -u myaccount
  JOBID PARTITION     NAME      USER ST  TIME NODES NODELIST(REASON)
7748370  gpu-a100 a100_mpi myaccount  R  1:23     2 bgn[1007,1017]
bgnlogin1 $ ssh bgn1007
bgn1007 $ top
bgn1007 $ nvidia-smi
```
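The interactive steps above can also be scripted. The following is a minimal sketch, not an official tool: it assumes that the job ID is passed as the first argument and that `ssh` to the job nodes works as in the session above. The function names `gpu_query` and `monitor_nodes` and the `DRY_RUN` variable are hypothetical and only serve this illustration.

```shell
#!/bin/sh
# Sketch: print a short GPU summary for every node of a running job.
# Assumptions: the job ID is the first argument; ssh to the job nodes is allowed.

# Build the nvidia-smi query (GPU index, utilization, used memory) as CSV.
gpu_query() {
    echo "nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader"
}

# Run the query on each host given as an argument.
# With DRY_RUN=1 the ssh commands are only printed, not executed.
monitor_nodes() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $host $(gpu_query)"
        else
            echo "== $host =="
            ssh "$host" "$(gpu_query)"
        fi
    done
}

# Main part: runs only when a job ID is passed on the command line.
if [ "$#" -gt 0 ]; then
    # squeue prints the compact node list of the job, e.g. bgn[1007,1017];
    # scontrol expands it to one hostname per line.
    nodes=$(scontrol show hostnames "$(squeue -h -j "$1" -o %N)")
    # Intentionally unquoted so that each hostname becomes its own argument.
    monitor_nodes $nodes
fi
```

Saved under a name of your choice, for example `gpu_monitor.sh`, the script could be called on a login node as `sh gpu_monitor.sh 7748370`. Setting `DRY_RUN=1` beforehand prints the `ssh` commands without running them, which helps to check the node list first.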
Software and environment modules
...
```
bgnlogin1 ~ $ module avail
...
bgnlogin1 ~ $ module load gcc
...
bgnlogin1 ~ $ module list
Currently Loaded Modulefiles:
 1) HLRNenv   2) sw.a100   3) slurm   4) gcc/11.3.0(default)
```
...