PyTorch

 


PyTorch is a popular Python library for deep learning, automatic differentiation, and optimization, with excellent GPU and CPU support. It features flexible eager-mode execution, just-in-time (“JIT”) compilation, and domain-specific tools (e.g., torchvision for image-based learning tasks). It can be loaded in a Python environment, and the presence of GPU accelerators can be tested as follows:

    Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> import torch
    >>> for i in range(torch.cuda.device_count()):
    ...     print(torch.cuda.get_device_properties(i).name)
    ...
    NVIDIA A100-SXM4-80GB
    NVIDIA A100-SXM4-80GB
    NVIDIA A100-SXM4-80GB
    NVIDIA A100-SXM4-80GB
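
For use inside a batch script, the same check can be wrapped in a small function. The following is a minimal sketch, not part of the module's tooling: the function name `list_accelerators` is hypothetical, and the fallback messages are illustrative. It avoids failing when `torch` is not importable in the active environment or when the node has no visible GPU.

```python
import importlib.util


def list_accelerators():
    """Return the names of visible CUDA devices, or one explanatory message."""
    # Avoid an ImportError if the active environment has no PyTorch installed.
    if importlib.util.find_spec("torch") is None:
        return ["torch is not installed in this environment"]

    import torch

    # Nodes without GPUs report no CUDA devices.
    if not torch.cuda.is_available():
        return ["no CUDA device visible; PyTorch will run on the CPU"]

    # Same query as the interactive session above, collected into a list.
    return [
        torch.cuda.get_device_properties(i).name
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    for line in list_accelerators():
        print(line)
```

Running this at the start of a job and checking its output is a cheap way to confirm that the allocation exposes the GPUs that were requested.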

Extensions

The Python distribution in the anaconda3/2023.09 module also includes some useful PyTorch extensions:

  • PyTorch Lightning - Powerful, HPC-friendly, boilerplate-removing library for training, logging, and reproducibility with deep learning models.

  • PyTorch Geometric - Flexible graph neural network package for use in molecular/materials science, network science, and many other application domains of graph theory.

Examples

Examples of CPU, single- and multi-GPU, and multi-node training tasks for HPC environments can be found in the pytorch-hpc repository (https://github.com/Ruunyox/pytorch-hpc). The examples reproduced below train convolutional neural network image classifiers on the Fashion-MNIST dataset.

Setup (on login node):

The following commands load the Anaconda module, clone the example repository, and install it as a user-level package:

    $ module load anaconda3/2023.09
    $ conda activate base
    $ git clone https://github.com/Ruunyox/pytorch-hpc
    $ cd pytorch-hpc
    $ pip install --user .

1. Single node, single GPU:

We start with a training YAML file (fashion_mnist_conv_gpu.yaml) appropriate for PyTorch Lightning (note that similar training jobs can be set up without PyTorch Lightning; see the official PyTorch tutorials for more granular examples):

Since only 1 GPU is needed, it is better to use the gpu-a100:shared partition and request just one GPU (gres=gpu:A100:1) rather than queuing for a full node with 4 GPUs. The following SLURM submission script details the options:

    #!/bin/bash
    #SBATCH -J pyt_cli_test_conv_gpu
    #SBATCH -o pyt_cli_test_conv_gpu.out
    #SBATCH --time=00:30:00
    #SBATCH --partition=gpu-a100:shared
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --gres=gpu:A100:1
    #SBATCH --mem-per-cpu=1G
    #SBATCH --cpus-per-task=4

    module load cuda/11.8
    module load anaconda3/2023.09
    conda activate base

    srun pythpc --config fashion_mnist_conv_gpu.yaml fit

and can be run using:

The results can be inspected using the TensorBoard package (also included in the anaconda3/2023.09 module):

which can be viewed on your local machine via SSH tunneling:

Note: you may change the port 8877 to another port if needed. Alternatively, you may copy your events* log files to your local machine and inspect them with TensorBoard there.
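
On a shared machine the default port may already be in use by another user. The following sketch, using only the Python standard library, shows one way to test a port and fall back to an unused one. The function names `port_is_free` and `pick_port` are hypothetical, and the check only applies to the machine on which it is run.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails if another process already holds the port.
            return False
    return True


def pick_port(preferred=8877, host="127.0.0.1"):
    """Return `preferred` if it is free, otherwise a port chosen by the OS."""
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS for any unused port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(pick_port())
```

Because the probe socket is closed before the port is actually used, another process could claim the port in the meantime; if that happens, simply pick again.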

2. Single node, multiple GPUs

Adding more GPUs with PyTorch Lightning is as simple as setting:

in the training YAML (see fashion_mnist_conv_multi_gpu.yaml) and requesting a non-shared partition in the SBATCH options:

Remember that the number of nodes/GPUs requested through SLURM must match those requested in the PyTorch Lightning training YAML.
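
A mismatch of this kind can be caught before training starts. The sketch below is an illustration under stated assumptions, not part of pytorch-hpc or PyTorch Lightning: it assumes one SLURM task per GPU (as in the single-GPU script above, which pairs `--ntasks-per-node=1` with one GPU), it reads the standard SLURM environment variables `SLURM_JOB_NUM_NODES` and `SLURM_NTASKS_PER_NODE`, and the function names `find_mismatches` and `world_size` are hypothetical. The arguments `num_nodes` and `devices` mirror the PyTorch Lightning trainer settings of the same names.

```python
import os


def find_mismatches(num_nodes, devices, env=None):
    """Compare trainer settings with the SLURM allocation.

    Returns a list of human-readable problems (empty if everything matches).
    """
    env = os.environ if env is None else env
    # Assumption: one SLURM task is launched per GPU.
    expected = {
        "SLURM_JOB_NUM_NODES": num_nodes,
        "SLURM_NTASKS_PER_NODE": devices,
    }
    problems = []
    for variable, wanted in expected.items():
        found = env.get(variable)
        if found is None:
            problems.append(f"{variable} is not set (not inside a SLURM job?)")
        elif int(found) != wanted:
            problems.append(f"{variable}={found}, but the trainer expects {wanted}")
    return problems


def world_size(num_nodes, devices):
    """Total number of training processes: nodes times GPUs per node."""
    return num_nodes * devices


if __name__ == "__main__":
    # Example: a trainer configured for 1 node with 4 GPUs.
    for problem in find_mismatches(num_nodes=1, devices=4):
        print(problem)
    print("expected processes:", world_size(1, 4))
```

For instance, a trainer configured for 2 nodes with 4 GPUs each expects 8 processes in total, so an allocation with a single node would be reported as a mismatch.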

3. Multiple nodes, multiple GPUs

Training across multiple nodes with multiple GPUs on a cluster is seamless with PyTorch Lightning. Simply change the training YAML to include:

This configuration expects 2 nodes with 4 GPUs each, for a total of 8 GPUs, using a distributed data parallel strategy (see the PyTorch Lightning documentation for alternative distributed training strategies). Accordingly, the SLURM submission script must now be changed to include: