...

Popular frameworks such as PyTorch, TensorFlow, and JAX can be used on Intel GPUs with the Intel Distribution for Python (use the offline installer on the login nodes) together with framework-specific Intel extensions. A separate environment can be prepared for each framework, as described below. Note that the module intel/2024.0.0 (under sw.pvc) must be loaded to install or run these frameworks properly.

Note

The latest Intel AI tools have specific Intel GPU driver requirements. Currently, only the PVC compute nodes bgi1007 and bgi1008 have these drivers installed and are reserved under pvcup. Please use the bgnlogin* or bgilogin* login nodes when preparing your environments.


We also offer a standalone module (intel_AI_tools/2024.0.0) that loads a conda installation with the following pre-installed, Intel GPU/XPU-ready environments:

  • intel_pytorch_2.1.0a0

  • intel_tensorflow_2.14.0

  • intel_jax_0.4.20
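As a minimal sketch, the pre-installed environments could be used as follows. This assumes that loading the module puts its conda installation on your PATH and enables conda activate in the current shell, which is not stated above. The environment names are the ones listed.

```shell
module load intel_AI_tools/2024.0.0
conda env list                          # show the pre-installed environments
conda activate intel_pytorch_2.1.0a0    # or intel_tensorflow_2.14.0 / intel_jax_0.4.20
```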

Note

The PVC nodes currently run Rocky Linux 8, so only Python versions <= 3.9 are supported.

Info

NumPy 2.0.0 breaks binary backward compatibility. If you encounter NumPy-related runtime errors, consider downgrading to a version < 2.0.0.
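For example, inside the activated conda environment (a sketch; whether a NumPy 1.x release is compatible with your other installed packages is an assumption you should check):

```shell
python -c 'import numpy; print(numpy.__version__)'   # show the installed NumPy version
python -m pip install "numpy<2.0.0"                  # install the newest compatible 1.x release
```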

PyTorch

Load the Intel oneAPI module and create a new conda environment within your Intel Python distribution:

Code block
module load intel/2024.0.0

conda create -n intel_pytorch_gpu python=3.9
conda activate intel_pytorch_gpu

Once the new environment has been activated, the following command installs PyTorch:

Code block
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

This installs PyTorch together with the Intel Extension for PyTorch, which is required to run operations on Intel GPUs rather than CUDA devices. On a compute node, you can check whether the GPUs are detected:

...

Code block
pip install tensorflow==2.14.0
pip install --upgrade "intel-extension-for-tensorflow[xpu]"

...

==2.14.0

This installs TensorFlow together with its Intel extension, which is required to run operations on Intel GPUs rather than CUDA devices. On a compute node, you can check whether the GPUs are detected:

...

Note

As of version 0.4.20, Intel XPU support in JAX is still experimental.

Like PyTorch and TensorFlow, JAX has an Intel extension, provided through OpenXLA. To prepare a JAX environment for use with Intel GPUs, first create a new conda environment:

...

Once the environment is activated, the following commands install JAX:

Code block
pip install numpy==1.24.4
pip install jax==0.4.20 jaxlib==0.4.20
pip install --upgrade intel-extension-for-openxla==0.2.1

This installs JAX together with its Intel extension, which is required to run operations on Intel GPUs rather than CUDA devices. On a compute node, you can check whether the GPUs are detected:

...

Examples of using the Intel extension for JAX can be found here.

Distributed Training

Multi-GPU and multi-node jobs can be run with the following setup in a job submission script:

Code block
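module load intel/2024.0.0
module load impi

# Path to the oneCCL (collective communications library) installation
export CCL_ROOT=/sw/compiler/intel/oneapi/ccl/2021.12
# Make the Intel MPI libraries visible at runtime
export LD_LIBRARY_PATH=$I_MPI_ROOT/lib:$LD_LIBRARY_PATH
# Use the first node of the allocation as the master node and resolve its IP address
hnode=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$(scontrol getaddrs "$hnode" | cut -d' ' -f 2 | cut -d':' -f 1)
export MASTER_PORT=29500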

It is advantageous to define the GPU tile usage with affinity masks (each Intel Max 1550 has two compute “tiles”). Entries of the form GPU_ID.TILE_ID (zero-based indices) specify which GPU(s) and tile(s) to use. For example, to use two GPUs and four tiles, specify:

Code block
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=0.0,0.1,1.0,1.1

To use four GPUs and eight tiles, specify:

Code block
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=0.0,0.1,1.0,1.1,2.0,2.1,3.0,3.1
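For other GPU counts, the mask can be generated instead of typed by hand. The following POSIX shell sketch uses a hypothetical helper, build_affinity_mask, and assumes two tiles per GPU as on the Intel Max 1550. It reproduces the two masks shown above.

```shell
# Hypothetical helper: print a ZE_AFFINITY_MASK value in GPU_ID.TILE_ID
# format (zero-based) that selects every tile of the first N GPUs.
# Usage: build_affinity_mask NGPUS [TILES_PER_GPU]
# Variables are not local in POSIX sh, so call it via $(...).
build_affinity_mask() {
  ngpus=$1
  ntiles=${2:-2}   # assumption: two tiles per GPU (Intel Max 1550)
  mask=""
  g=0
  while [ "$g" -lt "$ngpus" ]; do
    t=0
    while [ "$t" -lt "$ntiles" ]; do
      # Append "g.t", preceded by a comma unless it is the first entry
      mask="${mask:+$mask,}$g.$t"
      t=$((t + 1))
    done
    g=$((g + 1))
  done
  printf '%s\n' "$mask"
}

export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=$(build_affinity_mask 4)   # four GPUs, eight tiles
echo "ZE_AFFINITY_MASK=$ZE_AFFINITY_MASK"
```

Generating the mask keeps it consistent with the number of GPUs requested, which avoids typos when the job size changes.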

These settings apply to all nodes of a job. For more information and alternative modes, see the Intel Level Zero documentation.

Intel MPI can then be used to distribute and run your job, e.g.:

Code block
mpirun -np 8 -ppn 8 your_exe your_exe_flags
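The rank counts can also be derived from the allocation. This sketch assumes, as the examples above suggest, one MPI rank per tile and four two-tile GPUs per node. SLURM_NNODES is set by Slurm inside a job; the other variable names are illustrative. The mpirun line is commented out so the arithmetic can be checked on its own; uncomment it in your job script.

```shell
GPUS_PER_NODE=4              # assumption: four Intel Max 1550 GPUs per node
TILES_PER_GPU=2              # two compute tiles per Max 1550
NNODES=${SLURM_NNODES:-1}    # allocated nodes; defaults to 1 outside a Slurm job

PPN=$((GPUS_PER_NODE * TILES_PER_GPU))   # ranks per node (one per tile)
NP=$((NNODES * PPN))                     # total number of ranks

echo "launching $NP ranks, $PPN per node"
# mpirun -np "$NP" -ppn "$PPN" your_exe your_exe_flags
```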