...
Popular tools such as PyTorch, TensorFlow, and JAX can be used with the Intel Distribution for Python (use the offline installer on the login nodes) together with framework-specific Intel extensions. Separate environments can be prepared for each framework below for use with Intel GPUs. Note that the module intel/2024.0.0 (under sw.pvc) must be loaded for these frameworks to install and run properly.
Note: The latest Intel AI tools have specific Intel GPU driver requirements. Currently, only the PVC compute nodes meet these requirements.
We also offer a standalone module (intel_AI_tools/2024.0.0) that loads a conda installation with the following pre-installed, Intel GPU/XPU-ready environments:
intel_pytorch_2.1.0a0
intel_tensorflow_2.14.0
intel_jax_0.4.20
Note: The PVC nodes currently run Rocky Linux 8, so only Python versions <=3.9 are supported.
Info: NumPy 2.0.0 breaks binary backwards compatibility. If NumPy-related runtime errors are encountered, please consider downgrading to a version <2.0.0.
PyTorch
Load the Intel oneAPI module and create a new conda environment within your Intel Python distribution:
module load intel/2024.0.0
conda create -n intel_pytorch_gpu python=3.9
conda activate intel_pytorch_gpu
Once the new environment has been activated, the following command installs PyTorch:
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
This installs PyTorch together with the Intel Extension for PyTorch, which is required to run non-CUDA operations on Intel GPUs. On a compute node, the presence of GPUs can be checked from within Python:
Python 3.9.18 (tags/v3.9.18-26-g6b320c3b2f6-dirty:6b320c3b2f6, Sep 28 2023, 00:35:27)
[GCC 13.2.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import torch
>>> import intel_extension_for_pytorch as ipex
My guessed rank = 0
>>> [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())]
[0]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[1]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[2]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[3]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[4]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[5]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[6]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[7]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[None, None, None, None, None, None, None, None]
Examples of how to use the Intel Extension for PyTorch can be found here.
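As an illustration, the following minimal sketch shows the typical Intel Extension for PyTorch workflow of moving a model to the xpu device and calling ipex.optimize (the toy model and data here are invented purely for demonstration):

import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch

# Hypothetical toy model and input, for illustration only
model = torch.nn.Linear(128, 10).to("xpu")
data = torch.randn(64, 128, device="xpu")

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Apply Intel-specific optimizations to the model and optimizer
model, optimizer = ipex.optimize(model, optimizer=optimizer)

loss = model(data).sum()   # forward pass runs on the Intel GPU
loss.backward()
optimizer.step()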
TensorFlow
Similar to PyTorch, an Intel extension also exists for TensorFlow. To prepare a TensorFlow environment for use with Intel GPUs, first create a new conda environment:
...
pip install tensorflow==2.14.0
pip install --upgrade intel-extension-for-tensorflow[xpu]==2.14.0
This installs TensorFlow together with its Intel extension, which is required to run non-CUDA operations on Intel GPUs. On a compute node, the presence of GPUs can be checked from within Python:
Python 3.9.18 (tags/v3.9.18-26-g6b320c3b2f6-dirty:6b320c3b2f6, Sep 28 2023, 00:35:27)
[GCC 13.2.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import tensorflow
2024-02-09 14:26:07.737940: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-09 14:26:07.740082: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 14:26:07.764245: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-09 14:26:07.764268: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-09 14:26:07.764290: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-09 14:26:07.769201: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 14:26:07.769345: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-09 14:26:08.459403: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-09 14:26:09.416471: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-02-09 14:26:09.457055: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-02-09 14:26:09.551955: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-09 14:26:09.552267: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552272: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552276: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552279: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552283: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552286: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552290: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552293: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
Examples of how to use the Intel Extension for TensorFlow can be found here.
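As an illustration, a minimal sketch of verifying that the extension has registered the Intel GPUs as XPU devices and of placing a computation on one of them (the tensor sizes are arbitrary, for demonstration only):

import tensorflow as tf
# With intel-extension-for-tensorflow[xpu] installed, importing TensorFlow
# loads the XPU backend automatically.

print(tf.config.list_physical_devices("XPU"))  # one entry per visible Intel GPU

# Place a simple matrix multiplication explicitly on the first XPU
with tf.device("/XPU:0"):
    a = tf.random.normal([1024, 1024])
    b = tf.random.normal([1024, 1024])
    c = tf.matmul(a, b)
print(c.device)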
JAX
Note: As of version 0.4.20, Intel XPU support in JAX is still experimental.
Like PyTorch and TensorFlow, JAX also has an Intel extension, provided via OpenXLA. To prepare a JAX environment for use with Intel GPUs, first create a new conda environment:
module load intel/2024.0.0
conda create -n intel_jax_gpu python=3.9
conda activate intel_jax_gpu
Once the environment is activated, the following commands install JAX and its Intel extension:
pip install numpy==1.24.4
pip install jax==0.4.20 jaxlib==0.4.20
pip install intel_extension_for_openxla==0.2.1
This installs JAX together with its Intel extension, which is required to run non-CUDA operations on Intel GPUs. On a compute node, the presence of GPUs can be checked from within Python:
Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jax
>>> print("jax.local_devices(): ", jax.local_devices())
Platform 'xpu' is experimental and not all JAX functionality may be correctly supported!
jax.local_devices(): [xpu(id=0), xpu(id=1), xpu(id=2), xpu(id=3), xpu(id=4), xpu(id=5), xpu(id=6), xpu(id=7)]
Examples for using the Intel extension for JAX can be found here.
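As an illustration, a minimal sketch of running a computation on the XPU backend (the array size is chosen arbitrarily, for demonstration only):

import jax
import jax.numpy as jnp

# With intel-extension-for-openxla installed, the xpu devices become the
# default JAX backend, so ordinary JAX code runs on the Intel GPUs.
print(jax.devices())                   # e.g. [xpu(id=0), xpu(id=1), ...]

x = jnp.ones((4096, 4096))
y = jnp.dot(x, x)
y.block_until_ready()                  # executed on the default xpu device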
Distributed Training
Multi-GPU and multi-node jobs can be executed using the following strategy in a job submission script:
module load intel/2024.0.0
module load impi
export CCL_ROOT=/sw/compiler/intel/oneapi/ccl/2021.12
export LD_LIBRARY_PATH=$I_MPI_ROOT/lib:$LD_LIBRARY_PATH
hnode=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$(scontrol getaddrs $hnode | cut -d' ' -f 2 | cut -d':' -f 1)
export MASTER_PORT=29500
It is advantageous to define the GPU tile usage (each Intel Max 1550 has two compute “tiles”) using affinity masks, where the format GPU_ID.TILE_ID (zero-based indices) specifies which GPU(s) and tile(s) to use. For example, to use two GPUs and four tiles, one can specify:
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=0.0,0.1,1.0,1.1
To use four GPUs and eight tiles, one would specify:
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=0.0,0.1,1.0,1.1,2.0,2.1,3.0,3.1
These settings are applied to all nodes of a job. For more information, and alternative modes, please see the Intel Level Zero documentation.
Intel MPI can then be used to distribute and run your job, e.g.:
mpirun -np 8 -ppn 8 your_exe your_exe_flags
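For orientation, a minimal, non-authoritative sketch of what a distributed PyTorch training script launched this way might look like; it assumes the oneccl_bindings_for_pytorch package (not covered above) has been installed into the environment and that Intel MPI exports the PMI_RANK and PMI_SIZE environment variables:

import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch as ipex       # registers the xpu device
import oneccl_bindings_for_pytorch               # registers the "ccl" backend (assumed installed)

# Rank and world size as exported by Intel MPI; MASTER_ADDR and MASTER_PORT
# come from the job submission script shown above.
os.environ.setdefault("RANK", os.environ.get("PMI_RANK", "0"))
os.environ.setdefault("WORLD_SIZE", os.environ.get("PMI_SIZE", "1"))
dist.init_process_group(backend="ccl")

rank = dist.get_rank()
device = f"xpu:{rank % torch.xpu.device_count()}"  # one MPI rank per visible device/tile

model = torch.nn.Linear(128, 10).to(device)        # toy model, for illustration only
model = torch.nn.parallel.DistributedDataParallel(model)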