SYCL

SYCL is an open, cross-platform industry standard for programming in C++ in a data-parallel and explicit fashion. It is similar to CUDA, but open and cross-platform. In contrast to OpenMP, SYCL gives more control over the code, data movements, allocations and so forth that are actually executed on the GPU. There is no SYCL for C or Fortran.

The SYCL implementation for Intel GPUs is provided by oneAPI. On the PVC partition, SYCL code can be compiled with the icpx compiler from the intel environment module(s).

Migration of existing CUDA codes to SYCL can be assisted using the DPC++ Compatibility Tool. The dpct binary is available via the intel/... environment modules.

Note that SYCL codes can also be executed on Nvidia (and AMD) GPUs.

Minimal SYCL C++ Code Example:

The example code at the end of this section (saved as minimal-sycl-code.cpp) originates from https://intel.github.io/llvm-docs/GetStartedGuide.html or https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md.

Compile the code for an Intel GPU with
icpx -fsycl minimal-sycl-code.cpp -o pvc.exe
and then run it via:
SYCL_PI_TRACE=1 ./pvc.exe
The environment variable SYCL_PI_TRACE enables SYCL debug output, e.g. the name of the offload device in use.

The same example can be run on an Nvidia GPU by loading the cuda environment module, recompiling via
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda minimal-sycl-code.cpp -o a100.exe
and running
ONEAPI_DEVICE_SELECTOR=cuda:* SYCL_PI_TRACE=1 ./a100.exe

#include <sycl/sycl.hpp>
#include <iostream> // std::cout, std::endl

int main() {
  // Creating buffer of 4 elements to be used inside the kernel code
  sycl::buffer<size_t, 1> Buffer(4);

  // Creating SYCL queue
  sycl::queue Queue;

  // Size of index space for kernel
  sycl::range<1> NumOfWorkItems{Buffer.size()};

  // Submitting command group (work) to queue
  Queue.submit([&](sycl::handler &cgh) {
    // Getting write only access to the buffer on a device.
    sycl::accessor Accessor{Buffer, cgh, sycl::write_only};
    // Executing kernel
    cgh.parallel_for<class FillBuffer>(
        NumOfWorkItems, [=](sycl::id<1> WIid) {
          // Fill buffer with indexes.
          Accessor[WIid] = WIid.get(0);
        });
  });

  // Getting read only access to the buffer on the host.
  // Implicit barrier waiting for queue to complete the work.
  sycl::host_accessor HostAccessor{Buffer, sycl::read_only};

  // Check the results
  bool MismatchFound = false;
  for (size_t I = 0; I < Buffer.size(); ++I) {
    if (HostAccessor[I] != I) {
      std::cout << "The result is incorrect for element: " << I
                << " , expected: " << I << " , got: " << HostAccessor[I]
                << std::endl;
      MismatchFound = true;
    }
  }

  if (!MismatchFound) {
    std::cout << "The results are correct!" << std::endl;
  }

  return MismatchFound;
}