Quick Guide to SYCL Implementations

1 Jun 2023 · CPOL · 7 min read
A high-level look at different implementations of SYCL
In this article, we quickly review different implementations of SYCL, presenting specific use cases to demonstrate key capabilities and unique features that will help developers choose the ideal SYCL implementation for an application.

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.

Offloading part of an application workload to a dedicated accelerator has become a common optimization practice. For example, developers render photorealistic graphics on GPUs for efficiency. This trend has pushed traditional computing towards heterogeneous systems.

However, optimizing workloads within heterogeneous systems often requires developers to learn countless hardware-specific libraries and programming languages. Therefore, they can significantly benefit from an abstraction layer that can deploy applications on heterogeneous systems without rewriting the code. This is what the SYCL* (pronounced “sickle”) specification achieves.

What is SYCL?

SYCL is a cross-platform abstraction layer that allows algorithms to switch between hardware accelerators, such as CPUs, GPUs, and FPGAs, without changing a single line of code. It is a royalty-free open standard, developed by the Khronos Group, that allows developers to program heterogeneous architectures in standard C++. Its programming model is single source: host and kernel code live in the same source file.
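
To make the single-source model concrete, here is a minimal sketch of a SYCL kernel that adds two vectors on whatever device the runtime selects. It assumes the SYCL 2020 header and API names; older implementations may expect the legacy CL/sycl.hpp header instead, and which device gets picked depends on your system.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;  // the runtime picks a default device (CPU, GPU, ...)

    {   // Buffers manage data movement between host and device.
        sycl::buffer<float> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bufC(c.data(), sycl::range<1>(n));

        // Kernel and host code live in the same C++ source file.
        q.submit([&](sycl::handler& h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }   // Buffers go out of scope here, so results are copied back to the host.

    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
}
```

Compiling the same file against a different SYCL implementation, or running it on a machine with a different accelerator, requires no source changes; only the toolchain invocation differs.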

The popularity of SYCL has given rise to an ecosystem of different implementations, such as:

  • ComputeCpp*
  • Open SYCL (formerly known as hipSYCL)
  • neoSYCL
  • triSYCL
  • Intel® oneAPI tools

This article takes a high-level look at each, presenting specific use cases to demonstrate key capabilities and unique features to help developers choose the ideal SYCL implementation for an application.

A Rundown of Popular SYCL Implementations

Since the various SYCL implementations tend to follow similar specifications, code should compile and run successfully regardless of the chosen implementation. However, they don’t all provide the same features, because they develop at different paces, focus on different architectures, or have diverged from the latest SYCL specification. Note that all implementations support execution on CPUs with the most popular architectures.

Let’s explore each implementation, which architectures it supports, and its main advantages from a developer’s perspective.

Codeplay ComputeCpp*

ComputeCpp* is a SYCL 1.2.1-conformant implementation developed and maintained by Codeplay. It is available as a community edition to promote the expansion of the SYCL ecosystem, and, for more advanced users, there’s a professional edition that provides additional features, such as:

  • Offline kernel compilation
  • Program execution tracing
  • Multi-binary support
  • Additional development tools
  • Helpdesk developer support

ComputeCpp supports a wide range of OpenCL™-compliant hardware through SPIR, such as Intel® CPUs, GPUs, and FPGAs; AMD GPUs (with some limitations); Arm Mali*; IMG (formerly PowerVR); and Renesas R-Car*, as well as NVIDIA hardware through parallel thread execution (PTX). Codeplay is also working on adopting the latest SYCL 2020 specification features. This list shows the currently available features in ComputeCpp v2.11.0.

In a nutshell, ComputeCpp is a mature implementation of SYCL targeting a wide range of accelerators. The official repository provides a step-by-step guide to building and running the software development kit (SDK) samples and offers guidelines for integrating ComputeCpp with an existing application.

To learn more about ComputeCpp, consult the company’s collection of code samples.

Open SYCL

The Open SYCL implementation (formerly known as hipSYCL) implements the modern SYCL 2020 specification and supports older SYCL 1.2.1 features with some limitations. The project has academic roots at the University of Heidelberg. Instead of using OpenCL, it relies on existing compiler toolchains: CUDA for NVIDIA devices and AMD’s Heterogeneous-computing Interface for Portability (HIP) for AMD devices. It has yet to become a fully SYCL-conformant implementation, but its mature support for NVIDIA and AMD hardware makes Open SYCL a solid choice for high-performance computing. In addition, it supports any CPU using OpenMP* and Intel GPUs using oneAPI Level Zero plus SPIR-V, although the latter is still highly experimental.

The value of Open SYCL comes from how it aggregates multiple toolchains under a single SYCL interface. This allows developers to compile SYCL code alongside mixed CUDA and HIP code in the same source file. The Open SYCL implementation started as a hobby project and has now reached maturity, serving as a research platform to implement new features and improve existing ones.
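
As a small illustration of that single interface, the standard SYCL 2020 API lets a program discover at runtime which backends and devices an implementation exposes. The sketch below simply lists them; under Open SYCL the output would typically include an OpenMP CPU device alongside any CUDA or HIP GPUs the toolchain was built for, though the exact names vary by system.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // Enumerate every platform (backend) and device the SYCL runtime can see.
    for (const auto& platform : sycl::platform::get_platforms()) {
        std::cout << "Platform: "
                  << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto& device : platform.get_devices()) {
            std::cout << "  Device: "
                      << device.get_info<sycl::info::device::name>() << "\n";
        }
    }
}
```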

For more details, check out Open SYCL’s official repository.

neoSYCL

The neoSYCL implementation is the only one that supports NEC’s SX-Aurora TSUBASA* (SX-AT) architecture. SX-Aurora has a vector engine that neoSYCL uses to automatically accelerate parts of the application pipeline. This implementation is based on the LLVM and Clang infrastructure and allows users to write modern C++ code. The current implementation of neoSYCL supports most of the core features of the SYCL 1.2.1 specification. However, it doesn’t support OpenCL-specific features such as image support. The developers at Takizawa lab plan to extend neoSYCL to other architectures, such as NVIDIA and AMD GPUs and FPGAs.

In short, neoSYCL focuses exclusively on the SX-Aurora TSUBASA architecture. Although this may currently limit its scope, neoSYCL is a young project in development that can bring a lot to the SYCL ecosystem.

If you want to learn more details, check out this research paper.

triSYCL

The triSYCL implementation originally started as an open source research project at AMD and is now primarily funded by Xilinx. Currently, this implementation of the SYCL 1.2.1 specification is incomplete and experimental. However, it has been used to provide feedback for SYCL 1.2, 1.2.1, 2.2, and 2020, and it builds on modern C++20, which makes it attractive to developers. The triSYCL implementation targets Xilinx FPGA architectures and the coarse-grained reconfigurable array of the Versal adaptive compute acceleration platform (ACAP*). So, for work focused on Xilinx device compilers, triSYCL is the way to go. In addition, triSYCL supports CPUs with OpenMP and more accelerators using OpenCL and SPIR or LLVM. There is also ongoing work to merge oneAPI’s SYCL implementation and triSYCL.

The single-source compilation model may also be beneficial for FPGAs by reducing compilation times. The triSYCL community contributes constructive feedback on the newer specifications to the SYCL community. In addition, this implementation focuses on specific hardware that other implementations don’t support. However, it’s worth noting that triSYCL is wholly a research and development project, testing new features and ideas that may contribute to the SYCL ecosystem, rather than something intended for an eventual “production” release.

Check out this code sample to learn more.

Intel® oneAPI Tools

oneAPI is a cross-industry specification for heterogeneous programming. Intel’s implementation of the Khronos SYCL 2020 standard is delivered through the Intel® oneAPI DPC++/C++ Compiler and its companion library, the Intel® oneAPI DPC++ Library (oneDPL). Its API can be understood as a combination of modern C++ and SYCL for programming heterogeneous architectures effectively. Besides SYCL, oneAPI also includes additional tools and specific libraries that make it easier for new developers to take full advantage of SYCL’s potential.
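
As a brief, hedged example of that combination of modern C++ and SYCL, oneDPL lets familiar standard algorithms run on a SYCL device by passing a device execution policy. The sketch below follows the pattern from oneDPL’s getting-started material; the header paths and the dpcpp_default policy reflect recent oneDPL releases and may differ in older versions.

```cpp
// oneDPL headers should be included before the standard headers they extend.
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/iterator>
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::buffer<int> data(sycl::range<1>(1000));

    // std::fill with a device policy runs as a SYCL kernel on the default device.
    std::fill(oneapi::dpl::execution::dpcpp_default,
              oneapi::dpl::begin(data), oneapi::dpl::end(data), 42);

    // Read the result back on the host to verify.
    sycl::host_accessor host(data, sycl::read_only);
    std::cout << "data[0] = " << host[0] << "\n";  // expect 42
}
```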

The Intel® oneAPI Base Toolkit includes optimized libraries and advanced tools, such as the Intel® DPC++ Compatibility Tool and SYCLomatic, which convert CUDA code to SYCL. It also provides tools to analyze and debug your code while boosting productivity. Check out the hands-on demo on converting a functional CUDA implementation to SYCL.
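
To give a feel for what such a migration produces, here is a hedged, hand-written illustration rather than literal tool output: the original CUDA kernel appears in the comment, and the SYCL version below it uses an nd-range launch over unified shared memory, which is close in spirit to the code these tools generate.

```cpp
#include <sycl/sycl.hpp>

// Original CUDA (for reference):
//   __global__ void scale(float* x, float a, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) x[i] *= a;
//   }
//   scale<<<blocks, 256>>>(x, 2.0f, n);

// SYCL equivalent; x is assumed to point to USM device or shared memory.
void scale(sycl::queue& q, float* x, float a, int n) {
    const int block = 256;
    const int blocks = (n + block - 1) / block;
    q.parallel_for(
        sycl::nd_range<1>(sycl::range<1>(blocks * block), sycl::range<1>(block)),
        [=](sycl::nd_item<1> item) {
            int i = static_cast<int>(item.get_global_id(0));
            if (i < n) x[i] *= a;
        }).wait();
}
```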

The oneAPI DPC++ compiler is based on the LLVM compiler, which speeds up compilation times. It uses Clang, which provides a front end for the C, C++, Objective-C, and Objective-C++ programming languages that tracks the latest standards. With the help of SYCL and plugins that translate to the hardware abstraction layer, the generated code can run on CPUs, GPUs, and FPGAs from different vendors.
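
For instance, the standard SYCL 2020 selector API is enough to prefer a GPU and fall back to the CPU when none is present. This hedged sketch works with DPC++ but is not specific to it.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

// Prefer any available GPU; fall back to the CPU if no GPU backend is present.
sycl::queue make_queue() {
    try {
        return sycl::queue(sycl::gpu_selector_v);
    } catch (const sycl::exception&) {
        return sycl::queue(sycl::cpu_selector_v);
    }
}

int main() {
    sycl::queue q = make_queue();
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";
}
```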

Intel actively contributes to LLVM and Clang with specific optimizations to take full advantage of Intel's latest architectures. For this reason, Intel’s compiler should perform better on Intel’s own architectures than the base LLVM-plus-Clang compiler. (As an example, one interesting difference compared to most other SYCL implementations is that Codeplay’s oneAPI plugin for NVIDIA GPUs targets that hardware directly, without going through OpenCL. ComputeCpp, by contrast, offers experimental support for NVIDIA GPUs using OpenCL and PTX.)

Intel constantly refines the oneAPI specification to meet new industry standards, and some of the critical features in the SYCL 2020 specification originated in oneAPI and oneDPL. This makes the oneAPI SYCL implementation, together with oneDPL, the most viable SYCL implementation for end users who care about performance in a new era of accelerated computing.

Conclusion

There is currently a wide range of SYCL implementations available that enable developers to get the most out of heterogeneous systems. Some implementations, such as triSYCL and neoSYCL, still lack essential features needed to meet certain industry standards. However, developers may want to consider them as they target specific hardware.

More mature implementations, such as Codeplay ComputeCpp and DPC++ (the oneAPI implementation of SYCL), have much more to offer. DPC++ is at the forefront of innovation and constantly incorporates new features that eventually become part of the latest SYCL specification. It also provides advanced tools, such as SYCLomatic and the Intel DPC++ Compatibility Tool, that ease the migration of CUDA code to SYCL and provide flexibility across GPU vendors.

Try the Intel® oneAPI Base Toolkit to see the difference it can make in performance!

Get the Software

Get the Intel DPC++ Compatibility Tool as a standalone download or as part of the Intel® oneAPI Base Toolkit, a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Intel
United States
You may know us for our processors. But we do so much more. Intel invents at the boundaries of technology to make amazing experiences possible for business and society, and for every person on Earth.

Harnessing the capability of the cloud, the ubiquity of the Internet of Things, the latest advances in memory and programmable solutions, and the promise of always-on 5G connectivity, Intel is disrupting industries and solving global challenges. Leading on policy, diversity, inclusion, education and sustainability, we create value for our stockholders, customers and society.