|
| 1 | +| Proposal ID | CP023 | |
| 2 | +|-------------|--------| |
| 3 | +| Name | SYCL Context Destruction Callback | |
| 4 | +| Date of Creation | 23 January 2020 | |
| 5 | +| Target | SYCL Next | |
| 6 | +| Current Status | _Work in progress_ | |
| 7 | +| Reply-to | Stuart Adams <stuart.adams@codeplay.com> | |
| 8 | +| Original author | Stuart Adams <stuart.adams@codeplay.com> | |
| 9 | +| Contributors | Stuart Adams <stuart.adams@codeplay.com> | |
| 10 | + |
| 11 | +# SYCL Context Destruction Callback |
| 12 | + |
| 13 | +## Motivation |
| 14 | + |
| 15 | +[SYCL's new interoperability features](https://github.com/KhronosGroup/SYCL-Shared/blob/master/proposals/sycl_generalization.md) enable it to be integrated into legacy applications. |
| 16 | +A developer can use the `get_native` function to access an object's native handle for |
| 17 | +use in existing code. However, developers currently have no way to bind the lifetime of |
| 18 | +non-SYCL state to the lifetime of their SYCL contexts, making the adoption of SYCL in |
| 19 | +legacy codebases difficult. |
| 20 | + |
| 21 | +## Overview |
| 22 | + |
| 23 | +Several problems can arise upon the destruction of the application's `sycl::context`. |
| 24 | +All native handles that were related to the context are invalidated, and the application |
| 25 | +must release any resources it has allocated to store them. Further issues can arise if |
| 26 | +the native handles exhibit different behavior than their SYCL counterparts, for instance, |
| 27 | +if a native context is thread-bound or globally accessible. While it is possible to write an |
| 28 | +ad-hoc solution, the SYCL standard does not mandate when a `sycl::context` is |
| 29 | +destroyed, making it difficult to implement a solution that works across different SYCL |
| 30 | +implementations. |
| 31 | + |
| 32 | +This proposal suggests that the `sycl::context` class be extended. A new member |
| 33 | +function, `set_destruction_callback` is described. Developers will use this function to |
| 34 | +register a callback, allowing them to respond to the destruction of the context in a way |
| 35 | +that is appropriate for their application. |
| 36 | + |
| 37 | +## Examples |
| 38 | + |
| 39 | +### CUDA Libraries |
| 40 | + |
| 41 | +This code sample demonstrates interop with an imaginary legacy codebase, LegacyLib, |
| 42 | +which uses CUDA to offload expensive computation to the GPU. LegacyLibHandle_t |
| 43 | +is a pointer type to an opaque structure holding the LegacyLib library context, which is |
| 44 | +itself bound to a specific CUDA context. This style of API can be seen in [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnHandle_t) and [cuBLAS](https://docs.nvidia.com/cuda/cublas/index.html#cublashandle_t). |
| 45 | +Different approaches to this solution already exist in the landscape of heterogeneous programming [1][2][3][4][5]. |
| 46 | + |
| 47 | +```cpp |
| 48 | +#include <SYCL/sycl.hpp> |
| 49 | +#include <unordered_map> |
| 50 | +#include "LegacyLib.h" |
| 51 | + |
| 52 | +class handle_map { |
| 53 | +public: |
| 54 | + static LegacyLibHandle_t get_lib_handle(CUcontext context) { |
| 55 | + auto it = data.find(context); |
| 56 | + |
| 57 | + if(it != data.end()) { |
| 58 | + LegacyLibHandle_t handle = it->second; |
| 59 | + return handle; |
| 60 | + } |
| 61 | + |
| 62 | + LegacyLibHandle_t handle; |
| 63 | + legacyLibCreate(&handle); |
| 64 | + data[context] = handle; |
| 65 | + return handle; |
| 66 | + } |
| 67 | + |
| 68 | + static void destroy_lib_handle(CUcontext context) { |
| 69 | + auto it = data.find(context); |
| 70 | + |
| 71 | + if(it != data.end()) { |
| 72 | + LegacyLibHandle_t handle = it->second; |
| 73 | + legacyLibDestroy(handle); |
| 74 | + data.erase(it); |
| 75 | + } |
| 76 | + } |
| 77 | + |
| 78 | +private: |
| 79 | + static std::unordered_map<CUcontext, LegacyLibHandle_t> data; |
| 80 | +}; |
| 81 | + |
| 82 | +void runA(const sycl::queue& queue) |
| 83 | +{ |
| 84 | + CUstream stream = get_native<sycl::backend::cuda>(queue); |
| 85 | + CUcontext context = get_native<sycl::backend::cuda>(queue.get_context()); |
| 86 | + LegacyLibHandle_t handle = handle_map::get_lib_handle(context); |
| 87 | + |
| 88 | + legacyLibSetStream(handle, stream); |
| 89 | + legacyLibRunA(handle); |
| 90 | +} |
| 91 | + |
| 92 | +void runB(const sycl::queue& queue); |
| 93 | +void runC(const sycl::queue& queue); |
| 94 | + |
| 95 | +int main() { |
| 96 | + sycl::gpu_selector selector; |
| 97 | + |
| 98 | + // run on multiple threads: |
| 99 | + { |
| 100 | + sycl::queue queue(selector); |
| 101 | + runA(queue); |
| 102 | + runB(queue); |
| 103 | + runC(queue); |
| 104 | + } |
| 105 | +} |
| 106 | +``` |
| 107 | +
|
| 108 | +With this technique, a global mapping of contexts to the lib handles is established. |
| 109 | +The application can use this to access the correct handle by passing in the active |
| 110 | +context. If we ignore the issue of thread-safe access to the map, a few problems |
| 111 | +remain. In a complex multithreaded application with multiple sycl contexts, it will |
| 112 | +become difficult ensure the correctness of an application with this design. The |
| 113 | +map will grow for every context used, and the library handles will not be released. |
| 114 | +Given `CUcontext` objects are thread-bound, it is not clear how to call |
| 115 | +`handle_map::destroy_lib_handle` and free the legacy lib objects when they are |
| 116 | +no longer in use. If a `sycl::context` is destroyed before the data is removed from |
| 117 | +the map, it will contain invalid data. Despite these issues, this technique can be seen |
| 118 | +in a surprising number of libraries. |
| 119 | +
|
| 120 | +In a SYCL application, the developer cannot reliably know when the SYCL runtime will |
| 121 | +destroy a context and invalidate the handles. Ideally, all the resources and handles |
| 122 | +reliant on a `sycl::context` object should be destroyed once its destructor is called. |
| 123 | +To achieve this, the class must be extended to allow developers to register a callback. |
| 124 | +
|
| 125 | +```cpp |
| 126 | +int main() { |
| 127 | + sycl::gpu_selector selector; |
| 128 | +
|
| 129 | + // run on multiple threads: |
| 130 | + { |
| 131 | + sycl::queue queue(selector); |
| 132 | + sycl::context context = queue.get_context(); |
| 133 | +
|
| 134 | + context.set_destruction_callback([](const sycl::context& this_context) { |
| 135 | + CUcontext context = get_native<sycl::backend::cuda>(this_context); |
| 136 | + handle_map::destroy_lib_handle(context); |
| 137 | + }); |
| 138 | +
|
| 139 | + runA(queue); |
| 140 | + runB(queue); |
| 141 | + runC(queue); |
| 142 | + } |
| 143 | +} |
| 144 | +``` |
| 145 | + |
| 146 | +With this new version, all state related to the `sycl::context` object is cleaned up |
| 147 | +automatically when its destructor is called. |
| 148 | + |
| 149 | +Similar problems exist in other libraries and APIs in the heterogeneous programming |
| 150 | +ecosystem, requiring the user to explicitly release handles that have been passed to |
| 151 | +an existing library. The ```set_destruction_callback``` function will also be useful in these cases. |
| 152 | + |
| 153 | +### ArrayFire |
| 154 | +The ArrayFire GPGPU library provides functions to enable the use of user-defined OpenCL |
| 155 | +contexts for internal operations. The user is responsible for calling `afcl_add_device_context` |
| 156 | +and `afcl_delete_device_context` [6]. The developer must ensure that `afcl_delete_device_context` |
| 157 | +is called before the runtime destroys the `sycl::context`. |
| 158 | + |
| 159 | +### OpenVX |
| 160 | +OpenVX defines an extension for interoperability with OpenCL, defining the `vxCreateContextFromCL` |
| 161 | +function to create an OpenVX context from a user-generated OpenCL context and command queue [7]. |
| 162 | +If this function is used then the developer must ensure that `vxReleaseContext` is called before |
| 163 | +the runtime destroys the `sycl::context`. |
| 164 | + |
| 165 | +### DNNL |
| 166 | +DNNL can enable OpenCL interop through the `dnnl_engine_create_ocl` function, which allows the |
| 167 | +developer to create an execution engine from an existing OpenCL context and device [8]. If this function |
| 168 | +is used then the developer must ensure that `mkldnn_engine_destroy` is called before the runtime |
| 169 | +destroys the `sycl::context`. |
| 170 | + |
| 171 | +## Specification Changes |
| 172 | +4.9.3.1 Context interface |
| 173 | +```cpp |
| 174 | +namespace sycl { |
| 175 | + class context { |
| 176 | + public: |
| 177 | + template<typename F = std::nullptr_t> |
| 178 | + void set_destruction_callback(F&& callback = nullptr); |
| 179 | + }; |
| 180 | +} |
| 181 | +``` |
| 182 | +
|
| 183 | +| Member function | Description | |
| 184 | +|-------------|--------| |
| 185 | +| `template<typename F = std::nullptr_t> void set_destruction_callback(F&& callback = nullptr);` | Registers a callable object, `callback`, with the context. The callable object will be invoked once *immediately before* the native context is destroyed. `F` must be a callable type with the signature `void(const sycl::context&)`. It must be well-formed for a `sycl::context` destructor to call the callback using the form `callback(*this);`. Only one callback may be registered - subsequent calls to this member function will overwrite the previously registered callback. If `F` is `std::nullptr_t`, no callback is registered and any previous callback is destroyed. It is undefined behavior if an instance of any SYCL class with reference semantics (see 4.6.2) is stored in a function object, or captured in the closure of a lambda that is used as a callback. |
| 186 | +<center>Table 4.15: Member functions of the context class</center> |
| 187 | +
|
| 188 | +##References |
| 189 | +
|
| 190 | +[1] JuliaGPU: https://github.com/JuliaGPU/CuArrays.jl/blob/4dedd0fadcf260cf008a9e73d8702e2f259b2cfc/src/blas/CUBLAS.jl#L30 |
| 191 | +[2] Intel ISAAC: https://github.com/intel/isaac/blob/b0a265ee45337f92f1e8e9f2fb08a057292b0240/lib/driver/dispatch.cpp#L223 |
| 192 | +[3] ISAAC: https://github.com/ptillet/isaac/blob/8ea6498a841fe1e63a4518797d332f538a9ba37e/lib/driver/dispatch.cpp#L216 |
| 193 | +[4] ND4J: https://github.com/deeplearning4j/nd4j/blob/8f005bcecb240d1fbb83b9d390ad801d1d3b6933/nd4j-backends/nd4j-backend-impls/nd4j-cuda/src/main/java/org/nd4j/jita/allocator/context/impl/BasicContextPool.java#L43 |
| 194 | +[5] GADGETRON: https://github.com/gadgetron/gadgetron/blob/fa39a340558a032bea10db07af00fbe6da5ff4cb/toolboxes/core/gpu/CUBLASContextProvider.cpp#L35 |
| 195 | +[6] ArrayFire: http://arrayfire.org/docs/group__opencl__mat.htm#ga37969cfa49416bbdb25910d15c454d01 |
| 196 | +[7] OpenVX: https://www.khronos.org/registry/OpenVX/extensions/vx_khr_opencl_interop/1.0/vx_khr_opencl_interop_1_0.html#_vxcreatecontextfromcl |
| 197 | +[8] DNNL: https://github.com/intel/mkl-dnn/blob/master/doc/advanced/opencl_interoperability.md#c-api-extensions-for-interoperability-with-opencl-1 |
0 commit comments