Skip to content

Commit ee04180

Browse files
nyallocAerialMantis
authored andcommitted
Initial structure of a context destruction callback proposal
1 parent 7789a44 commit ee04180

File tree

1 file changed

+197
-0
lines changed

1 file changed

+197
-0
lines changed

destruction-callbacks/index.md

Lines changed: 197 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,197 @@
1+
| Proposal ID | CP023 |
2+
|-------------|--------|
3+
| Name | SYCL Context Destruction Callback |
4+
| Date of Creation | 23 January 2020 |
5+
| Target | SYCL Next |
6+
| Current Status | _Work in progress_ |
7+
| Reply-to | Stuart Adams <stuart.adams@codeplay.com> |
8+
| Original author | Stuart Adams <stuart.adams@codeplay.com> |
9+
| Contributors | Stuart Adams <stuart.adams@codeplay.com> |
10+
11+
# SYCL Context Destruction Callback
12+
13+
## Motivation
14+
15+
[SYCL's new interoperability features](https://github.com/KhronosGroup/SYCL-Shared/blob/master/proposals/sycl_generalization.md) enable it to be integrated into legacy applications.
16+
A developer can use the `get_native` function to access an object's native handle for
17+
use in existing code. However, developers currently have no way to bind the lifetime of
18+
non-SYCL state to the lifetime of their SYCL contexts, making the adoption of SYCL in
19+
legacy codebases difficult.
20+
21+
## Overview
22+
23+
Several problems can arise upon the destruction of the application's `sycl::context`.
24+
All native handles that were related to the context are invalidated, and the application
25+
must release any resources it has allocated to store them. Further issues can arise if
26+
the native handles exhibit different behavior than their SYCL counterparts, for instance,
27+
if a native context is thread-bound or globally accessible. While it is possible to write an
28+
ad-hoc solution, the SYCL standard does not mandate when a `sycl::context` is
29+
destroyed, making it difficult to implement a solution that works across different SYCL
30+
implementations.
31+
32+
This proposal suggests that the `sycl::context` class be extended. A new member
33+
function, `set_destruction_callback` is described. Developers will use this function to
34+
register a callback, allowing them to respond to the destruction of the context in a way
35+
that is appropriate for their application.
36+
37+
## Examples
38+
39+
### CUDA Libraries
40+
41+
This code sample demonstrates interop with an imaginary legacy codebase, LegacyLib,
42+
which uses CUDA to offload expensive computation to the GPU. LegacyLibHandle_t
43+
is a pointer type to an opaque structure holding the LegacyLib library context, which is
44+
itself bound to a specific CUDA context. This style of API can be seen in [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnHandle_t) and [cuBLAS](https://docs.nvidia.com/cuda/cublas/index.html#cublashandle_t).
45+
Different approaches to this solution already exist in the landscape of heterogeneous programming [1][2][3][4][5].
46+
47+
```cpp
48+
#include <SYCL/sycl.hpp>
49+
#include <unordered_map>
50+
#include "LegacyLib.h"
51+
52+
class handle_map {
53+
public:
54+
static LegacyLibHandle_t get_lib_handle(CUcontext context) {
55+
auto it = data.find(context);
56+
57+
if(it != data.end()) {
58+
LegacyLibHandle_t handle = it->second;
59+
return handle;
60+
}
61+
62+
LegacyLibHandle_t handle;
63+
legacyLibCreate(&handle);
64+
data[context] = handle;
65+
return handle;
66+
}
67+
68+
static void destroy_lib_handle(CUcontext context) {
69+
auto it = data.find(context);
70+
71+
if(it != data.end()) {
72+
LegacyLibHandle_t handle = it->second;
73+
legacyLibDestroy(handle);
74+
data.erase(it);
75+
}
76+
}
77+
78+
private:
79+
static std::unordered_map<CUcontext, LegacyLibHandle_t> data;
80+
};
81+
82+
void runA(const sycl::queue& queue)
83+
{
84+
CUstream stream = get_native<sycl::backend::cuda>(queue);
85+
CUcontext context = get_native<sycl::backend::cuda>(queue.get_context());
86+
LegacyLibHandle_t handle = handle_map::get_lib_handle(context);
87+
88+
legacyLibSetStream(handle, stream);
89+
legacyLibRunA(handle);
90+
}
91+
92+
void runB(const sycl::queue& queue);
93+
void runC(const sycl::queue& queue);
94+
95+
int main() {
96+
sycl::gpu_selector selector;
97+
98+
// run on multiple threads:
99+
{
100+
sycl::queue queue(selector);
101+
runA(queue);
102+
runB(queue);
103+
runC(queue);
104+
}
105+
}
106+
```
107+
108+
With this technique, a global mapping of contexts to the lib handles is established.
109+
The application can use this to access the correct handle by passing in the active
110+
context. If we ignore the issue of thread-safe access to the map, a few problems
111+
remain. In a complex multithreaded application with multiple sycl contexts, it will
112+
become difficult ensure the correctness of an application with this design. The
113+
map will grow for every context used, and the library handles will not be released.
114+
Given `CUcontext` objects are thread-bound, it is not clear how to call
115+
`handle_map::destroy_lib_handle` and free the legacy lib objects when they are
116+
no longer in use. If a `sycl::context` is destroyed before the data is removed from
117+
the map, it will contain invalid data. Despite these issues, this technique can be seen
118+
in a surprising number of libraries.
119+
120+
In a SYCL application, the developer cannot reliably know when the SYCL runtime will
121+
destroy a context and invalidate the handles. Ideally, all the resources and handles
122+
reliant on a `sycl::context` object should be destroyed once its destructor is called.
123+
To achieve this, the class must be extended to allow developers to register a callback.
124+
125+
```cpp
126+
int main() {
127+
sycl::gpu_selector selector;
128+
129+
// run on multiple threads:
130+
{
131+
sycl::queue queue(selector);
132+
sycl::context context = queue.get_context();
133+
134+
context.set_destruction_callback([](const sycl::context& this_context) {
135+
CUcontext context = get_native<sycl::backend::cuda>(this_context);
136+
handle_map::destroy_lib_handle(context);
137+
});
138+
139+
runA(queue);
140+
runB(queue);
141+
runC(queue);
142+
}
143+
}
144+
```
145+
146+
With this new version, all state related to the `sycl::context` object is cleaned up
147+
automatically when its destructor is called.
148+
149+
Similar problems exist in other libraries and APIs in the heterogeneous programming
150+
ecosystem, requiring the user to explicitly release handles that have been passed to
151+
an existing library. The ```set_destruction_callback``` function will also be useful in these cases.
152+
153+
### ArrayFire
154+
The ArrayFire GPGPU library provides functions to enable the use of user-defined OpenCL
155+
contexts for internal operations. The user is responsible for calling `afcl_add_device_context`
156+
and `afcl_delete_device_context` [6]. The developer must ensure that `afcl_delete_device_context`
157+
is called before the runtime destroys the `sycl::context`.
158+
159+
### OpenVX
160+
OpenVX defines an extension for interoperability with OpenCL, defining the `vxCreateContextFromCL`
161+
function to create an OpenVX context from a user-generated OpenCL context and command queue [7].
162+
If this function is used then the developer must ensure that `vxReleaseContext` is called before
163+
the runtime destroys the `sycl::context`.
164+
165+
### DNNL
166+
DNNL can enable OpenCL interop through the `dnnl_engine_create_ocl` function, which allows the
167+
developer to create an execution engine from an existing OpenCL context and device [8]. If this function
168+
is used then the developer must ensure that `mkldnn_engine_destroy` is called before the runtime
169+
destroys the `sycl::context`.
170+
171+
## Specification Changes
172+
4.9.3.1 Context interface
173+
```cpp
174+
namespace sycl {
175+
class context {
176+
public:
177+
template<typename F = std::nullptr_t>
178+
void set_destruction_callback(F&& callback = nullptr);
179+
};
180+
}
181+
```
182+
183+
| Member function | Description |
184+
|-------------|--------|
185+
| `template<typename F = std::nullptr_t> void set_destruction_callback(F&& callback = nullptr);` | Registers a callable object, `callback`, with the context. The callable object will be invoked once *immediately before* the native context is destroyed. `F` must be a callable type with the signature `void(const sycl::context&)`. It must be well-formed for a `sycl::context` destructor to call the callback using the form `callback(*this);`. Only one callback may be registered - subsequent calls to this member function will overwrite the previously registered callback. If `F` is `std::nullptr_t`, no callback is registered and any previous callback is destroyed. It is undefined behavior if an instance of any SYCL class with reference semantics (see 4.6.2) is stored in a function object, or captured in the closure of a lambda that is used as a callback.
186+
<center>Table 4.15: Member functions of the context class</center>
187+
188+
##References
189+
190+
[1] JuliaGPU: https://github.com/JuliaGPU/CuArrays.jl/blob/4dedd0fadcf260cf008a9e73d8702e2f259b2cfc/src/blas/CUBLAS.jl#L30
191+
[2] Intel ISAAC: https://github.com/intel/isaac/blob/b0a265ee45337f92f1e8e9f2fb08a057292b0240/lib/driver/dispatch.cpp#L223
192+
[3] ISAAC: https://github.com/ptillet/isaac/blob/8ea6498a841fe1e63a4518797d332f538a9ba37e/lib/driver/dispatch.cpp#L216
193+
[4] ND4J: https://github.com/deeplearning4j/nd4j/blob/8f005bcecb240d1fbb83b9d390ad801d1d3b6933/nd4j-backends/nd4j-backend-impls/nd4j-cuda/src/main/java/org/nd4j/jita/allocator/context/impl/BasicContextPool.java#L43
194+
[5] GADGETRON: https://github.com/gadgetron/gadgetron/blob/fa39a340558a032bea10db07af00fbe6da5ff4cb/toolboxes/core/gpu/CUBLASContextProvider.cpp#L35
195+
[6] ArrayFire: http://arrayfire.org/docs/group__opencl__mat.htm#ga37969cfa49416bbdb25910d15c454d01
196+
[7] OpenVX: https://www.khronos.org/registry/OpenVX/extensions/vx_khr_opencl_interop/1.0/vx_khr_opencl_interop_1_0.html#_vxcreatecontextfromcl
197+
[8] DNNL: https://github.com/intel/mkl-dnn/blob/master/doc/advanced/opencl_interoperability.md#c-api-extensions-for-interoperability-with-opencl-1

0 commit comments

Comments
 (0)