Commit 89050df

[SYCL][Doc] Add spec to wait on a device (#20266)
Add a proposed extension specification which allows the application to wait for all commands submitted to a device to complete.
1 parent 5c5b121 commit 89050df

1 file changed, with 201 additions and 0 deletions.
@@ -0,0 +1,201 @@
= sycl_ext_oneapi_device_wait

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: —{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2025 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 10 specification.
All references below to the "core SYCL specification" or to section numbers in
the SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback.
Interfaces defined in this specification may not be implemented yet or may be in
a preliminary state.
The specification itself may also change in incompatible ways before it is
finalized.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension adds a way for the host to wait for all commands submitted to a
device to complete.
This functionality is similar to the CUDA API `cudaDeviceSynchronize`.


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_DEVICE_WAIT` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===
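
As a rough illustration of the check described above, an application might
turn the macro into compile-time constants.
This is a sketch only: the names `has_device_wait_ext` and
`device_wait_ext_version` are hypothetical, and the `__has_include` guard is an
assumption that lets the same file build with a compiler that has no SYCL
headers.

[source,c++]
----
// Pull in SYCL only when the header exists, so this file also builds without it.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

#ifdef SYCL_EXT_ONEAPI_DEVICE_WAIT
// The implementation advertises the extension; per the table the value is 1.
inline constexpr bool has_device_wait_ext = true;
inline constexpr int device_wait_ext_version = SYCL_EXT_ONEAPI_DEVICE_WAIT;
#else
// The extension is not advertised, so its APIs must not be referenced.
inline constexpr bool has_device_wait_ext = false;
inline constexpr int device_wait_ext_version = 0;
#endif
----

Code that calls the new member functions can then be wrapped in
`if constexpr (has_device_wait_ext)` inside a template, or in an `#ifdef` on the
macro itself.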

=== New aspect

This extension adds the following aspect.

[source,c++]
----
namespace sycl {

enum class aspect {
  // ...
  ext_oneapi_device_wait
};

} // namespace sycl
----

'''

[source,c++]
----
ext_oneapi_device_wait
----

Indicates that the device supports the member functions described below.
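
A minimal sketch of how an application might query this aspect at run time
follows.
The helper name `count_devices_with_wait` is hypothetical, and the preprocessor
guards are an assumption so that the snippet also compiles where the extension
is unavailable.

[source,c++]
----
#include <cstddef>
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: counts the devices that report the new aspect.
// Returns 0 when the implementation does not advertise the extension.
std::size_t count_devices_with_wait() {
  std::size_t n = 0;
#ifdef SYCL_EXT_ONEAPI_DEVICE_WAIT
  for (const sycl::device& dev : sycl::device::get_devices()) {
    if (dev.has(sycl::aspect::ext_oneapi_device_wait)) ++n;
  }
#endif
  return n;
}
----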

'''

=== New member functions for the device class

This extension adds the following member functions to the `device` class.

[source,c++]
----
namespace sycl {

class device {
  // ...
  void ext_oneapi_wait();
  void ext_oneapi_wait_and_throw();
  void ext_oneapi_throw_asynchronous();
};

} // namespace sycl
----

'''

[source,c++]
----
void ext_oneapi_wait();
----

_Effects:_ Blocks the calling thread until all commands previously submitted to
any queue on this device have completed.

_Throws:_ A synchronous `exception` with the `errc::feature_not_supported`
error code if the device does not have `aspect::ext_oneapi_device_wait`.
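
The effect can be sketched with two queues on the same device.
This is an illustration under assumptions, not a reference implementation: the
helper name is hypothetical, device USM is assumed to be available, and the
guards make the function return `false` where the extension is absent.

[source,c++]
----
#include <cstddef>
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: returns true if the device-wide wait was used.
bool fill_on_two_queues_and_wait() {
#ifdef SYCL_EXT_ONEAPI_DEVICE_WAIT
  sycl::device dev{sycl::default_selector_v};
  // Without the aspect, ext_oneapi_wait() throws errc::feature_not_supported.
  if (!dev.has(sycl::aspect::ext_oneapi_device_wait) ||
      !dev.has(sycl::aspect::usm_device_allocations))
    return false;

  constexpr std::size_t n = 1024;
  sycl::queue q1{dev}, q2{dev};  // two independent queues on one device
  int* a = sycl::malloc_device<int>(n, q1);
  int* b = sycl::malloc_device<int>(n, q2);
  bool ok = (a != nullptr) && (b != nullptr);
  if (ok) {
    q1.fill(a, 1, n);  // asynchronous; the returned events are ignored
    q2.fill(b, 2, n);
    dev.ext_oneapi_wait();  // blocks until both fills have completed
  }
  if (a) sycl::free(a, q1);
  if (b) sycl::free(b, q2);
  return ok;
#else
  return false;
#endif
}
----

Without the extension, the same result requires keeping each queue (or each
returned event) and waiting on it individually.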

'''

[source,c++]
----
void ext_oneapi_wait_and_throw();
----

_Effects:_ Blocks the calling thread until all commands previously submitted to
any queue on this device have completed.

At least all unconsumed asynchronous errors held by any queue (or its associated
context) on this device are passed to the appropriate `async_handler` as
described in section 4.13.1.3 "Priorities of async handlers" of the core SYCL
specification.

_Throws:_ A synchronous `exception` with the `errc::feature_not_supported`
error code if the device does not have `aspect::ext_oneapi_device_wait`.
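
The following sketch shows one way the error delivery might look from the
application side, assuming a queue constructed with its own `async_handler`.
The helper name and the empty placeholder kernel are hypothetical.

[source,c++]
----
#include <exception>
#include <iostream>
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: returns true if the extension was used.
bool wait_and_report_errors() {
#ifdef SYCL_EXT_ONEAPI_DEVICE_WAIT
  sycl::device dev{sycl::default_selector_v};
  if (!dev.has(sycl::aspect::ext_oneapi_device_wait)) return false;

  // Receives the asynchronous errors held by the queue below.
  auto handler = [](sycl::exception_list errors) {
    for (const std::exception_ptr& e : errors) {
      try {
        std::rethrow_exception(e);
      } catch (const sycl::exception& ex) {
        std::cerr << "async error: " << ex.what() << '\n';
      }
    }
  };

  sycl::queue q{dev, handler};
  q.single_task([] {});  // placeholder kernel standing in for real work

  // Waits for the device, then hands any unconsumed errors to `handler`.
  dev.ext_oneapi_wait_and_throw();
  return true;
#else
  return false;
#endif
}
----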

'''

[source,c++]
----
void ext_oneapi_throw_asynchronous();
----

_Effects:_ Checks to see if any unconsumed asynchronous errors have been
produced by any queue (or its associated context) on this device.
If so, they are passed to the appropriate `async_handler` as described in
section 4.13.1.3 "Priorities of async handlers" of the core SYCL specification.

_Throws:_ A synchronous `exception` with the `errc::feature_not_supported`
error code if the device does not have `aspect::ext_oneapi_device_wait`.
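
One possible use is to poll for errors while the host does other work.
The sketch below assumes that this function does not block, by analogy with
`queue::throw_asynchronous` in the core SYCL specification; the text above does
not state this explicitly.
The helper name and the placeholder kernel are hypothetical.

[source,c++]
----
#include <atomic>
#include <cstddef>
#include <thread>
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: returns true if the extension was used.
bool poll_for_async_errors() {
#ifdef SYCL_EXT_ONEAPI_DEVICE_WAIT
  namespace info = sycl::info;
  sycl::device dev{sycl::default_selector_v};
  if (!dev.has(sycl::aspect::ext_oneapi_device_wait)) return false;

  std::atomic<std::size_t> error_count{0};
  auto handler = [&error_count](sycl::exception_list errors) {
    error_count += errors.size();  // only count the errors in this sketch
  };

  sycl::queue q{dev, handler};
  sycl::event ev = q.single_task([] {});  // placeholder kernel

  // Poll until the kernel finishes, draining errors from every queue on `dev`.
  while (ev.get_info<info::event::command_execution_status>() !=
         info::event_command_status::complete) {
    dev.ext_oneapi_throw_asynchronous();
    std::this_thread::yield();  // stand-in for useful host work
  }
  dev.ext_oneapi_throw_asynchronous();  // pick up errors reported at the end
  return true;
#else
  return false;
#endif
}
----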

'''


== Implementation notes

Note that these functions wait for "commands", which include host tasks and
memory copy operations.
The implementation and the tests should cover these cases too.
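
A test for these cases might look roughly like the sketch below.
It is illustrative only: the helper name is hypothetical, device USM is assumed
to be available, and the function reports success only when the host task and
both copies are observed to have finished after the device-wide wait.

[source,c++]
----
#include <atomic>
#include <cstddef>
#include <vector>
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical test helper: true only if the extension was usable and the
// host task and both copies had completed once ext_oneapi_wait() returned.
bool device_wait_covers_host_task_and_copy() {
#ifdef SYCL_EXT_ONEAPI_DEVICE_WAIT
  sycl::device dev{sycl::default_selector_v};
  if (!dev.has(sycl::aspect::ext_oneapi_device_wait) ||
      !dev.has(sycl::aspect::usm_device_allocations))
    return false;

  sycl::queue q{dev};
  constexpr std::size_t n = 256;
  std::vector<int> src(n, 7), dst(n, 0);
  int* tmp = sycl::malloc_device<int>(n, q);
  if (tmp == nullptr) return false;

  std::atomic<bool> host_task_ran{false};
  q.submit([&](sycl::handler& h) {
    h.host_task([&] { host_task_ran = true; });
  });
  // Host -> device, then device -> host; the second copy depends on the first.
  sycl::event to_dev = q.memcpy(tmp, src.data(), n * sizeof(int));
  q.memcpy(dst.data(), tmp, n * sizeof(int), to_dev);

  dev.ext_oneapi_wait();  // expected to cover the host task and both copies

  sycl::free(tmp, q);
  return host_task_ran && dst == src;
#else
  return false;
#endif
}
----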


== Issues

* The API described above is implementable on Level Zero.
If we are being pedantic, we cannot easily implement this API on CUDA because
`cudaDeviceSynchronize` waits only for tasks that were submitted to the device
using the current context.
(The current context can only be changed from the CUDA driver API.)
We cannot implement that semantic on Level Zero because Level Zero provides
only a function that waits for all tasks on the device (from all contexts).
If we wanted to support this extension on CUDA in the future, one option would
be to expose the difference to users.
In that case, we'd add an additional aspect and additional APIs that take a
context.
CUDA would support the APIs that take a context and Level Zero would support
the APIs that do not.
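
For comparison, a narrower wait can already be approximated in portable SYCL
2020 by waiting on every queue that the application itself created, for example
all queues that share one context.
The sketch below is an assumption-laden illustration: `queue_group` is a
hypothetical helper that is not proposed by this extension, and `Queue` stands
for any type with a `wait()` member, such as `sycl::queue`.

[source,c++]
----
#include <vector>

// Hypothetical helper, not part of this extension.
// Remembers queues by address; the caller keeps them alive.
template <typename Queue>
class queue_group {
 public:
  void add(Queue& q) { queues_.push_back(&q); }

  // Waits only for the registered queues. Unlike the device-wide wait,
  // it misses queues created elsewhere in the process.
  void wait_all() {
    for (Queue* q : queues_) q->wait();
  }

 private:
  std::vector<Queue*> queues_;
};
----

The limitation noted in the comment is the reason a device-level API is useful:
a library cannot see queues owned by other components.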
