Commit 6d7db0a

Merge branch 'master' into ProGTX/multi_ptr_cast
# Conflicts: # README.md
2 parents 259ce47 + 5cd8134 commit 6d7db0a

10 files changed

+900
-196
lines changed

README.md

Lines changed: 6 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -51,8 +51,10 @@ Each proposal in the table below will be tagged with one of the following states
5151
| CP009 | [Async Work Group Copy & Prefetch Builtins](async-work-group-copy/index.md) | SYCL 1.2.1 | 07 August 2017 | 07 August 2017 | _Accepted with changes_ |
5252
| CP011 | [Mem Fence Builtins](mem-fence/index.md) | SYCL 1.2.1 | 11 August 2017 | 9 September 2017 | _Accepted_ |
5353
| CP012 | [Data Movement in C++](data-movement/index.md) | ISO C++ SG1, SG14 | 30 May 2017 | 28 August 2017 | _Work in Progress_ |
54-
| CP013 | [Supporting Heterogeneous & Distributed Computing Through Affinity](affinity/index.md) | ISO C++ SG1, SG14 | 15 November 2017 | 12 August 2018 | _Work in Progress_ |
54+
| CP013 | [P1436: Executor properties for affinity-based execution](affinity/index.md) | ISO C++ SG1, SG14, LEWG | 15 November 2017 | 21 January 2019 | _Work in Progress_ |
5555
| CP014 | [Shared Virtual Memory](svm/index.md) | SYCL 2.2 | 22 January 2018 | 22 January 2018 | _Work in Progress_ |
56-
| CP015 | [Specialization Constant](spec-constant/index.md) | SYCL 1.2.1 extension / SYCL 2.2 | 24 April 2018 | 24 April 2018 | _Work in Progress_ |
57-
| CP016 | [Casting multi_ptr pointers](multi_ptr-cast/index.md) | SYCL 1.2.1 extension / SYCL 2.2 | 19 December 2018 | 19 December 2018 | _Work in Progress_ |
58-
| CP019 | [On-chip Memory Allocation](onchip-memory/index.md) | SYCL 1.2.1 extension / SYCL 2.2 | 03 December 2018 | 03 December 2018 | _Work in Progress_ |
56+
| CP015 | [Specialization Constant](spec-constant/index.md) | SYCL 1.2.1 extension | 24 April 2018 | 24 April 2018 | _Work in Progress_ |
57+
| CP017 | [Host Access](host_access/index.md) | SYCL 1.2.1 vendor extension | 17 September 2018 | 13 December 2018 | _Available since CE 1.0.3_ |
58+
| CP018 | [Built-in kernels](builtin_kernels/index.md) | SYCL 1.2.1 vendor extension | 12 October 2018 | 12 October 2018 | _Available since CE 1.0.3_ |
59+
| CP019 | [On-chip Memory Allocation](onchip-memory/index.md) | SYCL 1.2.1 vendor extension | 03 December 2018 | 03 December 2018 | _Available since CE 1.0.3_ |
60+
| CP020 | [Casting multi_ptr pointers](multi_ptr-cast/index.md) | SYCL 1.2.1 vendor extension | 19 December 2018 | 26 March 2019 | _Work in Progress_ |

affinity/cpp-20/d1436r0.md

Lines changed: 562 additions & 0 deletions
Large diffs are not rendered by default.

affinity/cpp-20/d0796r4.md renamed to affinity/cpp-20/d1437r0.md

Lines changed: 121 additions & 176 deletions
Large diffs are not rendered by default.

affinity/index.md

Lines changed: 10 additions & 8 deletions
@@ -1,19 +1,21 @@
1-
# Supporting Heterogeneous & Distributed Computing Through Affinity
1+
# P1436: Executor properties for affinity-based execution
22

33
| | |
44
|---|---|
55
| ID | CP013 |
6-
| Name | Supporting Heterogeneous & Distributed Computing Through Affinity |
7-
| Target | ISO C++ SG1 SG14 |
6+
| Name | Executor properties for affinity-based execution |
7+
| Target | ISO C++ SG1, SG14, LEWG |
88
| Initial creation | 15 November 2017 |
9-
| Last update | 12 August 2018 |
10-
| Reply-to | Michael Wong <michael.wong@codeplay.com> |
9+
| Last update | 21 January 2019 |
10+
| Reply-to | Gordon Brown <gordon@codeplay.com> |
1111
| Original author | Gordon Brown <gordon@codeplay.com> |
12-
| Contributors | Ruyman Reyes <ruyman@codeplay.com>, Michael Wong <michael.wong@codeplay.com>, H. Carter Edwards <hcedwar@sandia.gov>, Thomas Rodgers <rodgert@twrodgers.com> |
12+
| Contributors | Ruyman Reyes <ruyman@codeplay.com>, Michael Wong <michael.wong@codeplay.com>, H. Carter Edwards <hcedwar@sandia.gov>, Thomas Rodgers <rodgert@twrodgers.com>, Mark Hoemmen <mhoemme@sandia.gov> |
1313

1414
## Overview
1515

16-
This paper provides an initial meta-framework for the drives toward memory affinity for C++, given the direction from Toronto 2017 SG1 meeting that we should look towards defining affinity for C++ before looking at inaccessible memory as a solution to the separate memory problem towards supporting heterogeneous and distributed computing.
16+
This paper is the result of a request from SG1 at the 2018 San Diego meeting to split P0796: Supporting Heterogeneous & Distributed Computing Through Affinity [[35]][p0796] into two separate papers, one for the high-level interface and one for the low-level interface. This paper focusses on the high-level interface: a series of properties for querying affinity relationships and requesting affinity on work being executed. [[36]][p1437] focusses on the low-level interface: a mechanism for discovering the topology and affinity properties of a given system.
17+
18+
The aim of this paper is to provide a number of executor properties that if supported allow the user of an executor to query and manipulate the binding of *execution agents* and the underlying *execution resources* of the *threads of execution* they are run on.
1719

1820
## Versions
1921

@@ -23,7 +25,7 @@ This paper provides an initial meta-framework for the drives toward memory affin
2325
| [P0796r1][p0796r1] | _Published_ |
2426
| [D0796r2][p0796r2] | _Published_ |
2527
| [D0796r3][p0796r3] | _Published_ |
26-
| [D0796r4](cpp-20/d0796r4.md) | _Work In Progress_ |
28+
| [DXXX1r0](cpp-20/d1436r0.md) | _Work In Progress_ |
2729

2830
[p0796r0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0796r0.pdf
2931
[p0796r1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0796r1.pdf

asynchronous-data-flow/sycl-2.2/index.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ This proposal aims to
4343
contribute the following to the SYCL 2.2 specification:
4444

4545
1. [A description of the expected behaviour of a SYCL program in terms of data dependencies between command groups](01_command_group_requirements_and_actions.md)
46-
2. [How the definition of accessors affects the synchronization of the program](02_memory_consistency.md)
46+
2. [How the definition of accessors affects the synchronization of the program](02_memory_consistence.md)
4747
3. [Additional functionality to interact with data on the host](03_interacting_with_data_on_the_host.md)
4848
4. [Extending the buffer/image interface to provide hints to the runtime](04_update_on_specific_context.md)
4949

builtin_kernels/sycl-1.2.1/builtin_kernels.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
66
| Date of Creation | 12 October 2018 |
77
| Target | SYCL 1.2.1 vendor extension |
88
| Current Status | _Work In Progress_ |
9-
| Implemented in | _N/A_ |
9+
| Available since | _ComputeCpp CE 1.0.3_ |
1010
| Reply-to | Ruyman Reyes <ruyman@codeplay.com> |
1111
| Original author | Ruyman Reyes <ruyman@codeplay.com> |
1212

host_access/index.md

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@
66
| Previous Names | n/a |
77
| Date of Creation | 17 September 2017 |
88
| Target | SYCL 1.2.1 vendor extension |
9-
| Current Status | _Work in Progress_ |
9+
| Current Status | _Work in progress_ |
1010
| Reply-to | Ruyman Reyes <ruyman@codeplay.com> |
1111
| Original author | Ruyman Reyes <ruyman@codeplay.com> |
1212
| Contributors | Ruyman Reyes <ruyman@codeplay.com> |
@@ -20,4 +20,5 @@ See sycl-1.2.1/host_access.md
2020
| Version | Last Modified | Document |
2121
|---------|----- | ---------|
2222
| 0.1 Draft | 15 September 2018 | [Link](sycl-1.2.1/host-access.md) |
23+
| 0.2 Draft | 13 December 2018 | [Link](sycl-1.2.1/host-access.md) |
2324

host_access/sycl-1.2.1/host_access.md

Lines changed: 5 additions & 3 deletions
@@ -1,11 +1,12 @@
1-
# Temporary buffer
1+
# Restricting host access to SYCL buffers
22

33
| Proposal ID | CP017 |
44
|-------------|--------|
55
| Name | Host access |
66
| Date of Creation | 17 September 2017 |
77
| Target | SYCL 1.2.1 vendor extension |
8-
| Current Status | _Work In Progress_ |
8+
| Current Status | _Implemented_ |
9+
| Available | Since ComputeCpp 1.0.3 CE |
910
| Reply-to | Ruyman Reyes <ruyman@codeplay.com> |
1011
| Original author | Ruyman Reyes <ruyman@codeplay.com> |
1112

@@ -81,7 +82,8 @@ std::for_each(sycl_named_policy<example>(otherQueue),
8182
* performed.
8283
*/
8384
buffer<float, 1> tmp{myRange,
84-
{property::buffer::context_bound(deviceContext), codeplay::property::buffer::host_access(host_access_mode::none)} };
85+
{property::buffer::context_bound(deviceContext),
86+
codeplay::property::buffer::host_access(host_access_mode::none)} };
8587
8688
bool firstIter = true;
8789

interop_task/interop_task.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
1+
| Proposal ID | TBC |
2+
|-------------|--------|
3+
| Name | |
4+
| Date of Creation | 16 January 2019 |
5+
| Target | Vendor extension |
6+
| Current Status | _Work in progress_ |
7+
| Reply-to | Victor Lomüller <victor@codeplay.com> |
8+
| Original author | Victor Lomüller <victor@codeplay.com>, Gordon Brown <gordon@codeplay.com>, Peter Zuzek <peter@codeplay.com> |
9+
| Contributors | Victor Lomüller <victor@codeplay.com>, Gordon Brown <gordon@codeplay.com>, Peter Zuzek <peter@codeplay.com> |
10+
11+
# interop_task: Improving SYCL-OpenCL Interoperability
12+
13+
## Motivation
14+
15+
SYCL does not allow a user to retrieve the cl_mem object behind a cl::sycl::accessor. This makes it difficult to integrate low-level API functionality inside the data-flow execution model of SYCL, as the only current way to do this is to create all OpenCL buffers up-front, which is not always possible.
16+
17+
This proposal introduces a way for a user to retrieve the low-level objects associated with SYCL buffers and enqueue a host task that can execute an arbitrary portion of host code within the SYCL runtime, therefore taking advantage of SYCL dependency analysis and scheduling.
18+
19+
## Accessing low-level API functionality on SYCL queues
20+
21+
We introduce a new type of handler, the **codeplay::handler**, which includes a new
22+
**interop\_task** method that enables submission of low-level API code from the host.
23+
By submitting this command group to the SYCL device queue, we guarantee it is
24+
executed in-order w.r.t the other command groups on the same queue.
25+
Simultaneously, we guarantee that this operation is performed
26+
asynchronously w.r.t. the user thread (therefore enabling the user
27+
thread to continue submitting command groups).
28+
Other command groups enqueued in the same or different queues
29+
can still be executed with sequential consistency, since the
30+
requirements of this command group are guaranteed to be satisfied.
31+
It is the user's responsibility to ensure the lambda submitted via interop_task does not create race conditions with other command groups or with the host.
32+
33+
The possibility of enqueuing host tasks on SYCL queues also enables the
34+
runtime to perform further optimizations when available.
35+
For example, a SYCL runtime may decide to map / unmap instead of performing copy operations,
36+
or perform asynchronous transfers while data is being computed.
37+
38+
### cl::sycl::codeplay::handler
39+
40+
```cpp
41+
namespace cl {
42+
namespace sycl {
43+
namespace codeplay {
44+
45+
class handler : public cl::sycl::handler {
46+
private:
47+
// implementation defined constructor
48+
handler(__unspecified__);
49+
50+
public:
51+
/* Submit a task with interoperability statements. */
52+
template <typename FunctorT>
53+
void interop_task(FunctorT hostFunction);
54+
};
55+
} // namespace codeplay
56+
} // namespace sycl
57+
} // namespace cl
58+
```
59+
60+
### codeplay::handler::interop_task
61+
62+
The `interop_task` allows users to submit tasks containing C++ statements with low-level API calls (e.g. OpenCL Host API entries).
63+
The command group that encapsulates the task will execute following the usual SYCL dataflow execution rules.
64+
The functor passed to the `interop_task` takes as input a const reference to a `cl::sycl::codeplay::interop_handle`. The handle can be used to retrieve underlying OpenCL objects relative to the execution of the task.
65+
66+
Allocating new SYCL objects inside an `interop_task` is not allowed.
67+
It is the user's responsibility to ensure that all operations performed inside the `interop_task` are finished before returning from it.
68+
69+
Although the statements inside the lambda submitted to the `interop_task` are executed on the host, the requirements and actions for the command group are satisfied for the device.
70+
This is the opposite of the `host_handler` vendor extension, where requisites are satisfied for the host since the statements in the lambda submitted to the single task are meant to have side effects on the host only.
71+
The interop task lambda can have side effects on the host, but it is the programmer's responsibility to ensure that requirements do not need to be satisfied for the host.
72+
73+
## Accessing low-level API objects
74+
75+
We introduce the `interop_handle` class, which provides access to underlying OpenCL objects during the execution of the `interop_task`.
76+
`interop_handle` objects are immutable objects whose purpose is to give users access to low-level API functionality.
77+
78+
The interface of the `interop_handle` is defined as follows:
79+
```cpp
80+
namespace cl {
81+
namespace sycl {
82+
namespace codeplay {
83+
84+
class interop_handle {
85+
private:
86+
// implementation defined constructor
87+
interop_handle(__unspecified__);
88+
89+
public:
90+
/* Return the context */
91+
cl_context get_context() const;
92+
93+
/* Return the device id */
94+
cl_device_id get_device() const;
95+
96+
/* Return the command queue associated with this task */
97+
cl_command_queue get_queue() const;
98+
99+
/*
100+
Returns the underlying cl_mem object associated with a given accessor
101+
*/
102+
template <typename dataT, int dimensions, access::mode accessmode,
103+
access::target accessTarget,
104+
access::placeholder isPlaceholder>
105+
cl_mem get_buffer(const accessor<dataT, dimensions, accessmode, accessTarget, isPlaceholder>&) const;
106+
};
107+
} // namespace codeplay
108+
} // namespace sycl
109+
} // namespace cl
110+
```
111+
112+
## Example using regular accessor
113+
114+
```cpp
115+
auto cgH = [=] (codeplay::handler& cgh) {
116+
// Get device accessor to SYCL buffer (cannot be dereferenced directly in interop_task).
117+
auto accA = bufA.get_access<access::mode::read>(cgh);
118+
auto accB = bufB.get_access<access::mode::read_write>(cgh);
119+
120+
cgh.interop_task([=](codeplay::interop_handle &handle) {
121+
third_party_api(handle.get_queue(), // Get the OpenCL command queue to use, can be the fallback
122+
handle.get_buffer(accA), // Get the OpenCL mem object behind accA
123+
handle.get_buffer(accB)); // Get the OpenCL mem object behind accB
124+
// Assumes the call has finished when exiting the task
125+
});
126+
};
127+
qA.submit(cgH);
128+
```
129+
130+
This example calls the clFFT library from SYCL using the `interop_task`:
131+
```cpp
132+
#include <stdlib.h>
133+
#include <CL/sycl.hpp>
134+
135+
/* No need to explicitly include the OpenCL headers */
136+
#include <clFFT.h>
137+
138+
int main( void )
139+
{
140+
size_t N = 16;
141+
142+
cl::sycl::queue device_queue;
143+
cl::sycl::buffer<float> X(range<1>(N * 2));
144+
145+
/* Setup clFFT. */
146+
clfftSetupData fftSetup;
147+
clfftStatus err = clfftInitSetupData(&fftSetup);
148+
err = clfftSetup(&fftSetup);
149+
150+
device_queue.submit([=](codeplay::handler& cgh) {
151+
auto X_accessor = X.get_access<access::mode::read_write>(cgh);
152+
cgh.interop_task([=](codeplay::interop_handle &handle) {
153+
/* FFT library related declarations */
154+
clfftPlanHandle planHandle;
155+
size_t clLengths[1] = {N};
156+
157+
/* Create a default plan for a complex FFT. */
158+
clfftStatus err = clfftCreateDefaultPlan(&planHandle, handle.get_context(), CLFFT_1D, clLengths);
159+
160+
/* Set plan parameters. */
161+
err = clfftSetPlanPrecision(planHandle, CLFFT_SINGLE);
162+
err = clfftSetLayout(planHandle, CLFFT_COMPLEX_INTERLEAVED, CLFFT_COMPLEX_INTERLEAVED);
163+
err = clfftSetResultLocation(planHandle, CLFFT_INPLACE);
164+
165+
/* Get the command queue for this task, then bake the plan. */
165+
cl_command_queue queue = handle.get_queue();
166+
err = clfftBakePlan(planHandle, 1, &queue, NULL, NULL);
167+
/* Execute the plan. */
168+
170+
cl_mem X_mem = handle.get_buffer(X_accessor);
171+
err = clfftEnqueueTransform(planHandle, CLFFT_FORWARD,
172+
1, &queue, 0, NULL, NULL,
173+
&X_mem, NULL, NULL);
174+
175+
/* Wait for calculations to finish. */
176+
err = clFinish(queue);
177+
178+
/* Release the plan. */
179+
err = clfftDestroyPlan( &planHandle );
180+
});
181+
});
182+
183+
/* Release clFFT library. */
184+
clfftTeardown( );
185+
186+
return 0;
187+
}
188+
```

onchip-memory/index.md

Lines changed: 4 additions & 2 deletions
@@ -6,10 +6,11 @@
66
| Previous Names | n/a |
77
| Date of Creation | 03 December 2018 |
88
| Target | SYCL 1.2.1 vendor extension |
9-
| Current Status | _Work in Progress_ |
9+
| Current Status | _Work in progress_ |
10+
| Available since | ComputeCpp CE 1.0.3 |
1011
| Reply-to | Gordon Brown <gordon@codeplay.com> |
1112
| Original author | Gordon Brown <gordon@codeplay.com> |
12-
| Contributors | Gordon Brown <gordon@codeplay.com> Ruyman Reyes <ruyman@codeplay.com> |
13+
| Contributors | Gordon Brown <gordon@codeplay.com>, Ruyman Reyes <ruyman@codeplay.com> |
1314

1415
## Overview
1516

@@ -20,4 +21,5 @@ See sycl-1.2.1/onchip-memory.md
2021
| Version | Last Modified | Document |
2122
|---------|----- | ---------|
2223
| 0.1 Draft | 03 December 2018 | [Link](sycl-1.2.1/onchip-memory.md) |
24+
| 0.2 Draft | 13 December 2018 | [Link](sycl-1.2.1/onchip-memory.md) |
2325
