README.md (1 addition & 1 deletion)
@@ -51,7 +51,7 @@ Each proposal in the table below will be tagged with one of the following states
51
51
| CP009 |[Async Work Group Copy & Prefetch Builtins](async-work-group-copy/index.md)| SYCL 1.2.1 | 07 August 2017 | 07 August 2017 |_Accepted with changes_|
52
52
| CP011 |[Mem Fence Builtins](mem-fence/index.md)| SYCL 1.2.1 | 11 August 2017 | 9 September 2017 |_Accepted_|
53
53
| CP012 |[Data Movement in C++](data-movement/index.md)| ISO C++ SG1, SG14 | 30 May 2017 | 28 August 2017 |_Work in Progress_|
54
-
| CP013 |[P1436: Executor properties for affinity-based execution](affinity/index.md)| ISO C++ SG1, SG14, LEWG | 15 November 2017 |21 January 2019 |_Work in Progress_|
54
+
| CP013 |[P1436 & P1437: Papers for affinity-based execution](affinity/index.md)| ISO C++ SG1, SG14, LEWG | 15 November 2017 |31 March 2019 |_Work in Progress_|
55
55
| CP014 |[Shared Virtual Memory](svm/index.md)| SYCL 2.2 | 22 January 2018 | 22 January 2018 |_Work in Progress_|
56
56
| CP015 |[Specialization Constant](spec-constant/index.md)| SYCL 1.2.1 extension | 24 April 2018 | 24 April 2018 |_Work in Progress_|
57
57
| CP017 |[Host Access](host_access/index.md)| SYCL 1.2.1 vendor extension | 17 September 2018 | 13 December 2018 |_Available since CE 1.0.3_|
affinity/cpp-23/d1436r1.md (35 additions & 13 deletions)
@@ -17,6 +17,14 @@
17
17
18
18
### P1436r1 (COL 2019)
19
19
20
+
* Introduce wording to clarify when two invocations of bulk_execute
21
+
are expected to have consistent binding.
22
+
* Introduce wording to describe how bulk_execute should handle an
23
+
execution context failing to provide the guaranteed binding.
24
+
* Update the wording of bulk_execution_affinity.scatter and
25
+
bulk_execution_affinity.balance to better describe the expected
26
+
binding pattern.
27
+
20
28
### P1436r0 (KON 2019)
21
29
22
30
* Separation of high-level features from P0796r3 [[35]][p0796].
@@ -179,7 +187,7 @@ Some systems give additional user control through explicit binding of threads to
179
187
180
188
## Relative affinity of execution resources
181
189
182
-
In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different *execution resources*. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. The relative position of two components in a system's topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.
190
+
In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different hardware and software resources. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. The relative position of two components in a system's topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.
183
191
184
192
This can be scaled to heterogeneous and distributed systems, as the relative affinity between components can apply to discrete heterogeneous and distributed systems as well.
185
193
@@ -191,13 +199,13 @@ The initial solution proposed by this paper may only target systems with a singl
191
199
192
200
## Overview
193
201
194
-
In this paper we propose an interface for discovering the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware those execution resources represent. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443].
202
+
In this paper we propose executor properties that can be used to query the affinity between the different hardware and software resources within a system that are available to executors, and to require binding of *execution agents* to the underlying hardware or software resources in order to achieve performance through data locality. These properties provide a low granularity and are aimed at users who may have little or no knowledge of the system architecture.
195
203
196
-
A series of executor properties describe desired behavior when using parallel algorithms or libraries. These properties provide a low granularity and is aimed at users who may have little or no knowledge of the system architecture.
204
+
The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443].
197
205
198
206
## Execution resources
199
207
200
-
*Execution resources* represent an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources providing it guarantees the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.
208
+
An *execution resource* represents an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources providing it guarantees the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.
201
209
202
210
If an *execution resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. An *execution context* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which an *execution resource* points may be a dynamic, possibly a software-managed pool of hardware. Here are three examples of this phenomenon:
203
211
@@ -252,15 +260,20 @@ We propose an executor property group called `bulk_execution_affinity` which con
252
260
253
261
### Example
254
262
255
-
Below is an example *(Listing 4)* of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`.
263
+
Below is an example *(Listing 4)* of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`. We request affinity binding using `prefer` and then check to see if the executor is able to support it using `query`.
256
264
257
265
```cpp
258
266
{
259
-
executor exec;
260
-
261
-
auto affExec = execution::require(exec, execution::bulk,
267
+
bulk_executor exec;
268
+
269
+
auto affExec = execution::prefer(exec,
262
270
execution::bulk_execution_affinity.scatter);
263
271
272
+
if (execution::query(affExec, execution::bulk_execution_affinity.scatter)) {
273
+
std::cout << "bulk_execute using bulk_execution_affinity.scatter"
274
+
<< std::endl;
275
+
}
276
+
264
277
affExec.bulk_execute([](std::size_t i, shared s) {
265
278
func(i);
266
279
}, 8, sharedFactory);
@@ -280,17 +293,26 @@ The `bulk_execution_affinity_t` property provides nested property types and obje
280
293
281
294
| Nested Property Type | Nested Property Name | Requirements |
|`bulk_execution_affinity_t::none_t`|`bulk_execution_affinity_t::none`| A call to `e.bulk_execute(f, s, sf)` has no requirements on the binding of *execution agents* to the underlying *execution resources*. |
284
-
|`bulk_execution_affinity_t::scatter_t`|`bulk_execution_scatter_t::scatter`| A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed sparsely across the *execution resources*. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
285
-
|`bulk_execution_affinity_t::compact_t`|`bulk_execution_compact_t::compact`| A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed as close as possible to the *execution resource* of the *thread of execution* which created them. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
286
-
| bulk_execution_affinity_t::balanced_t | bulk_execution_balanced_t::balanced | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are collected into groups, each group is distributed sparsely across the *execution resources* and the *execution agents* within each group are distributed as close as possible to the first *execution resource* of that group. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
296
+
|`bulk_execution_affinity_t::none_t`|`bulk_execution_affinity_t::none`| A call to `e.bulk_execute(f, s, sf)` has no requirements on the binding of *execution agents* to the underlying *execution resources*. ||
297
+
|`bulk_execution_affinity_t::scatter_t`|`bulk_execution_scatter_t::scatter`| A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* (ordered by physical closeness) such that they are distributed equally across the *execution resources* in a round-robin fashion. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
298
+
|`bulk_execution_affinity_t::compact_t`|`bulk_execution_compact_t::compact`| A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed as close as possible to the *execution resource* of the *thread of execution* which created them. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
299
+
| bulk_execution_affinity_t::balanced_t | bulk_execution_balanced_t::balanced | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* (ordered by physical closeness) such that they are distributed equally across the *execution resources* in a bin packing fashion. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
287
300
288
301
> [*Note:* The requirements of the `bulk_execution_affinity_t` nested properties do not enforce a specific binding, simply that the binding follows the requirements set out above and that the pattern is consistent across invocations of the bulk execution functions. *--end note*]
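One way to read the `scatter` and `balanced` rows is as index arithmetic over a list of *execution resources* ordered by physical closeness. The sketch below illustrates that reading and is not wording from the paper: the function names `scatter_binding` and `balanced_binding` are hypothetical, resources are modelled as plain indices, and `compact` is left out because it depends on where the creating thread runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the proposal): round-robin placement.
// Agent i goes to resource i % resources, so neighbouring agents land on
// different resources. Precondition: resources > 0.
std::vector<std::size_t> scatter_binding(std::size_t agents,
                                         std::size_t resources) {
    std::vector<std::size_t> binding(agents);
    for (std::size_t i = 0; i < agents; ++i) {
        binding[i] = i % resources;
    }
    return binding;
}

// Hypothetical helper (not part of the proposal): block placement.
// Contiguous agents share a resource, and the number of agents per
// resource differs by at most one. Precondition: resources > 0.
std::vector<std::size_t> balanced_binding(std::size_t agents,
                                          std::size_t resources) {
    std::vector<std::size_t> binding(agents);
    for (std::size_t i = 0; i < agents; ++i) {
        binding[i] = (i * resources) / agents;
    }
    return binding;
}
```

With 8 agents and 4 resources, the scatter sketch yields `0 1 2 3 0 1 2 3` and the balanced sketch yields `0 0 1 1 2 2 3 3`; both place two agents on each resource, but only the balanced pattern keeps neighbouring agents together.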
289
302
290
-
> [*Note:* If two *executors* `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agents* if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. Rebinding *execution agents* to *execution resources* may take substantial time and may affect performance of subsequent code. *--end note*]
303
+
> [*Note:* It's expected that the default value of `bulk_execution_affinity_t` for most executors be `bulk_execution_affinity_t::none_t`. *--end note*]
291
304
292
305
> [*Note:* The terms used for the `bulk_execution_affinity_t` nested properties are derived from the OpenMP properties [[33]][openmp-affinity] including the Intel-specific balanced affinity binding [[34]][intel-balanced-affinity]. *--end note*]
293
306
307
+
For any two invocations `e1.bulk_execute(f1, s1, sf1)` and `e2.bulk_execute(f2, s2, sf2)`, the binding of *execution agents* to the underlying *execution resources* must be consistent if:
308
+
* `e1 == e2`,
309
+
* `execution::query(e1, execution::bulk_execution_affinity) != execution::bulk_execution_affinity.none`, and
310
+
* `s1 == s2`.
311
+
312
+
> [*Note:* If two invocations of `bulk_execute` are guaranteed to bind *execution agents* to the underlying *execution resources* consistently, this can limit resource utilization. *--end note*]
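The three conditions above can be written as a single predicate. The following is a minimal sketch under stated assumptions: `mock_executor`, `affinity`, and `binding_must_be_consistent` are hypothetical stand-ins for illustration, and executor equality is modelled simply as "same context and same affinity value", which may be coarser than what a real executor compares.

```cpp
#include <cstddef>

// Hypothetical stand-in for the values of the bulk_execution_affinity property.
enum class affinity { none, scatter, compact, balanced };

// Hypothetical stand-in for an executor; not the interface from the
// executors proposal.
struct mock_executor {
    int context_id;    // identifies the associated execution context
    affinity binding;  // currently established affinity property value

    friend bool operator==(const mock_executor& a, const mock_executor& b) {
        return a.context_id == b.context_id && a.binding == b.binding;
    }
};

// True when the wording requires two bulk_execute invocations to use a
// consistent binding: equal executors, an affinity other than none,
// and equal shapes.
bool binding_must_be_consistent(const mock_executor& e1, std::size_t s1,
                                const mock_executor& e2, std::size_t s2) {
    return e1 == e2 && e1.binding != affinity::none && s1 == s2;
}
```

Under this reading, changing only the shape (for example from 8 to 16 agents) is enough to lift the consistency requirement.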
313
+
314
+
> [*Note:* If two *executors* `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agents* if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. Rebinding *execution agents* to *execution resources* may take substantial time and may affect performance of subsequent code. *--end note*]
315
+
294
316
## Concurrency property
295
317
296
318
We propose a query-only executor property called `concurrency_t` which returns the maximum potential concurrency available to an *executor*.
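For comparison, the standard library already offers a loosely similar query, `std::thread::hardware_concurrency()`, which returns a hint for the number of concurrent threads supported, or 0 when the value cannot be determined. The sketch below shows that kind of query only; `potential_concurrency_hint` is a hypothetical name, and this is not the proposed `concurrency_t` property, which would be queried per executor.

```cpp
#include <thread>

// Hypothetical helper: wraps the standard hint and falls back to 1 when
// the implementation reports 0 (meaning "not computable").
unsigned potential_concurrency_hint() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```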