Skip to content

Commit be28285

Browse files
author
Gordon Brown
committed
CP013: Wording improvements.
* Introduce wording to describe when two invocations of `bulk_execute` are expected to have consistent binding. * Introduce wording to describe how `bulk_execute` should handle an execution context failing to provide the guaranteed binding. * Update the example for `bulk_execution_affinity` to demonstrate how to use `prefer` and `query`. * Update the wording of `bulk_execution_affinity.scatter` and `bulk_execution_affinity.balance` to better describe the expected binding pattern. * Add a note about potetial limitations of resource utilization when using `bulk_execution_affinity`.
1 parent 65e9ea3 commit be28285

File tree

3 files changed

+29
-15
lines changed

3 files changed

+29
-15
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@ Each proposal in the table below will be tagged with one of the following states
5151
| CP009 | [Async Work Group Copy & Prefetch Builtins](async-work-group-copy/index.md) | SYCL 1.2.1 | 07 August 2017 | 07 August 2017 | _Accepted with changes_ |
5252
| CP011 | [Mem Fence Builtins](mem-fence/index.md) | SYCL 1.2.1 | 11 August 2017 | 9 September 2017 | _Accepted_ |
5353
| CP012 | [Data Movement in C++](data-movement/index.md) | ISO C++ SG1, SG14 | 30 May 2017 | 28 August 2017 | _Work in Progress_ |
54-
| CP013 | [P1436: Executor properties for affinity-based execution](affinity/index.md) | ISO C++ SG1, SG14, LEWG | 15 November 2017 | 21 January 2019 | _Work in Progress_ |
54+
| CP013 | [P1436 & P1437: Papers for affinity-based execution](affinity/index.md) | ISO C++ SG1, SG14, LEWG | 15 November 2017 | 31 March 2019 | _Work in Progress_ |
5555
| CP014 | [Shared Virtual Memory](svm/index.md) | SYCL 2.2 | 22 January 2018 | 22 January 2018 | _Work in Progress_ |
5656
| CP015 | [Specialization Constant](spec-constant/index.md) | SYCL 1.2.1 extension | 24 April 2018 | 24 April 2018 | _Work in Progress_ |
5757
| CP017 | [Host Access](host_access/index.md) | SYCL 1.2.1 vendor extension | 17 September 2018 | 13 December 2018 | _Available since CE 1.0.3_ |

affinity/cpp-23/d1436r1.md

Lines changed: 27 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -179,7 +179,7 @@ Some systems give additional user control through explicit binding of threads to
179179

180180
## Relative affinity of execution resources
181181

182-
In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different *execution resources*. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. The relative position of two components in a system's topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.
182+
In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different hardware and software resources. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. The relative position of two components in a system's topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.
183183

184184
This can be scaled to heterogeneous and distributed systems, as the relative affinity between components can apply to discrete heterogeneous and distributed systems as well.
185185

@@ -191,13 +191,13 @@ The initial solution proposed by this paper may only target systems with a singl
191191

192192
## Overview
193193

194-
In this paper we propose an interface for discovering the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware those execution resources represent. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443].
194+
In this paper we propose executor properties that can be used for querying the affinity between different hardware an software resources within a system execute and to require binding of *execution agents* to the underlying hardware or software resources in order to achieve performance through data locality. These properties provide a low granularity and is aimed at users who may have little or no knowledge of the system architecture.
195195

196-
A series of executor properties describe desired behavior when using parallel algorithms or libraries. These properties provide a low granularity and is aimed at users who may have little or no knowledge of the system architecture.
196+
The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443].
197197

198198
## Execution resources
199199

200-
*Execution resources* represent an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources providing it guarantees the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.
200+
An *execution resource* represents an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources providing it guarantees the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.
201201

202202
If an *execution resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. An *execution context* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which an *execution resource* points may be a dynamic, possibly a software-managed pool of hardware. Here are three examples of this phenomenon:
203203

@@ -252,15 +252,20 @@ We propose an executor property group called `bulk_execution_affinity` which con
252252
253253
### Example
254254
255-
Below is an example *(Listing 4)* of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`.
255+
Below is an example *(Listing 4)* of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`. We request affinity binding using `prefer` and then check to see if the executor is able to support it using `query`.
256256
257257
```cpp
258258
{
259-
executor exec;
260-
261-
auto affExec = execution::require(exec, execution::bulk,
259+
bulk_executor exec;
260+
261+
auto affExec = execution::prefer(exec,
262262
execution::bulk_execution_affinity.scatter);
263263
264+
if (execution::query(affExec, execution::bulk_execution_affinity.scatter)) {
265+
std::cout << "bulk_execute using bulk_execution_affinity.scatter"
266+
<< std::endl;
267+
}
268+
264269
affExec.bulk_execute([](std::size_t i, shared s) {
265270
func(i);
266271
}, 8, sharedFactory);
@@ -280,17 +285,26 @@ The `bulk_execution_affinity_t` property provides nested property types and obje
280285

281286
| Nested Property Type | Nested Property Name | Requirements |
282287
|----------------------|----------------------|--------------|
283-
| `bulk_execution_affinity_t::none_t` | `bulk_execution_affinity_t::none` | A call to `e.bulk_execute(f, s, sf)` has no requirements on the binding of *execution agents* to the underlying *execution resources*. |
284-
| `bulk_execution_affinity_t::scatter_t` | `bulk_execution_scatter_t::scatter` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed sparsely across the *execution resources*. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
285-
| `bulk_execution_affinity_t::compact_t` | `bulk_execution_compact_t::compact` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed as close as possible to the *execution resource* of the *thread of execution* which created them. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
286-
| bulk_execution_affinity_t::balanced_t | bulk_execution_balanced_t::balanced | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are collected into groups, each group is distributed sparsely across the *execution resources* and the *execution agents* within each group are distributed as close as possible to the first *execution resource* of that group. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
288+
| `bulk_execution_affinity_t::none_t` | `bulk_execution_affinity_t::none` | A call to `e.bulk_execute(f, s, sf)` has no requirements on the binding of *execution agents* to the underlying *execution resources*. | |
289+
| `bulk_execution_affinity_t::scatter_t` | `bulk_execution_scatter_t::scatter` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* (ordered by physical closeness) such that they are distributed equally across the *execution resources* in a round-robin fashion. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
290+
| `bulk_execution_affinity_t::compact_t` | `bulk_execution_compact_t::compact` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed as close as possible to the *execution resource* of the *thread of execution* which created them. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
291+
| bulk_execution_affinity_t::balanced_t | bulk_execution_balanced_t::balanced | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* (ordered by physical closeness) such that they are distributed equally across the *execution resources* in a bin packing fashion. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
287292

288293
> [*Note:* The requirements of the `bulk_execution_affinity_t` nested properties do not enforce a specific binding, simply that the binding follows the requirements set out above and that the pattern is consistent across invocations of the bulk execution functions. *--end note*]
289294

290-
> [*Note:* If two *executors* `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agents* if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. Rebinding *execution agents* to *execution resources* may take substantial time and may affect performance of subsequent code. *--end note*]
295+
> [*Note:* It's expected that the default value of `bulk_execution_affinity_t` for most executors be `bulk_execution_affinity_t::none_t`. *--end note*]
291296
292297
> [*Note:* The terms used for the `bulk_execution_affinity_t` nested properties are derived from the OpenMP properties [[33]][openmp-affinity] including the Intel specific balanced affinity binding [[[34]][intel-balanced-affinity] *--end note*]
293298
299+
For any two invocations; `e1.bulk_execute(f1, s1, sf1)` and `e2.bulk_execute(f2, s2, sf2)`, the binding of *execution agents* to the underlying *execution resources* must be consistent, if:
300+
* `e1 == e2`,
301+
* `execution::query(e1, execution::bulk_execution_affinity) != execution::bulk_execution_affinity.none`, and
302+
* `s1 == s2`.
303+
304+
> [*Note:* If you have two invocation of `bulk_execute` where the binding of *execution agents* to the underlying *execution resources* is guaranteed to be consistent, this can lead to limitations of resource utilization. *--end note*]
305+
306+
> [*Note:* If two *executors* `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agents* if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. Rebinding *execution agents* to *execution resources* may take substantial time and may affect performance of subsequent code. *--end note*]
307+
294308
## Concurrency property
295309
296310
We propose a query-only executor property called `concurrency_t` which returns the maximum potential concurrency available to *executor*.

affinity/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# P1436: Executor properties for affinity-based execution
1+
# P1436 & P1437: Papers for affinity-based execution
22

33
| | |
44
|---|---|

0 commit comments

Comments
 (0)