Commit de6e026

Merge pull request #92 from AerialMantis/CP013-wording-improvements
CP013: Wording improvements.
2 parents 725cbbe + bcc4191 commit de6e026

3 files changed: 37 additions and 15 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ Each proposal in the table below will be tagged with one of the following states
5151
| CP009 | [Async Work Group Copy & Prefetch Builtins](async-work-group-copy/index.md) | SYCL 1.2.1 | 07 August 2017 | 07 August 2017 | _Accepted with changes_ |
5252
| CP011 | [Mem Fence Builtins](mem-fence/index.md) | SYCL 1.2.1 | 11 August 2017 | 9 September 2017 | _Accepted_ |
5353
| CP012 | [Data Movement in C++](data-movement/index.md) | ISO C++ SG1, SG14 | 30 May 2017 | 28 August 2017 | _Work in Progress_ |
54-
| CP013 | [P1436: Executor properties for affinity-based execution](affinity/index.md) | ISO C++ SG1, SG14, LEWG | 15 November 2017 | 21 January 2019 | _Work in Progress_ |
54+
| CP013 | [P1436 & P1437: Papers for affinity-based execution](affinity/index.md) | ISO C++ SG1, SG14, LEWG | 15 November 2017 | 31 March 2019 | _Work in Progress_ |
5555
| CP014 | [Shared Virtual Memory](svm/index.md) | SYCL 2.2 | 22 January 2018 | 22 January 2018 | _Work in Progress_ |
5656
| CP015 | [Specialization Constant](spec-constant/index.md) | SYCL 1.2.1 extension | 24 April 2018 | 24 April 2018 | _Work in Progress_ |
5757
| CP017 | [Host Access](host_access/index.md) | SYCL 1.2.1 vendor extension | 17 September 2018 | 13 December 2018 | _Available since CE 1.0.3_ |

affinity/cpp-23/d1436r1.md

Lines changed: 35 additions & 13 deletions
@@ -17,6 +17,14 @@
1717

1818
### P1436r1 (COL 2019)
1919

20+
* Introduce wording to clarify when two invocations of bulk_execute
21+
are expected to have consistent binding.
22+
* Introduce wording to describe how bulk_execute should handle an
23+
execution context failing to provide the guaranteed binding.
24+
* Update the wording of bulk_execution_affinity.scatter and
25+
bulk_execution_affinity.balanced to better describe the expected
26+
binding pattern.
27+
2028
### P1436r0 (KON 2019)
2129

2230
* Separation of high-level features from P0796r3 [[35]][p0796].
@@ -179,7 +187,7 @@ Some systems give additional user control through explicit binding of threads to
179187

180188
## Relative affinity of execution resources
181189

182-
In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different *execution resources*. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. The relative position of two components in a system's topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.
190+
In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different hardware and software resources. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. The relative position of two components in a system's topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.
183191

184192
This can be scaled to heterogeneous and distributed systems, as the relative affinity between components can apply to discrete heterogeneous and distributed systems as well.
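
The point that distance between resources need not be symmetric can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `latency_table` type, the `closest_to` helper, and the idea of storing relative latency in a square table are hypothetical and are not part of the interface proposed in P1436.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical relative-latency table: entry [from][to] is the cost of
// reaching resource `to` from resource `from`, in arbitrary units.
// Nothing requires [a][b] to equal [b][a].
constexpr std::size_t resource_count = 3;
using latency_table =
    std::array<std::array<int, resource_count>, resource_count>;

// Returns the index of the resource with the lowest latency as seen from
// `from`, ignoring `from` itself.
inline std::size_t closest_to(const latency_table& table, std::size_t from) {
  assert(from < resource_count);
  std::size_t best = (from == 0) ? 1 : 0;
  for (std::size_t to = 0; to < resource_count; ++to) {
    if (to != from && table[from][to] < table[from][best]) {
      best = to;
    }
  }
  return best;
}
```

With an asymmetric table, resource 0 may consider resource 1 its closest neighbour while resource 1 considers resource 2 closer than resource 0, which is why a placement decision has to be made from the point of view of a specific resource.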
185193

@@ -191,13 +199,13 @@ The initial solution proposed by this paper may only target systems with a singl
191199

192200
## Overview
193201

194-
In this paper we propose an interface for discovering the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware those execution resources represent. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443].
202+
In this paper we propose executor properties that can be used to query the affinity between the different hardware and software resources within a system that are available to executors, and to require binding of *execution agents* to the underlying hardware or software resources in order to achieve performance through data locality. These properties provide a low granularity and are aimed at users who may have little or no knowledge of the system architecture.
195203

196-
A series of executor properties describe desired behavior when using parallel algorithms or libraries. These properties provide a low granularity and is aimed at users who may have little or no knowledge of the system architecture.
204+
The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443].
197205

198206
## Execution resources
199207

200-
*Execution resources* represent an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources providing it guarantees the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.
208+
An *execution resource* represents an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources providing it guarantees the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.
201209

202210
If an *execution resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. An *execution context* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which an *execution resource* points may be a dynamic, possibly software-managed, pool of hardware. Here are three examples of this phenomenon:
203211

@@ -252,15 +260,20 @@ We propose an executor property group called `bulk_execution_affinity` which con
252260
253261
### Example
254262
255-
Below is an example *(Listing 4)* of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`.
263+
Below is an example *(Listing 4)* of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`. We request affinity binding using `prefer` and then check to see if the executor is able to support it using `query`.
256264
257265
```cpp
258266
{
259-
executor exec;
260-
261-
auto affExec = execution::require(exec, execution::bulk,
267+
bulk_executor exec;
268+
269+
auto affExec = execution::prefer(exec,
262270
execution::bulk_execution_affinity.scatter);
263271
272+
if (execution::query(affExec, execution::bulk_execution_affinity.scatter)) {
273+
std::cout << "bulk_execute using bulk_execution_affinity.scatter"
274+
<< std::endl;
275+
}
276+
264277
affExec.bulk_execute([](std::size_t i, shared s) {
265278
func(i);
266279
}, 8, sharedFactory);
@@ -280,17 +293,26 @@ The `bulk_execution_affinity_t` property provides nested property types and obje
280293

281294
| Nested Property Type | Nested Property Name | Requirements |
282295
|----------------------|----------------------|--------------|
283-
| `bulk_execution_affinity_t::none_t` | `bulk_execution_affinity_t::none` | A call to `e.bulk_execute(f, s, sf)` has no requirements on the binding of *execution agents* to the underlying *execution resources*. |
284-
| `bulk_execution_affinity_t::scatter_t` | `bulk_execution_scatter_t::scatter` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed sparsely across the *execution resources*. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
285-
| `bulk_execution_affinity_t::compact_t` | `bulk_execution_compact_t::compact` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed as close as possible to the *execution resource* of the *thread of execution* which created them. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
286-
| bulk_execution_affinity_t::balanced_t | bulk_execution_balanced_t::balanced | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are collected into groups, each group is distributed sparsely across the *execution resources* and the *execution agents* within each group are distributed as close as possible to the first *execution resource* of that group. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
296+
| `bulk_execution_affinity_t::none_t` | `bulk_execution_affinity_t::none` | A call to `e.bulk_execute(f, s, sf)` has no requirements on the binding of *execution agents* to the underlying *execution resources*. |
297+
| `bulk_execution_affinity_t::scatter_t` | `bulk_execution_affinity_t::scatter` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* (ordered by physical closeness) such that they are distributed equally across the *execution resources* in a round-robin fashion. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
298+
| `bulk_execution_affinity_t::compact_t` | `bulk_execution_affinity_t::compact` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* such that they are distributed as close as possible to the *execution resource* of the *thread of execution* which created them. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
299+
| `bulk_execution_affinity_t::balanced_t` | `bulk_execution_affinity_t::balanced` | A call to `e.bulk_execute(f, s, sf)` must bind the created *execution agents* to the underlying *execution resources* (ordered by physical closeness) such that they are distributed equally across the *execution resources* in a bin packing fashion. <br><br> If the execution context associated with `e` fails to bind the created *execution agents* to the underlying *execution resources* then `bulk_execute` must throw an exception. |
287300

288301
> [*Note:* The requirements of the `bulk_execution_affinity_t` nested properties do not enforce a specific binding, simply that the binding follows the requirements set out above and that the pattern is consistent across invocations of the bulk execution functions. *--end note*]
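
Since the wording above does not prescribe a specific binding, the following is only one possible reading of the three patterns, sketched as index arithmetic. It assumes that resources are numbered `0` to `resources - 1` in order of physical closeness and that "closest to the creating thread" can be approximated by walking forward from its resource index. The helper names (`scatter_slot`, `balanced_slot`, `compact_slot`) and the `per_resource` capacity parameter are hypothetical and do not appear in the paper.

```cpp
#include <cassert>
#include <cstddef>

// scatter (assumed reading): hand agents out round-robin, so neighbouring
// agent indices land on different resources.
inline std::size_t scatter_slot(std::size_t i, std::size_t resources) {
  assert(resources > 0);
  return i % resources;
}

// balanced (assumed reading): split the agents into near-equal contiguous
// blocks, one block per resource, so neighbouring agents share a resource.
inline std::size_t balanced_slot(std::size_t i, std::size_t agents,
                                 std::size_t resources) {
  assert(agents > 0 && resources > 0 && i < agents);
  return (i * resources) / agents;
}

// compact (assumed reading): fill the creating thread's resource (`origin`)
// first, `per_resource` agents at a time, then move to the next index.
inline std::size_t compact_slot(std::size_t i, std::size_t origin,
                                std::size_t per_resource,
                                std::size_t resources) {
  assert(per_resource > 0 && resources > 0 && origin < resources);
  return (origin + i / per_resource) % resources;
}
```

For 8 agents on 4 resources, this reading places agents 0 to 7 on resources `0 1 2 3 0 1 2 3` under scatter and `0 0 1 1 2 2 3 3` under balanced. An implementation is free to choose a different mapping that still satisfies the requirements in the table.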
289302

290-
> [*Note:* If two *executors* `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agents* if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. Rebinding *execution agents* to *execution resources* may take substantial time and may affect performance of subsequent code. *--end note*]
303+
> [*Note:* It is expected that, for most executors, the default value of `bulk_execution_affinity_t` is `bulk_execution_affinity_t::none_t`. *--end note*]
291304
292305
> [*Note:* The terms used for the `bulk_execution_affinity_t` nested properties are derived from the OpenMP properties [[33]][openmp-affinity], including the Intel-specific balanced affinity binding [[34]][intel-balanced-affinity]. *--end note*]
293306
307+
For any two invocations, `e1.bulk_execute(f1, s1, sf1)` and `e2.bulk_execute(f2, s2, sf2)`, the binding of *execution agents* to the underlying *execution resources* must be consistent if:
308+
* `e1 == e2`,
309+
* `execution::query(e1, execution::bulk_execution_affinity) != execution::bulk_execution_affinity.none`, and
310+
* `s1 == s2`.
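
The three conditions above can be read as a single predicate. The sketch below models them with stand-in types: `executor_model`, the `affinity` enumeration, and `binding_must_be_consistent` are hypothetical names introduced for illustration, and the `binding` member stands in for the result of `execution::query(e, execution::bulk_execution_affinity)`.

```cpp
#include <cstddef>

// Stand-in for the values of the bulk_execution_affinity property.
enum class affinity { none, scatter, compact, balanced };

// Stand-in for an executor: equality compares the associated context and
// the affinity property, which is an assumption about what `e1 == e2` covers.
struct executor_model {
  int context_id;    // models the associated execution context
  affinity binding;  // models query(e, bulk_execution_affinity)

  friend bool operator==(const executor_model& a, const executor_model& b) {
    return a.context_id == b.context_id && a.binding == b.binding;
  }
};

// True when two bulk_execute invocations with shapes s1 and s2 are required
// to produce a consistent binding under the three listed conditions.
inline bool binding_must_be_consistent(const executor_model& e1, std::size_t s1,
                                       const executor_model& e2, std::size_t s2) {
  return e1 == e2 && e1.binding != affinity::none && s1 == s2;
}
```

Under this reading, changing only the shape, or using an executor whose affinity is `none`, releases the implementation from the consistency requirement.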
311+
312+
> [*Note:* If the binding of *execution agents* to the underlying *execution resources* is guaranteed to be consistent across two invocations of `bulk_execute`, this can limit resource utilization. *--end note*]
313+
314+
> [*Note:* If two *executors* `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agents* if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. Rebinding *execution agents* to *execution resources* may take substantial time and may affect performance of subsequent code. *--end note*]
315+
294316
## Concurrency property
295317
296318
We propose a query-only executor property called `concurrency_t` which returns the maximum potential concurrency available to an *executor*.

affinity/index.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# P1436: Executor properties for affinity-based execution
1+
# P1436 & P1437: Papers for affinity-based execution
22

33
| | |
44
|---|---|
