* Move the wording of the bulk_execution_affinity properties to the
synopsis and proposed wording.
* Add additional notes discussing the behaviour of the
bulk_execution_affinity properties.
* Remove the straw poll on having a high-level interface.
* Add a new straw poll discussing who should have control over the
bulk_execution_affinity properties.
affinity/cpp-20/d0796r2.md (35 additions & 14 deletions)
@@ -132,7 +132,7 @@ The interface of `thread_execution_resource_t` proposed in the execution context
132
132
133
133
The interface for querying the *resource topology* of a *system* must be flexible enough to allow querying all *execution resources* available under an *execution context*, querying the *execution resources* available to the entire system, and constructing an *execution context* for a particular *execution resource*. This is important, as many standards such as OpenCL [[6]][opencl-2-2] and HSA [[7]][hsa] require the ability to query the *resource topology* available in a *system* before constructing an *execution context* for executing work.
134
134
135
-
> For example, an implementation may provide an execution context for a particular execution resource such as a static thread pool or a GPU context for a particular GPU device, or an implementation may provide a more generic execution context which can be constructed from a number of CPU and GPU devices queryable through the system resource topology.
135
+
> For example, an implementation may provide an execution context for a particular execution resource such as a static thread pool or a GPU context for a particular GPU device, or an implementation may provide a more generic execution context which can be constructed from a number of CPU and GPU devices query-able through the system resource topology.
136
136
137
137
### Topology discovery & fault tolerance
138
138
@@ -188,11 +188,7 @@ The high-level interface is a policy-based design which utilizes the executor pr
188
188
189
189
### Bulk execution affinity
190
190
191
-
In this paper we propose an executor property group called `bulk_execution_affinity` which contains the sub properties `none`, `balanced`, `scatter` or `compact`. Each of these properties, if applied to an *executor* enforce a particular guarantee of execution agent binding to the *execution resources* associated with the *executor* in a particular pattern:
192
-
* **none** makes no guarantee that *execution agents* created by the *executor* will be bound to specific *execution resources*.
193
-
* **balanced** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* close together in sequence but with an even distribution across the *execution resources*.
194
-
* **scatter** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* distributed with each *execution agent* far from each other *execution agent* in sequence.
195
-
* **compact** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* close together in sequence.
191
+
In this paper we propose an executor property group called `bulk_execution_affinity` which contains the nested properties `none`, `balanced`, `scatter` and `compact`. Each of these properties, if applied to an *executor*, enforces a particular guarantee of *execution agent* binding to the *execution resource*s associated with the *executor* in a particular pattern.
196
192
197
193
Below *(Listing 2)* is an example of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`.
198
194
@@ -211,8 +207,6 @@ Below *(Listing 2)* is an example of executing a parallel task over 8 threads us
211
207
```
212
208
*Listing 2: Example of using the bulk_execution_affinity property*
213
209
214
-
> [*Note:* The terms used for the `bulk_execution_affinity` property group are derived from the OpenMP properties [[33]][openmp-affinity] including the Intel specific balanced affinity binding [[[34]][intel-balanced-affinity]*--end note*]
215
-
216
210
## Low-level interface
217
211
218
212
### Execution resources
@@ -319,6 +313,12 @@ A *thread of execution* can be requested to bind to a particular `execution_reso
@@ -418,6 +418,25 @@ A *thread of execution* can be requested to bind to a particular `execution_reso
418
418
419
419
*Listing 7: Header synopsis*
420
420
421
+
## Bulk execution affinity properties
422
+
423
+
The `bulk_execution_affinity_t` property describes what guarantees executors provide about the binding of *execution agent*s to the underlying *execution resource*s.
424
+
425
+
`bulk_execution_affinity_t` provides nested property types and objects as described below.
426
+
427
+
| Nested Property Type | Nested Property Name | Requirements |
|----------------------|----------------------|--------------|
| bulk_execution_affinity_t::none_t | bulk_execution_affinity_t::none | A call to an executor's bulk execution function may or may not bind the *execution agent*s to the underlying *execution resource*s. The affinity binding pattern may or may not be consistent across invocations of the executor's bulk execution function. |
430
+
| bulk_execution_affinity_t::scatter_t | bulk_execution_affinity_t::scatter | A call to an executor's bulk execution function must bind the *execution agent*s to the underlying *execution resource*s such that they are distributed across the *execution resource*s, where each *execution agent* is far from its preceding and following *execution agent*s. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
431
+
| bulk_execution_affinity_t::compact_t | bulk_execution_affinity_t::compact | A call to an executor's bulk execution function must bind the *execution agent*s to the underlying *execution resource*s such that they are in sequence across the *execution resource*s, where each *execution agent* is close to its preceding and following *execution agent*s. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
432
+
| bulk_execution_affinity_t::balanced_t | bulk_execution_affinity_t::balanced | A call to an executor's bulk execution function must bind the *execution agent*s to the underlying *execution resource*s such that they are in sequence and evenly spread across the *execution resource*s, where each *execution agent* is close to its preceding and following *execution agent*s and all *execution resource*s are utilized. The affinity binding pattern must be consistent across invocations of the executor's bulk execution function. |
433
+
434
+
> [*Note:* The requirements of the `bulk_execution_affinity_t` nested properties do not enforce a specific binding, simply that the binding follows the requirements set out above and that the pattern is consistent across invocations of the bulk execution functions. *--end note*]
435
+
436
+
> [*Note:* If two *executor*s `e1` and `e2` invoke a bulk execution function in order, where `execution::query(e1, execution::context) == query(e2, execution::context)` is `true` and `execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)` is `false`, this will likely result in `e1` binding *execution agent*s if necessary to achieve the requested affinity pattern and then `e2` rebinding to achieve the new affinity pattern. *--end note*]
437
+
438
+
> [*Note:* The terms used for the `bulk_execution_affinity_t` nested properties are derived from the OpenMP properties [[33]][openmp-affinity], including the Intel-specific balanced affinity binding [[34]][intel-balanced-affinity]. *--end note*]
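
The three binding patterns in the table above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the machine is modelled as `cores` cores with `slots` hardware threads each, and the names `toy_affinity` and `toy_bind` are hypothetical and not part of the proposed interface. The requirements above do not mandate these exact mappings; they are one possible interpretation that satisfies them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; none of these are part of the
// proposed interface.
enum class toy_affinity { compact, scatter, balanced };

// Toy model: the machine has `cores` cores with `slots` hardware threads each.
// Returns the core index that each of `agents` execution agents is bound to.
std::vector<std::size_t> toy_bind(toy_affinity pattern, std::size_t agents,
                                  std::size_t cores, std::size_t slots) {
  assert(cores > 0 && slots > 0);
  std::vector<std::size_t> core_of(agents);
  for (std::size_t i = 0; i < agents; ++i) {
    switch (pattern) {
      case toy_affinity::compact:
        // Fill every slot of one core before moving on to the next core.
        core_of[i] = (i / slots) % cores;
        break;
      case toy_affinity::scatter:
        // Round-robin: consecutive agents land on different cores.
        core_of[i] = i % cores;
        break;
      case toy_affinity::balanced:
        // Contiguous blocks of agents, spread evenly over all cores.
        core_of[i] = (i * cores) / agents;
        break;
    }
  }
  return core_of;
}
```

In this model, binding 8 agents on 4 cores with 4 slots each gives the core sequence `0 0 0 0 1 1 1 1` for compact, `0 1 2 3 0 1 2 3` for scatter and `0 0 1 1 2 2 3 3` for balanced. Because each mapping is a pure function of the agent index, the pattern is consistent across invocations, which is the property the requirements ask for.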
439
+
421
440
## Class `execution_resource`
422
441
423
442
The `execution_resource` class provides an abstraction over a system's hardware that is capable of allocating memory, executing lightweight execution agents, or both. An `execution_resource` can represent further `execution_resource`s; these `execution_resource`s are said to be *members of* this `execution_resource`.
@@ -610,21 +629,23 @@ The free function `this_thread::get_resource` is provided for retrieving the `ex
610
629
611
630
# Future Work
612
631
613
-
## Migrating data from memory allocated in one partition to another
632
+
## Who should have control over bulk execution affinity?
614
633
615
-
In some cases for performance it is important to bind a memory allocation to a memory region for the duration of an a tasks execution, however in other cases it’s important to be able to migrate the data from one memory region to another. This is outside the scope of this paper, however we would like to investigate this in a future paper.
634
+
This paper currently proposes the `bulk_execution_affinity_t` property and its nested properties for allowing an *executor* to make guarantees as to how *execution agent*s are bound to the underlying *execution resource*s. However, providing control at this level may lead to *execution agent*s being bound to *execution resource*s within a critical path. A possible solution to this is to allow the *execution context* to be configured with `bulk_execution_affinity_t` nested properties, either instead of the *executor* property or in addition to it. This would allow the binding of *threads of execution* to be performed at the time of the *execution context* creation.
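
The trade-off can be sketched with a toy model that only counts how often a binding would have to change. All names here (`toy_pattern`, `toy_context`, `toy_executor`) are hypothetical stand-ins, not the proposed interface, and the counter merely stands in for the cost of actually rebinding threads.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only.
enum class toy_pattern { none, compact, scatter, balanced };

// Toy execution context: remembers the pattern its threads are currently
// bound with and counts how many times a (re)binding took place.
struct toy_context {
  toy_pattern bound = toy_pattern::none;
  std::size_t rebinds = 0;

  // Context-level control: the pattern can be fixed when the context is created.
  explicit toy_context(toy_pattern initial = toy_pattern::none) { bind(initial); }

  void bind(toy_pattern wanted) {
    if (wanted != toy_pattern::none && wanted != bound) {
      bound = wanted;
      ++rebinds;  // stands in for the cost of rebinding threads
    }
  }
};

// Executor-level control: the executor carries the pattern, so any binding
// happens inside the bulk execution function itself.
struct toy_executor {
  toy_context* ctx;
  toy_pattern affinity;

  template <class F>
  void bulk_execute(F f, std::size_t n) const {
    assert(ctx != nullptr);
    ctx->bind(affinity);  // potential rebinding on the critical path
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
};
```

In this model, two executors with different patterns that share one context and alternate calls trigger a rebinding on every call, whereas a context created with the pattern already set pays the cost once, before any work is submitted.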
616
635
617
636
| Straw Poll |
618
637
|------------|
619
-
| Should the interface provide a way of migrating data between partitions? |
638
+
| Should the *execution context* be able to manage the binding of all *threads of execution* which it manages using the `bulk_execution_affinity_t` nested properties? |
639
+
| Should the *executor* be able to manage the binding of all *execution agent*s which it manages using the `bulk_execution_affinity_t` nested properties? |
640
+
| Should both the *execution context* and the *executor* be able to manage the binding of *threads of execution* and subsequently *execution agent*s using the `bulk_execution_affinity_t` nested properties? |
620
641
621
-
## Defining memory placement algorithms or policies
642
+
## Migrating data from memory allocated in one partition to another
622
643
623
-
With the ability to place memory with affinity comes the ability to define algorithms or memory policies which describe at a higher level how memory is distributed across large systems. Some examples of these are pinned, first touch and scatter. This is outside the scope of this paper, however we would like to investigate this in a future paper.
644
+
In some cases it is important for performance to bind a memory allocation to a memory region for the duration of a task's execution; in other cases it is important to be able to migrate the data from one memory region to another. This is outside the scope of this paper; however, we would like to investigate this in a future paper.
624
645
625
646
| Straw Poll |
626
647
|------------|
627
-
| Should the interface provide standard algorithms or policies for distributing memory? |
648
+
| Should the interface provide a way of migrating data between partitions? |