### P0796r2 (RAP)

* Introduce `this_thread::bind` & `this_thread::unbind` for binding a thread of execution to, and unbinding it from, an execution resource.
* Introduce high-level interface for execution binding via executor properties.

### P0796r1 (JAX)

## Overview

In this paper we propose an interface for querying and representing the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443r4].

### Interface granularity

In this paper we propose both a low-level interface and a high-level interface:
* The low-level interface consists of mechanisms for discovering detailed information about a system's topology and affinity properties, which can be utilized to hand-optimize parallel applications and libraries for the best performance. The low-level interface has high granularity and is aimed at users who have detailed knowledge of the system architecture.
* The high-level interface consists of policies which describe desired behavior when using parallel algorithms or libraries. The high-level interface has low granularity and is aimed at users who may have little or no knowledge of the system architecture.

## High-level interface

The high-level interface is a policy-based design which utilizes the executor property mechanism to provide additional affinity-based requirements on executors.
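
As an illustration of the executor property mechanism, the following is a minimal sketch of how requiring an affinity property could produce a new executor that carries that requirement. It assumes a simplified model rather than the P0443 customization points: `toy_executor`, `affinity_pattern`, `bulk_execution_affinity_t` and the free function `require` are hypothetical names, and the four enumerators only mirror the sub-properties of the `bulk_execution_affinity` group proposed in this paper.

```cpp
// Hypothetical enumeration mirroring the bulk_execution_affinity sub-properties.
enum class affinity_pattern { none, balanced, scatter, compact };

// Hypothetical property object carrying the requested pattern.
struct bulk_execution_affinity_t {
  affinity_pattern pattern;
};

// Hypothetical executor that records which affinity requirement it satisfies.
class toy_executor {
 public:
  toy_executor() = default;

  affinity_pattern affinity() const noexcept { return affinity_; }

  // Requiring a property never modifies the original executor; it returns a
  // new executor which additionally satisfies the requested requirement.
  // Found through argument-dependent lookup on toy_executor.
  friend toy_executor require(const toy_executor& ex,
                              bulk_execution_affinity_t prop) {
    toy_executor result = ex;
    result.affinity_ = prop.pattern;
    return result;
  }

 private:
  affinity_pattern affinity_ = affinity_pattern::none;
};
```

Because `require` returns a new executor rather than modifying its argument, the unconstrained executor remains usable after a more constrained one has been derived from it.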

### Bulk execution affinity

In this paper we propose an executor property group called `bulk_execution_affinity`, which contains the sub-properties `none`, `balanced`, `scatter` and `compact`. Each of these properties, if applied to an *executor*, enforces a particular guarantee of binding *execution agents* to the *execution resources* associated with the *executor* in a particular pattern:
* **none** makes no guarantee that *execution agents* created by the *executor* will be bound to specific *execution resources*.
* **balanced** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* close together in sequence but with an even distribution across the *execution resources*.
* **scatter** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* distributed with each *execution agent* far from each other *execution agent* in sequence.
```
*Listing 2: Example of using the bulk_execution_affinity property*

> [*Note:* The terms used for the `bulk_execution_affinity` property group are derived from the OpenMP properties [[33]][openmp-affinity], including the Intel-specific balanced affinity binding [[34]][intel-balanced-affinity]. *--end note*]
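
The difference between the patterns can be made concrete with a small mapping from execution agent indices to cores. This is a minimal sketch, assuming a hypothetical flat machine of `cores` cores with `threads_per_core` hardware threads each, and following the usual OpenMP and Intel reading of the terms: `compact` fills the hardware threads of one core before moving on, `scatter` places consecutive agents on different cores in round-robin order, and `balanced` spreads agents evenly across cores while keeping consecutive agents on the same or neighboring cores. The names `bulk_pattern`, `core_for_agent` and `layout` are hypothetical and not part of the proposal; `none` is omitted because it makes no placement guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placement patterns (none is omitted: it guarantees nothing).
enum class bulk_pattern { compact, scatter, balanced };

// Returns the core that agent `agent` (0-based) is bound to when `num_agents`
// agents run on `cores` cores with `threads_per_core` hardware threads each.
// Oversubscription (more agents than hardware threads) is not handled.
std::size_t core_for_agent(bulk_pattern pattern, std::size_t agent,
                           std::size_t num_agents, std::size_t cores,
                           std::size_t threads_per_core) {
  assert(cores > 0 && threads_per_core > 0);
  assert(agent < num_agents && num_agents <= cores * threads_per_core);
  switch (pattern) {
    case bulk_pattern::compact:
      // Fill every hardware thread of core 0, then core 1, and so on.
      return agent / threads_per_core;
    case bulk_pattern::scatter:
      // Consecutive agents land on different cores, wrapping around.
      return agent % cores;
    case bulk_pattern::balanced:
      // Even block distribution: consecutive agents share or neighbor a core.
      return agent * cores / num_agents;
  }
  return 0;  // unreachable for valid enumerators
}

// The core assignment of every agent, in agent order.
std::vector<std::size_t> layout(bulk_pattern pattern, std::size_t num_agents,
                                std::size_t cores, std::size_t threads_per_core) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < num_agents; ++i) {
    result.push_back(core_for_agent(pattern, i, num_agents, cores, threads_per_core));
  }
  return result;
}
```

With 6 agents on 4 cores of 2 hardware threads each, this sketch yields cores `{0, 0, 1, 1, 2, 2}` for `compact`, `{0, 1, 2, 3, 0, 1}` for `scatter` and `{0, 0, 1, 2, 2, 3}` for `balanced`.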

## Low-level interface

### Execution resources

An `execution_resource` is a lightweight structure which acts as an identifier to a particular piece of hardware within a system. It can be queried for whether it can allocate memory via `can_place_memory`, whether it can execute work via `can_place_agents`, and for its name via `name`. An `execution_resource` can also represent other `execution_resource`s; these are referred to as being *members of* that `execution_resource` and can be queried via `resources`. Additionally, the `execution_resource` which another is a *member of* can be queried via `member_of`. An `execution_resource` can also be queried for the concurrency it can provide: the total number of *threads of execution* supported by that *execution_resource* and all resources it represents.

> [*Note:* An execution resource is not limited to resources which execute work; it can also be a general resource where no execution can take place but memory can be allocated, such as off-chip memory. *--end note*]

> [*Note:* The intention is that the actual implementation details of a resource topology are described in an execution context when required. This allows the execution resource objects to be lightweight objects that serve as identifiers that are only referenced. *--end note*]
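
The *member of* relationship and the concurrency query described above can be modeled as a simple tree. This is a hedged sketch, not the proposal's implementation: `toy_resource`, `add_member`, `member` and `member_count` are hypothetical, and it assumes each resource reports the hardware threads it provides directly (zero for memory-only or purely aggregating resources), so that its concurrency is that number plus the concurrency of every member. Nodes are heap-allocated so that `member_of` pointers stay valid; the tree is assumed to be built in place.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an execution resource topology (not the proposal's API).
class toy_resource {
 public:
  toy_resource(std::string name, bool can_place_memory, bool can_place_agents,
               std::size_t own_threads)
      : name_(std::move(name)),
        can_place_memory_(can_place_memory),
        can_place_agents_(can_place_agents),
        own_threads_(own_threads) {}

  const std::string& name() const noexcept { return name_; }
  bool can_place_memory() const noexcept { return can_place_memory_; }
  bool can_place_agents() const noexcept { return can_place_agents_; }

  // The resource this one is a member of, or nullptr for a system level resource.
  const toy_resource* member_of() const noexcept { return parent_; }

  // Adds a member resource and returns a reference to the stored member.
  toy_resource& add_member(toy_resource member) {
    auto node = std::make_unique<toy_resource>(std::move(member));
    node->parent_ = this;
    // Re-point the new node's own members at its stable heap address.
    for (auto& child : node->members_) child->parent_ = node.get();
    members_.push_back(std::move(node));
    return *members_.back();
  }

  std::size_t member_count() const noexcept { return members_.size(); }
  const toy_resource& member(std::size_t i) const { return *members_.at(i); }

  // Threads of execution provided by this resource and all resources it represents.
  std::size_t concurrency() const noexcept {
    std::size_t total = own_threads_;
    for (const auto& m : members_) total += m->concurrency();
    return total;
  }

 private:
  std::string name_;
  bool can_place_memory_;
  bool can_place_agents_;
  std::size_t own_threads_;
  const toy_resource* parent_ = nullptr;
  std::vector<std::unique_ptr<toy_resource>> members_;
};
```

For example, a machine-level resource holding one NUMA node with two cores of two hardware threads each, plus a memory-only DRAM resource, reports a concurrency of 4 under these assumptions.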

### System topology

The system topology is made up of a number of system level `execution_resource`s, which can be queried through `this_system::resources`; this function returns a `std::vector`. The `execution_resource`s available within the system can be initialized dynamically by a runtime library; however, this must be done before `main` is called, because after that point the system topology cannot change.

Below *(Listing 3)* is an example of iterating over the system level resources and printing out their capabilities.

```cpp
for (auto res : execution::this_system::resources()) {

### Querying relative affinity

The `affinity_query` class template provides an abstraction for a relative affinity value between two `execution_resource`s, derived from a particular `affinity_operation` and `affinity_metric`. The `affinity_query` is templated by `affinity_operation` and `affinity_metric` and is constructed from two `execution_resource`s. An `affinity_query` has little meaning on its own; instead, the relative magnitude of affinity can be queried using comparison operators. If necessary, the value of an `affinity_query` can also be queried through `native_affinity`, though the return value of this is implementation-defined.

Below *(Listing 4)* is an example of how you can query the relative affinity between two `execution_resource`s.

### Binding to execution

A *thread of execution* can request to be bound to a particular `execution_resource` for a particular *execution agent* by calling `this_thread::bind`, provided that the `execution_resource` is able to place agents. `this_thread::bind` returns `true` if the current *thread of execution* was successfully bound to the specified `execution_resource`, and `false` otherwise. If the binding succeeds, the `execution_resource` returned by `this_thread::get_resource` must be equal to the `execution_resource` provided to `this_thread::bind`. Subsequently, a *thread of execution* can be unbound by calling `this_thread::unbind`.

> [*Note:* Binding *threads of execution* can provide performance benefits when used in a way which complements the application; however, incorrect usage can lead to denial of service and therefore cause a loss of performance. *--end note*]
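
The binding contract described above (a `bool` result, the `can_place_agents` requirement, and the guarantee on `this_thread::get_resource` after a successful bind) can be sketched with a thread-local record of the current binding. This is a minimal sketch under stated assumptions: the `toy` namespace, `toy::resource` and its `id` field are hypothetical; no operating-system affinity API is called; a request for a resource that cannot place agents returns `false` here, whereas the proposal states it as a *Requires* precondition; `get_resource` returns an empty `std::optional` when the thread is unbound; and `unbind` is assumed to succeed only for the resource the thread is currently bound to.

```cpp
#include <optional>

namespace toy {

// Hypothetical stand-in for execution_resource: an identity plus one capability.
struct resource {
  int id;
  bool can_place_agents;

  // Two resources are equal if they identify the same hardware.
  friend bool operator==(const resource& a, const resource& b) noexcept {
    return a.id == b.id;
  }
};

namespace this_thread {

// The calling thread's current binding; an empty optional means "unbound".
thread_local std::optional<resource> current_binding;

// Assumption: returns an empty optional when the thread is not bound.
std::optional<resource> get_resource() noexcept { return current_binding; }

bool bind(resource r) noexcept {
  // The proposal states this as a precondition; this sketch reports failure.
  if (!r.can_place_agents) return false;
  // A real implementation would ask the operating system to pin the thread
  // here and return false if that request failed.
  current_binding = r;
  return true;
}

// Assumption: unbinding only succeeds for the resource currently bound.
bool unbind(resource r) noexcept {
  if (!current_binding || !(*current_binding == r)) return false;
  current_binding.reset();
  return true;
}

}  // namespace this_thread
}  // namespace toy
```

Keeping the binding in a `thread_local` variable reflects that binding is a property of the calling *thread of execution*, so binding one thread has no effect on what `get_resource` reports on another.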

}

namespace this_thread {
bool bind(execution_resource eR) noexcept;
bool unbind(execution_resource eR) noexcept;
}

} // execution

The `execution_resource` class provides an abstraction over a system's hardware that is capable of allocating memory, executing lightweight execution agents, or both. An `execution_resource` can represent further `execution_resource`s; these `execution_resource`s are said to be *members of* this `execution_resource`.

> [*Note:* The `execution_resource` is required to be implemented such that the underlying software abstraction is initialized when the `execution_resource` is constructed, maintained through reference counting and cleaned up on destruction of the final reference. *--end note*]
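
The reference-counting requirement in the note above maps naturally onto `std::shared_ptr`. The sketch below shows one possible approach, not a mandated one: `native_handle_state` and `counted_resource` are hypothetical, and a static counter of live states stands in for whatever underlying software abstraction (for example a driver or runtime handle) an implementation would actually initialize and clean up.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical underlying software abstraction owned by an execution resource.
struct native_handle_state {
  static inline std::size_t live_count = 0;  // observes initialization/cleanup

  explicit native_handle_state(std::string name) : device(std::move(name)) {
    ++live_count;  // initialized when the first resource object is constructed
  }
  ~native_handle_state() { --live_count; }  // cleaned up with the final reference

  native_handle_state(const native_handle_state&) = delete;
  native_handle_state& operator=(const native_handle_state&) = delete;

  std::string device;
};

// Lightweight, copyable identifier: all copies share one native_handle_state.
class counted_resource {
 public:
  explicit counted_resource(std::string name)
      : state_(std::make_shared<native_handle_state>(std::move(name))) {}

  const std::string& name() const noexcept { return state_->device; }
  long use_count() const noexcept { return state_.use_count(); }

 private:
  std::shared_ptr<native_handle_state> state_;
};
```

Copying a `counted_resource` only increments the shared count, which keeps the identifier cheap to pass around, while the underlying state is destroyed exactly once, when the last copy goes away.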

### `execution_resource` constructors

## Class `execution_context`

The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on an `execution_resource` and any `execution_resource`s encapsulated by it. The `execution_resource` which an `execution_context` encapsulates is referred to as the *contained resource*.

### `execution_context` types

using executor_type = see-below;

*Requires:* `executor_type` is an implementation-defined class which satisfies the general executor requirements, as specified by P0443r5.

using pmr_memory_resource_type = see-below;

*Returns:* An `expected<size_t, error_type>` where,
* if the affinity query was successful, the value of type `size_t` represents the magnitude of the relative affinity;
* if the affinity query was not successful, the error is a value of type `error_type` which represents the reason the affinity query failed.

> [*Note:* An affinity query is permitted to fail if the affinity between the two execution resources cannot be calculated for any reason, such as when the resources are from different vendors or when communication between the resources is not possible. *--end note*]
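
Since `std::expected` was only standardized in C++23, the shape of the result described above can be sketched with `std::variant`. This is a hypothetical illustration: `affinity_error`, `affinity_result`, `toy_device` and `query_latency_affinity` are invented names, and it assumes, purely for the example, that the magnitude is a NUMA hop count where a smaller value means a closer resource. The proposal itself leaves the meaning of the magnitude to the `affinity_metric`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <variant>

// Hypothetical reasons an affinity query may fail (compare the note above).
enum class affinity_error { different_vendors, no_communication };

// Stand-in for expected<size_t, error_type>: either a magnitude or an error.
using affinity_result = std::variant<std::size_t, affinity_error>;

// Hypothetical device description used only by this sketch.
struct toy_device {
  int vendor;
  int numa_node;
  bool connected;  // whether the device can communicate with the others
};

// Assumed metric: number of NUMA hops between two devices (smaller is closer).
affinity_result query_latency_affinity(const toy_device& a, const toy_device& b) {
  if (a.vendor != b.vendor) return affinity_error::different_vendors;
  if (!a.connected || !b.connected) return affinity_error::no_communication;
  return static_cast<std::size_t>(std::abs(a.numa_node - b.numa_node));
}
```

A caller would check `std::holds_alternative<std::size_t>(result)` before comparing magnitudes, so that a failed query is never mistaken for a valid affinity value.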

## Free functions

The free function `this_system::resources` is provided for retrieving the `execution_resource`s which encapsulate the hardware platforms available within the system; these are referred to as the *system level resources*.

> [*Note:* Returning a `std::vector` allows users to potentially manipulate the container of `execution_resource`s after it is returned; we may want to replace it at a later date with a more restrictive alternative type, such as a range. *--end note*]

The free functions `this_thread::bind` and `this_thread::unbind` are provided for binding / unbinding the current *thread of execution* to / from a particular `execution_resource`.

bool bind(execution_resource eR) noexcept;

*Returns:* `true` if the requested binding was successful, otherwise `false`.

*Requires:* `eR.can_place_agents() == true`.

*Effects:* If successful, binds the current *thread of execution* to the specified `execution_resource`.

bool unbind(execution_resource eR) noexcept;

*Returns:* `true` if the requested unbinding was successful, otherwise `false`.

*Requires:* `eR.can_place_agents() == true`.

*Effects:* If successful, unbinds the current *thread of execution* from the specified `execution_resource`.

# Future Work

## Level of abstraction

The current proposal provides an interface for querying whether an `execution_resource` can allocate memory and/or execute work, the concurrency it supports, and its name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system; they may also want to know what kind of memory is available or the properties by which work is executed. It was decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementors to support new hardware. It has been discussed that a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.

We may wish to mirror the design of the executors proposal and have a generic query interface using properties for querying information about an `execution_resource`. It’s expected that an implementation may provide additional nonstandard queries that are specific to that implementation.
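
A property-based query interface in the style of the executors proposal might look like the following. Everything here is hypothetical: the property types `memory_kind_t`, `max_concurrency_t` and `vendor::l2_cache_bytes_t`, the enumeration `memory_kind`, the struct `toy_resource` and the free function `query` are illustrative names, not part of this proposal or of P0443. The sketch shows the intended benefit: a new piece of information is added by defining a new tag type and a matching overload, without enumerating hardware components in the interface itself, and a nonstandard query can live in an implementation's own namespace.

```cpp
#include <cstddef>

// Hypothetical kinds of memory a resource could report.
enum class memory_kind { none, host_dram, device_hbm };

// Hypothetical resource description consulted by the queries below.
struct toy_resource {
  memory_kind memory;
  std::size_t threads;
  std::size_t l2_cache_bytes;
};

// Hypothetical "standard" property tags: each type names one piece of information.
struct memory_kind_t {};
struct max_concurrency_t {};

// One overload per property; the tag's type selects the query.
memory_kind query(const toy_resource& r, memory_kind_t) noexcept { return r.memory; }
std::size_t query(const toy_resource& r, max_concurrency_t) noexcept { return r.threads; }

// A nonstandard, implementation-specific property lives in its own namespace
// and is found through argument-dependent lookup on the tag type.
namespace vendor {
struct l2_cache_bytes_t {};
std::size_t query(const ::toy_resource& r, l2_cache_bytes_t) noexcept {
  return r.l2_cache_bytes;
}
}  // namespace vendor
```

Because every query is spelled `query(resource, property)`, generic code can ask for information without knowing in advance which kinds of hardware exist.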

## Dynamic topology discovery

The current proposal requires that all `execution_resource`s are initialized before `main` is called, which means an `execution_resource` cannot become available or go offline at runtime. We may wish to support this in the future; however, it is outside the scope of this paper.

| Straw Poll |
|------------|