affinity/cpp-20/d0796r2.md
Lines changed: 76 additions & 11 deletions
@@ -14,6 +14,11 @@
14
14
15
15
# Changelog
16
16
17
+
### P0796r2 (RAP)
18
+
19
+
* Introduce `this_thread::bind` & `this_thread::unbind` for binding a thread of execution to an execution resource.
20
+
* Introduce high-level interface for execution binding via executor properties.
21
+
17
22
### P0796r1 (JAX)
18
23
19
24
* Introduce proposed wording.
@@ -170,6 +175,43 @@ This feature could be easily scaled to heterogeneous and distributed systems, as
170
175
171
176
In this paper we propose an interface for querying and representing the execution resources within a system, querying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443r4].
172
177
178
+
### Interface granularity
179
+
180
+
In this paper we propose both a low-level interface and a high-level interface:
181
+
* The low-level interface consists of mechanisms for discovering detailed information about a system's topology and affinity properties which can be utilised to hand-optimise parallel applications and libraries for the best performance. The low-level interface has high granularity and is aimed at users who have detailed knowledge of the system architecture.
182
+
* The high-level interface consists of policies which describe desired behaviour when using parallel algorithms or libraries. The high-level interface has low granularity and is aimed at users who may have little or no knowledge of the system architecture.
183
+
184
+
## High-level interface
185
+
186
+
The high-level interface is a policy-based design which utilises the executor property mechanism to provide additional affinity-based requirements on executors.
187
+
188
+
### Bulk execution affinity
189
+
190
+
In this paper we propose an executor property group called `bulk_execution_affinity` which contains the sub-properties `none`, `balanced`, `scatter` and `compact`. Each of these properties, if applied to an *executor*, enforces a particular guarantee of execution agent binding to the *execution resources* associated with the *executor* in a particular pattern:
191
+
* **none** makes no guarantee that *execution agents* created by the *executor* will be bound to specific *execution resources*.
192
+
* **balanced** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* close together in sequence but with an even distribution across the *execution resources*.
193
+
* **scatter** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* distributed with each *execution agent* far from each other *execution agent* in sequence.
194
+
* **compact** guarantees that *execution agents* created by the executor will be bound to the *execution resources* associated with the *executor* close together in sequence.
195
+
196
+
Below *(Listing 2)* is an example of executing a parallel task over 8 threads using `bulk_execute`, with the affinity binding `bulk_execution_affinity.scatter`.
197
+
198
+
```cpp
199
+
{
200
+
auto exec = executionContext.executor();
201
+
202
+
auto affExec = execution::require(exec, execution::bulk,
203
+
execution::bulk_execution_affinity.scatter);
204
+
205
+
affExec.bulk_execute([](std::size_t i, shared s) {
206
+
func(i);
207
+
}, 8, sharedFactory);
208
+
}
209
+
210
+
```
211
+
*Listing 2: Example of using the bulk_execution_affinity property*
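
The binding patterns described above can be illustrated with a small placement model. The sketch below is not part of the proposal: `pattern`, `place`, and the idea of numbering resources as `groups * perGroup` consecutive indices (for example, cores grouped by socket) are assumptions made purely for illustration, and the exact distribution chosen for each pattern is one plausible reading of the wording rather than a specified behaviour. It requires `groups` and `perGroup` to be non-zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: resources are numbered 0 .. groups * perGroup - 1 and
// resources that share a group (e.g. cores on one socket) count as "close".
enum class pattern { compact, scatter, balanced };

// Returns, for each of `agents` execution agents, the index of the resource
// it would be bound to under the given pattern in this toy model.
// Precondition: groups > 0 and perGroup > 0.
std::vector<std::size_t> place(pattern p, std::size_t agents,
                               std::size_t groups, std::size_t perGroup) {
  const std::size_t total = groups * perGroup;
  std::vector<std::size_t> out(agents);
  for (std::size_t i = 0; i < agents; ++i) {
    switch (p) {
      case pattern::compact:
        // Consecutive agents fill consecutive resources.
        out[i] = i % total;
        break;
      case pattern::scatter: {
        // Consecutive agents are sent to different groups, round robin.
        const std::size_t g = i % groups;
        const std::size_t slot = (i / groups) % perGroup;
        out[i] = g * perGroup + slot;
        break;
      }
      case pattern::balanced: {
        // Agents are split evenly across groups; neighbours stay together.
        const std::size_t g = (i * groups) / agents;
        // Index of the first agent assigned to group g.
        const std::size_t first = (g * agents + groups - 1) / groups;
        const std::size_t slot = (i - first) % perGroup;
        out[i] = g * perGroup + slot;
        break;
      }
    }
  }
  return out;
}
```

With 4 agents over 2 groups of 4 resources, this model yields `{0, 1, 2, 3}` for `compact`, `{0, 4, 1, 5}` for `scatter` and `{0, 1, 4, 5}` for `balanced`.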
212
+
213
+
## Low-level interface
214
+
173
215
### Execution resources
174
216
175
217
An `execution_resource` is a lightweight structure which acts as an identifier for a particular piece of hardware within a system. It can be queried for whether it can allocate memory via `can_place_memory`, whether it can execute work via `can_place_agents`, and for its name via `name`. An `execution_resource` can also represent other `execution_resource`s; these are referred to as being *members of* that `execution_resource` and can be queried via `resources`. Additionally, the `execution_resource` which another is a *member of* can be queried via `member_of`. An `execution_resource` can also be queried for the concurrency it can provide: the total number of *threads of execution* supported by that `execution_resource` and all resources it represents.
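
To make the member hierarchy and the concurrency query concrete, here is a minimal sketch using a toy stand-in type. `mock_resource`, `add_member` and the per-resource thread count are hypothetical names invented for this illustration; they are not the proposed `execution_resource` interface, and `member_of`, `can_place_memory` and `can_place_agents` are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the proposed execution_resource; not the proposed type.
class mock_resource {
 public:
  mock_resource(std::string name, std::size_t ownThreads)
      : name_(std::move(name)), ownThreads_(ownThreads) {}

  // Adds a member resource and returns a reference to it. The reference is
  // invalidated by the next add_member call on the same parent.
  mock_resource& add_member(std::string name, std::size_t ownThreads) {
    members_.emplace_back(std::move(name), ownThreads);
    return members_.back();
  }

  const std::string& name() const { return name_; }

  // The resources that are members of this resource.
  const std::vector<mock_resource>& resources() const { return members_; }

  // Threads of execution supported by this resource plus all the
  // resources it represents, summed recursively.
  std::size_t concurrency() const {
    std::size_t total = ownThreads_;
    for (const auto& member : members_) total += member.concurrency();
    return total;
  }

 private:
  std::string name_;
  std::size_t ownThreads_;
  std::vector<mock_resource> members_;
};
```

In this model a system with two sockets holding three two-thread cores in total reports a concurrency of 6 at the top level, while each socket reports only the threads of its own cores.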
@@ -182,7 +224,7 @@ An `execution_resource` is a light weight structure which acts as an identifier
182
224
183
225
The system topology is made up of a number of system level `execution_resource`s, which can be queried through `this_system::resources`, which returns a `std::vector`. The `execution_resource`s available within the system can be initialised dynamically by a runtime library, but this must be done before `main` is called, since the system topology cannot change after that point.
184
226
185
-
Below *(Listing 2)* is an example of iterating over the system level resources and printing out their capabilities.
227
+
Below *(Listing 3)* is an example of iterating over the system level resources and printing out their capabilities.
186
228
187
229
```cpp
188
230
for (auto res : execution::this_system::resources()) {
@@ -192,13 +234,13 @@ for (auto res : execution::this_system::resources()) {
192
234
std::cout << res.concurrency() << '\n';
193
235
}
194
236
```
195
-
*Listing 2: Example of querying all the system level execution resources*
237
+
*Listing 3: Example of querying all the system level execution resources*
196
238
197
239
### Querying relative affinity
198
240
199
241
The `affinity_query` class template provides an abstraction for a relative affinity value between two `execution_resource`s, derived from a particular `affinity_operation` and `affinity_metric`. The `affinity_query` is templated by `affinity_operation` and `affinity_metric` and is constructed from two `execution_resource`s. An `affinity_query` does not mean much on its own; instead, a relative magnitude of affinity can be queried by using comparison operators. If necessary, the value of an `affinity_query` can also be queried through `native_affinity`, though the return value of this is implementation defined.
200
242
201
-
Below *(Listing 3)* is an example of how you can query the relative affinity between two `execution_resource`s.
243
+
Below *(Listing 4)* is an example of how you can query the relative affinity between two `execution_resource`s.
202
244
203
245
```cpp
204
246
auto systemLevelResources = execution::this_system::resources();
@@ -212,15 +254,15 @@ auto relativeLatency02 = execution::affinity_query<execution::affinity_operation
212
254
213
255
auto relativeLatency = relativeLatency01 > relativeLatency02;
214
256
```
215
-
*Listing 3: Example of querying affinity between two `execution_resource`s.*
257
+
*Listing 4: Example of querying affinity between two `execution_resource`s.*
216
258
217
259
> [*Note:* This interface for querying relative affinity is a very low-level interface designed to be abstracted by libraries and later affinity policies. *--end note*]
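
The comparison-based design can be sketched with a toy stand-in. Everything below is hypothetical: `latency_table`, `mock_latency_query` and the numeric values are made up for illustration and do not reflect how an implementation would obtain or represent affinity values. The sketch only shows why the raw value is kept opaque while the relational operators carry the useful meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table: latency[a][b] is the cost of resource a accessing
// memory close to resource b, in arbitrary made-up units.
using latency_table = std::vector<std::vector<int>>;

// Toy stand-in for a read-latency affinity query between two resources.
// Not the proposed affinity_query class template.
class mock_latency_query {
 public:
  mock_latency_query(const latency_table& table, std::size_t a, std::size_t b)
      : value_(table.at(a).at(b)) {}

  // The proposal leaves this value implementation defined; here it is an int.
  int native_affinity() const { return value_; }

  // Only the relative ordering of two queries is meant to be portable.
  friend bool operator<(const mock_latency_query& lhs,
                        const mock_latency_query& rhs) {
    return lhs.value_ < rhs.value_;
  }
  friend bool operator>(const mock_latency_query& lhs,
                        const mock_latency_query& rhs) {
    return rhs < lhs;
  }

 private:
  int value_;
};
```

A caller would construct one query per pair of resources and compare them, for example to prefer the resource pair with the lower read latency, without interpreting the native value itself.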
218
260
219
261
### Execution context
220
262
221
263
The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on an `execution_resource` and any `execution_resource`s encapsulated by it. An `execution_context` can then provide an executor for executing work and an allocator or polymorphic memory resource for allocating memory. The `execution_context` is constructed with an `execution_resource`; the `execution_context` then executes work or allocates memory for that `execution_resource` and any `execution_resource`s that it represents.
222
264
223
-
Below *(Listing 4)* is an example of how this extended interface could be used to construct an *execution context* from an *execution resource* which is retrieved from the *system’s resource topology*. Once an *execution context* is constructed it can then still be queried for its *execution resource* and then that *execution resource* can be further partitioned.
265
+
Below *(Listing 5)* is an example of how this extended interface could be used to construct an *execution context* from an *execution resource* which is retrieved from the *system’s resource topology*. Once an *execution context* is constructed it can then still be queried for its *execution resource* and then that *execution resource* can be further partitioned.
224
266
225
267
```cpp
226
268
auto &resources = execution::this_system::resources();
@@ -235,11 +277,9 @@ for (auto res : systelLevelResource.resources()) {
235
277
std::cout << res.name() << '\n';
236
278
}
237
279
```
238
-
*Listing 4: Example of constructing an execution context from an execution resource*
280
+
*Listing 5: Example of constructing an execution context from an execution resource*
239
281
240
-
### Binding execution and allocation to resources
241
-
242
-
When creating an `execution_context` from a given `execution_resource`, the executors and allocators associated with it are bound to that `execution_resource`. For example: when creating an `execution_context` from a CPU socket resource, all executors associated with the given socket will spawn execution agents with affinity to the socket partition of the system *(Listing 5)*.
282
+
When creating an `execution_context` from a given `execution_resource`, the executors and allocators associated with it are bound to that `execution_resource`. For example: when creating an `execution_context` from a CPU socket resource, all executors associated with the given socket will spawn execution agents with affinity to the socket partition of the system *(Listing 6)*.
243
283
244
284
```cpp
245
285
auto cList = std::execution::this_system::resources();
@@ -252,14 +292,20 @@ auto socketAllocator = eC.allocator(); // Retrieve an allocator to the closest m
*Listing 5: Example of allocating with affinity to an execution resource*
295
+
*Listing 6: Example of allocating with affinity to an execution resource*
256
296
257
297
The construction of an `execution_context` on a component implies affinity (where possible) to the given resource. This guarantees that all executors created from that `execution_context` can access the resources and the internal data structures required to guarantee the placement of the processor.
258
298
259
299
Only developers that care about resource placement need to care about obtaining executors and allocations from the correct `execution_context` object. Existing code for vectors and STL (including the Parallel STL interface) remains unaffected.
260
300
261
301
If a particular policy or algorithm needs access to placement information, the resources associated with the passed executor can be retrieved via the link to the `execution_context`.
262
302
303
+
### Binding to execution
304
+
305
+
A *thread of execution* can be bound to a particular `execution_resource` for a particular *execution agent* by calling `this_thread::bind`. After this call, the *execution resource* returned by `this_thread::get_resource` must be equal to the `execution_resource` provided to `this_thread::bind`. Subsequently, the *thread of execution* can be unbound by calling `this_thread::unbind`.
306
+
307
+
> [*Note:* Binding *threads of execution* can provide performance benefits when used in a way which complements the application, however incorrect usage can lead to denial of service and therefore can cause loss of performance. *--end note*]
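
The contract between binding, querying and unbinding can be modelled with a few lines of bookkeeping. The sketch below is a toy model only: the `sketch` namespace, `resource_id` and the `unbound` sentinel are assumptions introduced for illustration, and no operating-system level binding is performed. It tracks the binding per thread and checks the stated postcondition that the queried resource equals the one passed to `bind`.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical identifier standing in for the proposed execution_resource.
struct resource_id {
  int value;
};

inline bool operator==(resource_id a, resource_id b) {
  return a.value == b.value;
}

// Sentinel meaning "not bound to any resource" (an assumption of this sketch).
constexpr resource_id unbound{-1};

// Each thread of execution tracks its own binding.
thread_local resource_id current = unbound;

// Records the binding for the calling thread; rejects the sentinel.
bool bind(resource_id r) noexcept {
  if (r == unbound) return false;
  current = r;
  return true;
}

// Succeeds only if the calling thread is currently bound to r.
bool unbind(resource_id r) noexcept {
  if (r == unbound || !(current == r)) return false;
  current = unbound;
  return true;
}

// Returns the resource the calling thread is bound to, or `unbound`.
resource_id get_resource() noexcept { return current; }

}  // namespace sketch
```

Because the state is `thread_local`, a binding made on one thread is not visible from another, which mirrors the per-thread nature of `this_thread::bind` in the text.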
308
+
263
309
## Header `<execution>` synopsis
264
310
265
311
namespace std {
@@ -349,11 +395,16 @@ If a particular policy or algorithm requires to access placement information, th
@@ -515,6 +566,20 @@ The free function `this_system::resources` is provided for retrieving the `execu
515
566
516
567
> [*Note:* Returning a `std::vector` allows users to potentially manipulate the container of `execution_resource`s after it is returned; we may want to replace this at a later date with a more restrictive alternative type, such as a range. *--end note*]
517
568
569
+
The free functions `this_thread::bind` and `this_thread::unbind` are provided for binding / unbinding the current *thread of execution* to / from a particular `execution_resource`.
570
+
571
+
bool bind(execution_resource) noexcept;
572
+
573
+
*Returns:* `true` if the requested binding was successful, otherwise `false`.
574
+
575
+
*Effects:* If successful, binds the current *thread of execution* to the specified `execution_resource`.
576
+
577
+
bool unbind(execution_resource) noexcept;
578
+
579
+
*Returns:* `true` if the requested unbinding was successful, otherwise `false`.
580
+
581
+
*Effects:* If successful, unbinds the current *thread of execution* from the specified `execution_resource`.
582
+
518
583
# Future Work
519
584
520
585
## Migrating data from memory allocated in one partition to another