affinity/cpp-20/d0796r2.md
Lines changed: 34 additions & 19 deletions
@@ -16,9 +16,9 @@

### P0796r2 (RAP)

-*Introduced a free function for retrieving the execution resource underlying the current thread of execution.
+* Introduce a free function for retrieving the execution resource underlying the current thread of execution.
* Introduce `this_thread::bind` & `this_thread::unbind` for binding and unbinding a thread of execution to an execution resource.
-* Introduce high-level interface for execution binding via executor properties.
+* Introduce `bulk_execution_affinity` executor properties for specifying affinity binding patterns on bulk execution functions.

### P0796r1 (JAX)

@@ -34,7 +34,9 @@

# Abstract

-This paper provides an initial meta-framework for the drives toward memory affinity for C++. It accounts for feedback from the Toronto 2017 SG1 meeting on Data Movement in C++ [[1]][p0687r0] that we should define affinity for C++ first, before considering inaccessible memory as a solution to the separate memory problem towards supporting heterogeneous and distributed computing.
+This paper provides an initial meta-framework for the drive toward an execution and memory affinity model for C++. It accounts for feedback from the Toronto 2017 SG1 meeting on Data Movement in C++ [[1]][p0687r0] that we should define affinity for C++ first, before considering inaccessible memory as a solution to the separate memory problem towards supporting heterogeneous and distributed computing.
+
+This paper is split into two main parts: first, a series of executor properties which can be used to apply affinity requirements to bulk execution functions; and second, an interface for discovering the execution resources within the system topology and querying the relative affinity of execution resources.

# Motivation

@@ -103,7 +105,7 @@ Some systems give additional user control through explicit binding of threads to
In this paper we describe the problem space of affinity for C++, the various challenges which need to be addressed in defining a partitioning and affinity interface for C++, and some suggested solutions. These include:

* How to represent, identify and navigate the topology of execution resources available within a heterogeneous or distributed system.
-* How to query and measure the relative affininty between different execution resources within a system.
+* How to query and measure the relative affinity between different execution resources within a system.
* How to bind execution and allocation to particular execution resource(s).
* What kind and level of interface(s) should be provided by C++ for affinity.

@@ -120,7 +122,7 @@ The first task in allowing C++ applications to leverage memory locality is to pr

The capability of querying underlying *execution resources* of a given *system* is particularly important towards supporting affinity control in C++. The current proposal for executors [[22]][p0443r4] leaves the *execution resource* largely unspecified. This is intentional: *execution resources* will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define those. There is current work [[23]][p0737r0] on extending the executors proposal to describe a typical interface for an *execution context*. In this paper a typical *execution context* is defined with an interface for construction and comparison, and for retrieving an *executor*, waiting on submitted work to complete and querying the underlying *execution resource*. Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed.

-Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity that it is described in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e. bind an execution to an *execution resource’s execution agents*), or both.
+Two important considerations when defining a unified interface for querying the *resource topology* of a *system* are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity that it is described in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e., bind an execution to an *execution resource’s execution agents*), or both.

For example, a NUMA system will likely have a hierarchy of nodes, each capable of placing memory and placing agents. A system with both CPUs and GPUs (programmable graphics processing units) may have GPU local memory regions capable of placing memory, but not capable of placing agents.

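The recursive hierarchy described above can be illustrated with a minimal sketch. It assumes a plain aggregate `execution_resource` with hypothetical `can_place_memory`, `can_place_agents` and `children` members, plus the hypothetical helpers `collect_agent_resources` and `make_toy_system`. None of these are the interface proposed in this paper. The toy topology mirrors the NUMA and GPU example: NUMA nodes can place both memory and agents, while a GPU-local memory region can place memory only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node in the execution resource hierarchy: each resource may
// place memory, place execution agents, or both, and owns sub-resources.
struct execution_resource {
  std::string name;
  bool can_place_memory = false;
  bool can_place_agents = false;
  std::vector<execution_resource> children;  // recursive composition
};

// Depth-first walk that collects the name of every resource in the subtree
// that can place execution agents (candidate targets for thread binding).
inline void collect_agent_resources(const execution_resource& res,
                                    std::vector<std::string>& out) {
  if (res.can_place_agents) out.push_back(res.name);
  for (const auto& child : res.children) collect_agent_resources(child, out);
}

// Toy topology (illustrative only, not a real machine description): two
// NUMA nodes that place memory and agents, and a GPU-local memory region
// that places memory but not agents.
inline execution_resource make_toy_system() {
  execution_resource numa0{"numa0", true, true, {}};
  execution_resource numa1{"numa1", true, true, {}};
  execution_resource gpu_mem{"gpu_mem", true, false, {}};
  return execution_resource{"system", false, false, {numa0, numa1, gpu_mem}};
}
```

Because every level has the same type, a question such as "which resources can place agents?" becomes a single recursive walk, however deep a particular implementation chooses to make the hierarchy.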
@@ -178,13 +180,11 @@ In this paper we propose an interface for querying and representing the executio

### Interface granularity

-In this paper we propose both a low-level interface and a high-level interface:
-* The low-level interface consists of mechanisms for discovering detailed information about a system's topology and affinity properties which can be utilized to hand optimise parallel applications and libraries for the best performance. The low-level interface has high granularity and is aimed at users who have a high knowledge of the system architecture.
-* The high-level interface consists of policies which describe desired behavior when using parallel algorithms or libraries. The high-level interface has low granularity and is aimed at users who may have little or no knowledge of the system architecture.
-
-## High-level interface
+This paper is split into two main parts:
+* A series of executor properties which describe desired behavior when using parallel algorithms or libraries. These properties provide low granularity and are aimed at users who may have little or no knowledge of the system architecture.
+* A series of execution resource topology mechanisms for discovering detailed information about the system's topology and affinity properties, which can be used to hand optimise parallel applications and libraries for the best performance. These mechanisms provide high granularity and are aimed at users who have detailed knowledge of the system architecture.

-The high-level interface is a policy-based design which utilizes the executor property mechanism to provide additional affinity based requirements on executors.
+## Executor properties

### Bulk execution affinity

@@ -207,7 +207,7 @@ Below *(Listing 2)* is an example of executing a parallel task over 8 threads us
```
*Listing 2: Example of using the bulk_execution_affinity property*

-## Low-level interface
+## Execution resource topology

### Execution resources

@@ -233,10 +233,6 @@ for (auto res : execution::this_system::get_resources()) {
```
*Listing 3: Example of querying all the system level execution resources*

-### Current resource
-
-The `execution_resource` which underlies the current thread of execution can be queried through `this_thread::get_resource`.
-
### Querying relative affinity

The `affinity_query` class template provides an abstraction for a relative affinity value between two `execution_resource`s. This value depends on a particular `affinity_operation` and `affinity_metric`. As a result, the `affinity_query` is templated on `affinity_operation` and `affinity_metric`, and is constructed from two `execution_resource`s. An `affinity_query` is not meant to be meaningful on its own. Instead, users are meant to compare two queries with comparison operators, in order to get a relative magnitude of affinity. If necessary, the value of an `affinity_query` can also be queried through `native_affinity`, though the return value of this is implementation defined.
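A minimal sketch of this comparison-based design follows. It assumes a toy topology in which each resource has a position on a line and the native value is the hop distance between two positions. The tag types `affinity_operation_read` and `affinity_metric_latency`, the member function `native_affinity()`, the ordering convention (a smaller value compares as closer) and the helper `closest` are all hypothetical. The paper leaves the native value implementation defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical resource: a named node at a position in a linear topology.
struct execution_resource {
  std::string name;
  std::size_t position;
};

// Hypothetical tag types standing in for affinity_operation / affinity_metric.
struct affinity_operation_read {};
struct affinity_metric_latency {};

// Mock affinity_query: templated on operation and metric, constructed from
// two resources. The stored value is a toy hop distance (an assumption).
template <typename Operation, typename Metric>
class affinity_query {
 public:
  affinity_query(const execution_resource& a, const execution_resource& b)
      : distance_(a.position > b.position ? a.position - b.position
                                          : b.position - a.position) {}

  // Raw, implementation-defined value; users should prefer comparisons.
  std::size_t native_affinity() const { return distance_; }

  // In this sketch, "lhs < rhs" means lhs relates closer resources.
  friend bool operator<(const affinity_query& lhs, const affinity_query& rhs) {
    return lhs.distance_ < rhs.distance_;
  }
  friend bool operator==(const affinity_query& lhs, const affinity_query& rhs) {
    return lhs.distance_ == rhs.distance_;
  }

 private:
  std::size_t distance_;
};

// Hypothetical helper: picks the candidate closest to `origin` using only
// comparisons between queries, never the native value.
inline const execution_resource& closest(
    const execution_resource& origin,
    const std::vector<execution_resource>& candidates) {
  assert(!candidates.empty());
  using query = affinity_query<affinity_operation_read, affinity_metric_latency>;
  const execution_resource* best = &candidates.front();
  for (const auto& c : candidates) {
    if (query(origin, c) < query(origin, *best)) best = &c;
  }
  return *best;
}
```

The helper never inspects `native_affinity()` and relies only on comparing two queries, which is the usage pattern the paragraph above recommends.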
@@ -301,6 +297,10 @@ Only developers that care about resource placement need to care about obtaining

If a particular policy or algorithm needs to access placement information, the resources associated with the passed executor can be retrieved via the link to the `execution_context`.

+### Current resource
+
+The `execution_resource` which underlies the current thread of execution can be queried through `this_thread::get_resource`.
+
### Binding to execution

A *thread of execution* can be requested to bind to a particular `execution_resource` for a particular *execution agent* by calling `this_thread::bind`, if that `execution_resource` is able to place agents. If the current *thread of execution* is successfully bound to the specified `execution_resource`, `this_thread::bind` returns `true`; otherwise it returns `false`. If the *thread of execution* is successfully bound to the specified `execution_resource`, then the `execution_resource` returned by `this_thread::get_resource` must be equal to the `execution_resource` provided to `this_thread::bind`. Subsequently, a *thread of execution* can be unbound by calling `this_thread::unbind`.
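The binding contract can be modelled with a small mock. It assumes a `mock_this_thread` namespace standing in for `this_thread`, a per-thread slot recording the bound resource, a default `"system"` resource reported while unbound, and a `void` return type for `unbind`. All of these are hypothetical. A real implementation would also set the operating-system affinity of the thread, which the mock does not attempt.

```cpp
#include <cassert>
#include <string>

// Hypothetical execution resource; whether it can place agents is a flag.
struct execution_resource {
  std::string name;
  bool can_place_agents;

  friend bool operator==(const execution_resource& a,
                         const execution_resource& b) {
    return a.name == b.name && a.can_place_agents == b.can_place_agents;
  }
};

// Stand-in for this_thread; names and signatures are assumptions.
namespace mock_this_thread {

// Resource reported while the thread is not bound (an assumption).
inline execution_resource default_resource() { return {"system", true}; }

// Per-thread record of the resource the calling thread is bound to.
inline execution_resource& bound_slot() {
  thread_local execution_resource slot = default_resource();
  return slot;
}

inline execution_resource get_resource() { return bound_slot(); }

// Binding succeeds only for resources that can place agents; on success,
// get_resource() must then compare equal to the requested resource.
inline bool bind(const execution_resource& res) {
  if (!res.can_place_agents) return false;
  bound_slot() = res;  // a real runtime would also pin the OS thread here
  assert(get_resource() == res);
  return true;
}

// Returns the calling thread to its unbound state.
inline void unbind() { bound_slot() = default_resource(); }

}  // namespace mock_this_thread
```

A failed `bind`, for example to a GPU-local memory region that cannot place agents, leaves any earlier binding untouched.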
@@ -420,9 +420,9 @@ A *thread of execution* can be requested to bind to a particular `execution_reso

## Bulk execution affinity properties

-The bulk_execution_affinity_t property describes what guarantees executors provide about the binding of *execution agent*s to the underlying *execution resource*s.
+The `bulk_execution_affinity_t` property describes what guarantees executors provide about the binding of *execution agent*s to the underlying *execution resource*s.

-bulk_execution_affinity_t provides nested property types and objects as described below.
+`bulk_execution_affinity_t` provides nested property types and objects as described below. These properties are behavioral properties as described in [[22]][p0443r4], so they must adhere to the requirements of behavioral properties as well as the requirements described below.

| Nested Property Type | Nested Property Name | Requirements |
@@ -629,6 +629,21 @@ The free function `this_thread::get_resource` is provided for retrieving the `ex

# Future Work

+## How should we define the execution context?
+
+This paper currently defines the execution context as a concrete type that provides the essential interface required to be constructed from an `execution_resource` and to provide an affinity-based `allocator` or `pmr::memory_resource` and `executor`.
+
+However, going forward, there are a few different directions the execution context could take:
+* A) The execution context could be **the** standard execution context type, which can be used polymorphically in place of any concrete execution context type in a similar way to the polymorphic executor [[22]][p0443r4]. This approach allows it to interoperate well with any concrete execution context type; however, it may be very difficult to define exactly what this type should look like, as the different kinds of execution contexts are still being developed and their requirements are not yet fully understood.
+* B) The execution context could be a dedicated concrete type, used solely for the purpose of being constructed from and managing a set of `execution_resource`s. This approach would allow the execution context to be tailored specifically for its original purpose; however, it would be more difficult to support interoperability with other concrete execution context types.
+* C) The execution context could simply be a concept, similar to the `OnewayExecutor` or `BulkExecutor` concepts for executors, which requires the execution context type to provide the interface for managing `execution_resource`s. This approach would allow any concrete execution context type to support the interface necessary for managing execution resources simply by implementing the requirements of the concept, and would avoid defining any concrete or generic execution context type.
+
+| Straw Poll |
+|------------|
+| Should the execution context be a generic polymorphic execution context, as described above in option A? |
+| Should the execution context be a concrete type specifically for the purpose of managing execution resources, as described above in option B? |
+| Should the execution context be a concept, as described above in option C? |
+
## Who should have control over bulk execution affinity?

This paper currently proposes the `bulk_execution_affinity_t` properties and its nested properties for allowing an *executor* to make guarantees as to how *execution agent*s are bound to the underlying *execution resource*s. However, providing control at this level may lead to *execution agent*s being bound to *execution resource*s within a critical path. A possible solution to this is to allow the *execution context* to be configured with `bulk_execution_affinity_t` nested properties, either instead of the *executor* property or in addition. This would allow the binding of *threads of execution* to be performed at the time of *execution context* creation.
@@ -649,7 +664,7 @@ With the ability to place memory with affinity comes the ability to define algor

## Level of abstraction

-The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work, it can provide the concurrency it supports and it can provide a name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system. For example, they may also want to know what kind of memory is available or the properties by which work is executed. We decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementers to support new hardware. We think a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.
+The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work, as well as for querying the concurrency it supports and its name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system. For example, they may also want to know what kind of memory is available or the properties by which work is executed. We decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementors to support new hardware. We think a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.

We may wish to mirror the design of the executors proposal and have a generic query interface using properties for querying information about an `execution_resource`. We expect that an implementation may provide additional nonstandard, implementation-specific queries.