Commit 04e7bab

author
Gordon Brown
committed
P0796r2: Final changes
* Make minor corrections and fixes.
* Add note to the wording of the `bulk_execution_affinity_t` nested properties that requires them to adhere to the requirements of behavioural properties.
* Rephrase text to remove use of high-level and low-level terms, as they can be read ambiguously.
* Add straw poll on the execution context design direction.
1 parent 2f2091f commit 04e7bab

1 file changed

+34
-19
lines changed

affinity/cpp-20/d0796r2.md

Lines changed: 34 additions & 19 deletions
@@ -16,9 +16,9 @@
1616

1717
### P0796r2 (RAP)
1818

19-
* Introduced a free function for retrieving the execution resource underlying the current thread of execution.
19+
* Introduce a free function for retrieving the execution resource underlying the current thread of execution.
2020
* Introduce `this_thread::bind` & `this_thread::unbind` for binding and unbinding a thread of execution to an execution resource.
21-
* Introduce high-level interface for execution binding via executor properties.
21+
* Introduce `bulk_execution_affinity` executor properties for specifying affinity binding patterns on bulk execution functions.
2222

2323
### P0796r1 (JAX)
2424

@@ -34,7 +34,9 @@
3434

3535
# Abstract
3636

37-
This paper provides an initial meta-framework for the drives toward memory affinity for C++. It accounts for feedback from the Toronto 2017 SG1 meeting on Data Movement in C++ [[1]][p0687r0] that we should define affinity for C++ first, before considering inaccessible memory as a solution to the separate memory problem towards supporting heterogeneous and distributed computing.
37+
This paper provides an initial meta-framework for the drives toward an execution and memory affinity model for C++. It accounts for feedback from the Toronto 2017 SG1 meeting on Data Movement in C++ [[1]][p0687r0] that we should define affinity for C++ first, before considering inaccessible memory as a solution to the separate memory problem towards supporting heterogeneous and distributed computing.
38+
39+
This paper is split into two main parts; firstly a series of executor properties which can be used to apply affinity requirements to bulk execution functions, and secondly an interface for discovering the execution resources within the system topology and querying relative affinity of execution resources.
3840

3941
# Motivation
4042

@@ -103,7 +105,7 @@ Some systems give additional user control through explicit binding of threads to
103105
In this paper we describe the problem space of affinity for C++, the various challenges which need to be addressed in defining a partitioning and affinity interface for C++, and some suggested solutions. These include:
104106
105107
* How to represent, identify and navigate the topology of execution resources available within a heterogeneous or distributed system.
106-
* How to query and measure the relative affininty between different execution resources within a system.
108+
* How to query and measure the relative affinity between different execution resources within a system.
107109
* How to bind execution and allocation to particular execution resource(s).
108110
* What kind of and level of interface(s) should be provided by C++ for affinity.
109111
@@ -120,7 +122,7 @@ The first task in allowing C++ applications to leverage memory locality is to pr
120122
121123
The capability of querying underlying *execution resources* of a given *system* is particularly important towards supporting affinity control in C++. The current proposal for executors [[22]][p0443r4] leaves the *execution resource* largely unspecified. This is intentional: *execution resources* will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define those. There is current work [[23]][p0737r0] on extending the executors proposal to describe a typical interface for an *execution context*. In this paper a typical *execution context* is defined with an interface for construction and comparison, and for retrieving an *executor*, waiting on submitted work to complete and querying the underlying *execution resource*. Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed.
122124
123-
Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity that it is described in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e. bind an execution to an *execution resource’s execution agents*), or both.
125+
Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity that it is described in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e. bind an execution to an *execution resource’s execution agents*), or both.
124126
125127
For example, a NUMA system will likely have a hierarchy of nodes, each capable of placing memory and placing agents. A system with both CPUs and GPUs (programmable graphics processing units) may have GPU local memory regions capable of placing memory, but not capable of placing agents.
126128
@@ -178,13 +180,11 @@ In this paper we propose an interface for querying and representing the executio
178180
179181
### Interface granularity
180182
181-
In this paper we propose both a low-level interface and a high-level interface:
182-
* The low-level interface consists of mechanisms for discovering detailed information about a system's topology and affinity properties which can be utilized to hand optimise parallel applications and libraries for the best performance. The low-level interface has high granularity and is aimed at users who have a high knowledge of the system architecture.
183-
* The high-level interface consists of policies which describe desired behavior when using parallel algorithms or libraries. The high-level interface has low granularity and is aimed at users who may have little or no knowledge of the system architecture.
184-
185-
## High-level interface
183+
This paper is split into two main parts:
184+
* A series of executor properties describe desired behavior when using parallel algorithms or libraries. These properties provide low granularity and are aimed at users who may have little or no knowledge of the system architecture.
185+
* A series of execution resource topology mechanisms for discovering detailed information about the system's topology and affinity properties which can be used to hand optimise parallel applications and libraries for the best performance. These mechanisms provide high granularity and are aimed at users who have a high knowledge of the system architecture.
186186
187-
The high-level interface is a policy-based design which utilizes the executor property mechanism to provide additional affinity based requirements on executors.
187+
## Executor properties
188188
189189
### Bulk execution affinity
190190
@@ -207,7 +207,7 @@ Below *(Listing 2)* is an example of executing a parallel task over 8 threads us
207207
```
208208
*Listing 2: Example of using the bulk_execution_affinity property*
209209

210-
## Low-level interface
210+
## Execution resource topology
211211

212212
### Execution resources
213213

@@ -233,10 +233,6 @@ for (auto res : execution::this_system::get_resources()) {
233233
```
234234
*Listing 3: Example of querying all the system level execution resources*
235235

236-
### Current resource
237-
238-
The `execution_resource` which underlies the current thread of execution can be queried through `this_thread::get_resource`.
239-
240236
### Querying relative affinity
241237

242238
The `affinity_query` class template provides an abstraction for a relative affinity value between two `execution_resource`s. This value depends on a particular `affinity_operation` and `affinity_metric`. As a result, the `affinity_query` is templated on `affinity_operation` and `affinity_metric`, and is constructed from two `execution_resource`s. An `affinity_query` is not meant to be meaningful on its own. Instead, users are meant to compare two queries with comparison operators, in order to get a relative magnitude of affinity. If necessary, the value of an `affinity_query` can also be queried through `native_affinity`, though the return value of this is implementation defined.
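The comparison-based use of `affinity_query` described above can be illustrated with a minimal, self-contained sketch. Everything in the `affinity_sketch` namespace is a hypothetical stand-in written for this illustration: the mock `execution_resource` carries only an assumed NUMA node index, the two enumerations list only a subset of plausible operations and metrics, and the "native" value is an invented node distance rather than anything an implementation would report.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the proposed types; this is not a shipping API.
namespace affinity_sketch {

enum class affinity_operation { read, write };
enum class affinity_metric { latency, bandwidth };

// Mock resource, identified only by an assumed NUMA node index.
struct execution_resource {
  int numa_node;
};

// Templated on operation and metric, constructed from two resources.
template <affinity_operation Op, affinity_metric Metric>
class affinity_query {
 public:
  using native_affinity_type = int;

  affinity_query(const execution_resource& a, const execution_resource& b)
      : value_(1 + std::abs(a.numa_node - b.numa_node)) {}

  // Implementation-defined in the paper; here an invented node distance.
  native_affinity_type native_affinity() const { return value_; }

  // Queries are meant to be compared with each other, not read in isolation.
  friend bool operator<(const affinity_query& lhs, const affinity_query& rhs) {
    return lhs.value_ < rhs.value_;
  }
  friend bool operator==(const affinity_query& lhs, const affinity_query& rhs) {
    return lhs.value_ == rhs.value_;
  }

 private:
  native_affinity_type value_;
};

using read_latency =
    affinity_query<affinity_operation::read, affinity_metric::latency>;

// True when `a` has a smaller mock read latency from `origin` than `b` does.
inline bool closer_for_reads(const execution_resource& origin,
                             const execution_resource& a,
                             const execution_resource& b) {
  return read_latency(origin, a) < read_latency(origin, b);
}

}  // namespace affinity_sketch
```

With mock resources on nodes 0, 1 and 3, `closer_for_reads(n0, n1, n3)` holds because the two queries are ordered relative to each other; the raw values are never interpreted on their own.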
@@ -301,6 +297,10 @@ Only developers that care about resource placement need to care about obtaining
301297

302298
If a particular policy or algorithm requires access to placement information, the resources associated with the passed executor can be retrieved via the link to the `execution_context`.
303299

300+
### Current resource
301+
302+
The `execution_resource` which underlies the current thread of execution can be queried through `this_thread::get_resource`.
303+
304304
### Binding to execution
305305

306306
A *thread of execution* can be requested to bind to a particular `execution_resource` for a particular *execution agent* by calling `this_thread::bind` if that `execution_resource` is able to place agents. `this_thread::bind` returns `true` if the current *thread of execution* is successfully bound to the specified `execution_resource`, and `false` otherwise. If the binding succeeds, the `execution_resource` returned by `this_thread::get_resource` must be equal to the `execution_resource` provided to `this_thread::bind`. Subsequently a *thread of execution* can be unbound by calling `this_thread::unbind`.
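The bind, query and unbind contract described above can be modelled with a small mock. The `bind_sketch` namespace, the `name` and `can_place_agents` fields, and the "system" default resource are all assumptions made for this illustration. No real thread pinning takes place; the sketch only tracks which resource the current thread claims to be bound to.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the binding contract; it performs no real pinning.
namespace bind_sketch {

struct execution_resource {
  std::string name;       // assumed identifier, for illustration only
  bool can_place_agents;  // assumed capability flag

  friend bool operator==(const execution_resource& a,
                         const execution_resource& b) {
    return a.name == b.name;
  }
};

namespace this_thread {

// Resource reported while the thread is unbound (an invented default).
inline const execution_resource& default_resource() {
  static const execution_resource system{"system", true};
  return system;
}

// Per-thread record of the bound resource; nullptr means unbound.
inline const execution_resource*& bound() {
  thread_local const execution_resource* current = nullptr;
  return current;
}

// Returns true on success, false if the resource cannot place agents.
// The caller must keep `r` alive for as long as the thread stays bound.
inline bool bind(const execution_resource& r) {
  if (!r.can_place_agents) return false;
  bound() = &r;
  return true;
}

inline void unbind() { bound() = nullptr; }

// After a successful bind this equals the resource passed to bind.
inline const execution_resource& get_resource() {
  return bound() ? *bound() : default_resource();
}

}  // namespace this_thread
}  // namespace bind_sketch
```

In this model a failed `bind` leaves any previous binding untouched. That is a design choice of the sketch, not something the wording above specifies.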
@@ -420,9 +420,9 @@ A *thread of execution* can be requested to bind to a particular `execution_reso
420420

421421
## Bulk execution affinity properties
422422

423-
The bulk_execution_affinity_t property describes what guarantees executors provide about the binding of *execution agent*s to the underlying *execution resource*s.
423+
The `bulk_execution_affinity_t` property describes what guarantees executors provide about the binding of *execution agent*s to the underlying *execution resource*s.
424424

425-
bulk_execution_affinity_t provides nested property types and objects as described below.
425+
`bulk_execution_affinity_t` provides nested property types and objects as described below. These properties are behavioral properties as described in [[22]][p0443r4], so they must adhere to the requirements of behavioral properties and the requirements described below.
426426

427427
| Nested Property Type | Nested Property Name | Requirements |
428428
|----------------------|----------------------|--------------|
@@ -629,6 +629,21 @@ The free function `this_thread::get_resource` is provided for retrieving the `ex
629629

630630
# Future Work
631631

632+
## How should we define the execution context?
633+
634+
This paper currently defines the execution context as a concrete type which provides the essential interface required to be constructed from an `execution_resource` and to provide an affinity-based `allocator` or `pmr::memory_resource` and `executor`.
635+
636+
However going forward there are a few different directions the execution context could take:
637+
* A) The execution context could be **the** standard execution context type, which can be used polymorphically in place of any concrete execution context type in a similar way to the polymorphic executor [[22]][p0443r4]. This approach allows it to interoperate well with any concrete execution context type, however it may be very difficult to define exactly what this type should look like as the different kinds of execution contexts are still being developed and all the different requirements are still to be fully understood.
638+
* B) The execution context could be a concrete executor type itself, used solely for the purpose of being constructed from and managing a set of `execution_resource`s. This approach would allow the execution context to be tailored specifically for its original purpose, however it would be more difficult to support interoperability with other concrete execution context types.
639+
* C) The execution context could be simply a concept, similar to `OnewayExecutor` or `BulkExecutor` for executors, which requires the execution context type to provide the required interface for managing `execution_resource`s. This approach would allow any concrete execution context type to support the necessary interface for managing execution resources by simply implementing the requirements of the concept, and would avoid defining any concrete or generic execution context type.
640+
641+
| Straw Poll |
642+
|------------|
643+
| Should the execution context be a generic polymorphic execution context, as described above in option A? |
644+
| Should the execution context be a concrete type specifically for the purpose of managing execution resources, as described above in option B? |
645+
| Should the execution context be a concept, as described above in option C? |
646+
632647
## Who should have control over bulk execution affinity?
633648

634649
This paper currently proposes the `bulk_execution_affinity_t` properties and its nested properties for allowing an *executor* to make guarantees as to how *execution agent*s are bound to the underlying *execution resource*s. However providing control at this level may lead to *execution agent*s being bound to *execution resource*s within a critical path. A possible solution to this is to allow the *execution context* to be configured with `bulk_execution_affinity_t` nested properties, either instead of the *executor* property or in addition. This would allow the binding of *threads of execution* to be performed at the time of the *execution context* creation.
@@ -649,7 +664,7 @@ With the ability to place memory with affinity comes the ability to define algor
649664

650665
## Level of abstraction
651666

652-
The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work, it can provide the concurrency it supports and it can provide a name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system. For example, they may also want to know what kind of memory is available or the properties by which work is executed. We decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementers to support new hardware. We think a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.
667+
The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work, it can provide the concurrency it supports and it can provide a name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system. For example, they may also want to know what kind of memory is available or the properties by which work is executed. We decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementors to support new hardware. We think a better approach would be to parameterize the additional properties of hardware such that hardware queries could be much more generic.
653668

654669
We may wish to mirror the design of the executors proposal and have a generic query interface using properties for querying information about an `execution_resource`. We expect that an implementation may provide additional nonstandard, implementation-specific queries.
655670
