Affinity: Remove "parallel" in "parallel (execution,launch,region)"
Executors don't have to be about parallel execution. Thus, remove
references to "parallel execution," "parallel launch," and "parallel
region." The phrase "parallel region" is an OpenMP-ism that would need
definition anyway, and is better expressed using phrases like "the code
that would execute on ${HARDWARE}."
affinity/cpp-20/d0796r3.md
8 additions, 8 deletions
@@ -54,7 +54,7 @@ The affinity problem is especially challenging for applications whose behavior c
Frequently, data are initialized at the beginning of the program by the initial thread and are used by multiple threads. While some OSes automatically migrate threads or data for better affinity, migration may have high overhead. In an optimal case, the OS may automatically detect which threads access which data most frequently, or it may replicate data which are read by multiple threads, or migrate data which are modified and used by threads residing on remote locality groups. However, the OS often does a reasonable job if the machine is not overloaded, if the application carefully uses first-touch allocation, and if the program does not change its behavior with respect to locality.
- Consider a code example *(Listing 1)* that uses the C++17 parallel STL algorithm `for_each` to modify the entries of a `valarray` `a`. The example applies a loop body in a lambda to each entry of the `valarray` `a`, using a parallel execution policy that distributes work in parallel across multiple CPU cores. We might expect this to be fast, but since `valarray` containers are automatically initialized and allocated in the master thread's memory, we find that it is actually quite slow even when we have more than one thread.
+ Consider a code example *(Listing 1)* that uses the C++17 parallel STL algorithm `for_each` to modify the entries of a `valarray` `a`. The example applies a loop body in a lambda to each entry of the `valarray` `a`, using an execution policy that distributes work in parallel across multiple CPU cores. We might expect this to be fast, but since `valarray` containers are automatically initialized and allocated in the master thread's memory, we find that it is actually quite slow even when we have more than one thread.
```cpp
// C++ valarray STL containers are initialized automatically.
```
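The hunk above shows only the opening line of Listing 1. A self-contained sketch of the pattern the paragraph describes (an approximation, not the exact listing from d0796r3.md) could look like this:

```cpp
#include <algorithm>
#include <execution>
#include <valarray>

int main() {
  // The valarray is allocated and zero-initialized here, on the initial
  // ("master") thread. With first-touch allocation, all of its pages end up
  // in that thread's locality domain.
  std::valarray<double> a(100'000'000);

  // The parallel execution policy spreads iterations across CPU cores, but
  // every worker still reads and writes memory that lives in the initial
  // thread's locality domain, so the loop is slower than expected.
  std::for_each(std::execution::par, std::begin(a), std::end(a),
                [](double& x) { x += 1.0; });
}
```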
@@ -164,7 +164,7 @@ The *resource* objects returned from the *topology discovery interface* are opaq
The lifetime of a *resource* instance refers to both validity and uniqueness. First, if a *resource* instance exists, does it point to a valid underlying hardware or software resource? That is, could an instance's validity ever change at run time? Second, could a *resource* instance ever point to a different (but still valid) underlying resource? It suffices for now to define "point to a valid underlying resource" informally. We will elaborate this idea later in this proposal.
- Creation of a *context* expresses intent to use the *resource*, not just to view it as part of the *resource topology*. Thus, if a *resource* could ever cease to point to a valid underlying resource, then users must not be allowed to create a *context* from the resource instance, or launch parallel executions with that context. *Context* construction, and use of an *executor* with that *context* to launch a parallel execution, both assert validity of the *context*'s *resource*.
+ Creation of a *context* expresses intent to use the *resource*, not just to view it as part of the *resource topology*. Thus, if a *resource* could ever cease to point to a valid underlying resource, then users must not be allowed to create a *context* from the resource instance, or launch executions with that context. *Context* construction, and use of an *executor* with that *context* to launch an execution, both assert validity of the *context*'s *resource*.
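In other words, both operations are assertion points. Below is a deliberately toy, self-contained sketch of that contract; the type names echo the proposal's vocabulary, but every definition is a placeholder invented for illustration, not the interface proposed in this paper.

```cpp
#include <iostream>
#include <stdexcept>
#include <utility>

// Placeholder types: a resource handle and a context that asserts the
// resource's validity at construction and again at every launch.
struct execution_resource {
  int id;
  bool valid = true;
};

class execution_context {
public:
  // Construction expresses intent to use the resource, so it asserts that
  // the resource currently points at something valid.
  explicit execution_context(const execution_resource& r) : res_(r) {
    if (!r.valid) throw std::runtime_error("resource is no longer valid");
  }

  // A stand-in for "use of an executor with this context": launching work
  // asserts validity again.
  template <class F>
  void execute(F&& f) const {
    if (!res_.valid) throw std::runtime_error("resource is no longer valid");
    std::forward<F>(f)();  // run inline; a real executor would target res_
  }

private:
  execution_resource res_;
};

int main() {
  execution_resource cpu{0};
  execution_context ctx(cpu);  // asserts validity at construction
  ctx.execute([] { std::cout << "work placed on the chosen resource\n"; });
}
```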
If a *resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. *Contexts* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which a *resource* points may be a dynamic, possibly software-managed pool of hardware. Here are three examples of this phenomenon:
@@ -182,9 +182,9 @@ We should not assume that *resource* instances have the same lifetime as the run
We considered mandating that *execution resources* use reference counting, just like `shared_ptr`. This would clearly define resources' lifetimes. However, there are several arguments against requiring reference counting.
- 1. Holding a reference to the *execution resource* would prevent parallel execution from shutting down, thus (potentially) deadlocking the program.
- 2. Not all kinds of *resources* may have lifetimes that fit reference counting semantics. Some kinds of GPU *resources* only exist during parallel execution, for example; those *resources* cannot be valid if they escape the parallel region. In general, programming models that let a "host" processor launch code on a "different processor" have this issue.
- 3. Reference counting could have unattractive overhead if accessed concurrently, especially if code wants to traverse a particular subset of the *resource topology* inside a parallel region (e.g., to access GPU scratch memory).
+ 1. Holding a reference to the *execution resource* would prevent execution from shutting down, thus (potentially) deadlocking the program.
+ 2. Not all kinds of *resources* may have lifetimes that fit reference counting semantics. Some kinds of GPU *resources* only exist during execution, for example; those *resources* cannot be valid if they escape the scope of code that executes on the GPU. In general, programming models that let a "host" processor launch code on a "different processor" have this issue.
+ 3. Reference counting could have unattractive overhead if accessed concurrently, especially if code wants to traverse a particular subset of the *resource topology* inside a region executing on the GPU (e.g., to access GPU scratch memory).
4. Since users can construct arbitrary data structures from *resources* in a *resource hierarchy*, the proposal would need another *resource* type analogous to `weak_ptr`, in order to avoid circular dependencies that could prevent releasing *resources*.
5. There is no type currently in the Standard that has reference-counting semantics, but does not have `shared_` in its name (e.g., `shared_ptr` and `shared_future`). Adding a type like this sets a bad precedent for types with hidden costs and correctness issues (see (4)).
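Argument (4) above is the familiar ownership-cycle problem. A minimal sketch using only standard library types (nothing from this proposal):

```cpp
#include <memory>
#include <vector>

// A user-built structure over part of a resource hierarchy. If resources
// were reference counted like shared_ptr, strong links in both directions
// (parent -> children and child -> parent) would form an ownership cycle.
struct node {
  std::vector<std::shared_ptr<node>> children;
  std::shared_ptr<node> parent;  // strong back-edge; a weak handle breaks it
};

int main() {
  auto root  = std::make_shared<node>();
  auto child = std::make_shared<node>();
  root->children.push_back(child);
  child->parent = root;
  // When root and child go out of scope, the cycle keeps both nodes alive.
  // Avoiding this is exactly why a weak_ptr-like "weak resource" type would
  // be needed if resources were reference counted.
}
```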
@@ -194,7 +194,7 @@ Here, we elaborate on what it means for a *resource* to be "valid." This proposa
1. It is implementation defined whether any subset of the *resource topology* reflects the current state of the *system*, or just a "snapshot." Ability to iterate a *resource*'s children in the *resource topology* need not imply ability to create a *context* from that *resource*. This may even vary between subsets of the *resource topology*.
- 3. Use of a *context* to launch parallel execution asserts *resource* validity.
+ 3. Use of a *context* to launch execution asserts *resource* validity.
Here is a concrete example. Suppose that company "Aleph" makes an accelerator that can be viewed as a *resource*, and that has its own child *resources*. Users must call `Aleph_initialize()` in order to see the accelerator and its children as *resources* in the *resource topology*. Users must call `Aleph_finalize()` when they are done using the accelerator.
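A toy, self-contained rendering of that lifecycle; only the names `Aleph_initialize` and `Aleph_finalize` come from the example, and everything else (the global list, the counts) is invented for illustration:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Pretend system topology: a flat list of visible top-level resources.
static std::vector<std::string> g_visible_resources = {"cpu_package_0"};

// Stand-ins for the vendor runtime from the example.
void Aleph_initialize() { g_visible_resources.push_back("aleph_accelerator"); }
void Aleph_finalize()   { g_visible_resources.pop_back(); }

int main() {
  std::cout << "before init: " << g_visible_resources.size() << " resource(s)\n";

  Aleph_initialize();  // the accelerator (and its children) become visible
  std::cout << "after init:  " << g_visible_resources.size() << " resource(s)\n";

  // ... create contexts from the accelerator's resources and launch work ...

  Aleph_finalize();    // the accelerator's resources cease to be valid;
                       // contexts created from them must not be used afterwards
  std::cout << "after finalize: " << g_visible_resources.size() << " resource(s)\n";
}
```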
@@ -210,7 +210,7 @@ Here, we elaborate on what it means for a *resource* to be "valid." This proposa
1. Nothing bad may happen. Users must be able to iterate past an invalidated *resource*. If users are iterating a *resource* R's children and one child becomes invalid, that must not invalidate R or the iterators to its children.
2. Iterating the children after invalidation of the parent must not be undefined behavior, but the child *resources* remain invalid. Attempts to view and iterate the children of the child *resources* may (but need not) fail.
3. *Context* creation asserts *resource* validity. If the *resource* is invalid, *context* creation must fail. (Compare to how MPI functions report an error if they are called after `MPI_Finalize` has been called on that process.)
- 4. Use of a *context* in an *executor* to launch parallel execution asserts *resource* validity, and must thus fail if the *resource* is no longer valid.
+ 4. Use of a *context* in an *executor* to launch execution asserts *resource* validity, and must thus fail if the *resource* is no longer valid.
### Querying the relative affinity of partitions
@@ -230,7 +230,7 @@ In this paper we propose an interface for querying and representing the executio
This paper is split into two main parts:
* A series of executor properties describes desired behavior when using parallel algorithms or libraries. These properties provide low granularity and are aimed at users who may have little or no knowledge of the system architecture.
- * A series of execution resource topology mechanisms for discovering detailed information about the system's topology and affinity properties, which can be used to hand optimise parallel applications and libraries for the best performance. These mechanisms provide high granularity and are aimed at users who have detailed knowledge of the system architecture.
+ * A series of execution resource topology mechanisms for discovering detailed information about the system's topology and affinity properties, which can be used to hand optimize parallel applications and libraries for the best performance. These mechanisms provide high granularity and are aimed at users who have detailed knowledge of the system architecture.
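As a rough illustration of the style of use the first bullet implies, here is a toy, self-contained property mechanism; `affinity_hint`, `prefer`, and `inline_executor` are all invented for illustration and are not names from this proposal or from the executors proposal:

```cpp
#include <cstdio>

// A (non-binding) property a user can request without knowing the topology.
struct affinity_hint { bool keep_work_near_its_data = false; };

// A trivial executor that just runs work inline but records the hint.
struct inline_executor {
  affinity_hint hint{};
  template <class F> void execute(F&& f) const { f(); }
};

// prefer returns a copy of the executor with the requested property attached;
// an implementation is free to ignore a preference it cannot honor.
inline_executor prefer(inline_executor ex, affinity_hint h) {
  ex.hint = h;
  return ex;
}

int main() {
  inline_executor ex;
  auto near_ex = prefer(ex, affinity_hint{true});
  near_ex.execute([] { std::puts("work launched with an affinity preference"); });
}
```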