Affinity: Minor grammar & code fixes

Mark Hoemmen · Mark Hoemmen · commit e45296018f16 · 2018-09-09T20:28:13.000-06:00
diff --git a/affinity/cpp-20/d0796r3.md b/affinity/cpp-20/d0796r3.md
@@ -41,7 +41,10 @@
 
 This paper provides an initial meta-framework for the drive toward an execution and memory affinity model for C++.  It accounts for feedback from the Toronto 2017 SG1 meeting on Data Movement in C++ [[1]][p0687r0] that we should define affinity for C++ first, before considering inaccessible memory as a solution to the separate memory problem towards supporting heterogeneous and distributed computing.
 
-This paper is split into two main parts; firstly a series of executor properties which can be used to apply affinity requirements to bulk execution functions, and secondly an interface for discovering the execution resources within the system topology and querying relative affinity of execution resources.
+This paper is split into two main parts:
+
+1. A series of executor properties which can be used to apply affinity requirements to bulk execution functions.
+2. An interface for discovering the execution resources within the system topology and querying relative affinity of execution resources.
 
 # Motivation
 
@@ -55,7 +58,7 @@ Operating systems (OSes) traditionally take responsibility for assigning threads
 
 The affinity problem is especially challenging for applications whose behavior changes over time or is hard to predict, or when different applications interfere with each other's performance. Today, most OSes already can group processing units according to their locality and distribute processes, while keeping threads close to the initial thread, or even avoid migrating threads and maintain first touch policy. Nevertheless, most programs can change their work distribution, especially in the presence of nested parallelism.
 
-Frequently, data are initialized at the beginning of the program by the initial thread and are used by multiple threads. While some OSes automatically migrate threads or data for better affinity, migration may have high overhead. In an optimal case, the OS may automatically detect which thread access which data most frequently, or it may replicate data which are read by multiple threads, or migrate data which are modified and used by threads residing on remote locality groups. However, the OS often does a reasonable job, if the machine is not overloaded, if the application carefully used first-touch allocation, and if the program does not change its behavior with respect to locality.
+Frequently, data are initialized at the beginning of the program by the initial thread and are used by multiple threads. While some OSes automatically migrate threads or data for better affinity, migration may have high overhead. In an optimal case, the OS may automatically detect which threads access which data most frequently, or it may replicate data which are read by multiple threads, or migrate data which are modified and used by threads residing on remote locality groups. However, the OS often does a reasonable job, if the machine is not overloaded, if the application carefully uses first-touch allocation, and if the program does not change its behavior with respect to locality.
 
 Consider a code example *(Listing 1)* that uses the C++17 parallel STL algorithm `for_each` to modify the entries of a `valarray` `a`.  The example applies a loop body in a lambda to each entry of the `valarray` `a`, using an execution policy that distributes work in parallel across multiple CPU cores. We might expect this to be fast, but because `valarray` containers are allocated and initialized automatically in the master thread's memory, the update is actually quite slow even when we have more than one thread.
 
@@ -65,14 +68,14 @@ Consider a code example *(Listing 1)* that uses the C++17 parallel STL algorithm
 std::valarray<double> a(N);
 
 // Data placement is wrong, so parallel update is slow.
-std::for_each(par, std::begin(a), std::end(a), 
+std::for_each(std::execution::par, std::begin(a), std::end(a),
               [=] (double& a_i) { a_i *= scalar; });
 	      
 // Use future affinity interface to migrate data at next
 // use and move pages closer to next accessing thread.
 ...
 // Faster, because data are local now.
-std::for_each(par, std::begin(a), std::end(a), 
+std::for_each(std::execution::par, std::begin(a), std::end(a),
               [=] (double& a_i) { a_i *= scalar; });
 ```
 *Listing 1: Parallel vector update example*
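
For comparison with Listing 1, the following is a minimal sketch of the first-touch workaround that programs can use today, without any affinity interface. It is not part of the paper's proposal. The helper names `for_each_chunk`, `make_first_touch_array`, and `scale` are hypothetical, and the sketch assumes an OS with a first-touch page placement policy and worker threads that are not migrated between initialization and use. Neither assumption is guaranteed by standard C++, which is part of the motivation above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical helper: run body(begin, end) on num_threads threads,
// giving each thread one contiguous chunk of the index range [0, n).
template <class Body>
void for_each_chunk(std::size_t n, unsigned num_threads, Body body) {
    assert(num_threads > 0);
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([=] { body(begin, end); });
    }
    for (auto& w : workers) w.join();
}

// Allocate n doubles without writing to them, then let each worker write
// its own chunk first. Under a first-touch policy (an assumption about the
// OS), each page tends to be placed near the thread that first writes it.
std::unique_ptr<double[]> make_first_touch_array(std::size_t n,
                                                 unsigned num_threads,
                                                 double value) {
    std::unique_ptr<double[]> a(new double[n]);  // default-initialized: no writes yet
    for_each_chunk(n, num_threads, [&](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) a[i] = value;
    });
    return a;
}

// Parallel update with the same chunking as the initialization, so each
// chunk is processed with the same partitioning that first touched it.
void scale(double* a, std::size_t n, unsigned num_threads, double scalar) {
    for_each_chunk(n, num_threads, [=](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) a[i] *= scalar;
    });
}
```

Calling `make_first_touch_array(N, T, 1.0)` and then `scale(a.get(), N, T, scalar)` mirrors the update in Listing 1. Because the threads here are not pinned, the locality benefit depends entirely on the OS scheduler, and chunk boundaries do not necessarily coincide with page boundaries.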