The C11 memory model is fundamentally about trying to bridge the gap between the
semantics we want, the optimizations compilers want, and the inconsistent chaos
our hardware wants. *We* would like to just write programs and have them do
exactly what we said but, you know, fast. Wouldn't that be great?


# Compiler Reordering

Compilers fundamentally want to be able to do all sorts of complicated
transformations to reduce data dependencies and eliminate dead code. In
particular, they may radically change the actual order of events, or make
events never occur! If we write something like

```rust,ignore
x = 1;
y = 3;
x = 2;
```

The compiler may conclude that it would be best if your program did

```rust,ignore
x = 2;
y = 3;
```

This has inverted the order of events and completely eliminated one event.
From a single-threaded perspective this is completely unobservable: after all
the statements have executed we are in exactly the same state. But if our
program is multi-threaded, we may have been relying on `x` to actually be
assigned to 1 before `y` was assigned. We would like the compiler to be
able to make these kinds of optimizations, because they can seriously improve
performance. On the other hand, we'd also like to be able to depend on our
program *doing the thing we said*.
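
For instance (a sketch in the same pseudocode as above), another thread might
have been waiting on that intermediate write:

```rust,ignore
// Another thread, hoping to observe the intermediate `x = 1`:
while x != 1 {}
// ... may spin forever, because the write of 1 was optimized away entirely.
```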


# Hardware Reordering

On the other hand, even if the compiler totally understood what we wanted and
respected our wishes, our hardware might instead get us in trouble. Trouble
comes from CPUs in the form of memory hierarchies. There is indeed a global
shared memory space somewhere in your hardware, but from the perspective of each
CPU core it is *so very far away* and *so very slow*. Each CPU would rather work
with its local cache of the data and only go through the anguish of talking to
shared memory when it doesn't actually have that memory in cache.

After all, that's the whole point of the cache, right? If every read from the
cache had to run back to shared memory to double check that it hadn't changed,
what would the point be? The end result is that the hardware doesn't guarantee
that events that occur in the same order on *one* thread, occur in the same
order on *another* thread.

It's worth noting that different kinds of CPU provide different guarantees. It
is common to separate hardware into two categories: strongly-ordered and
weakly-ordered. Most notably x86/64 provides strong ordering guarantees, while
ARM provides weak ordering guarantees. This has two consequences for concurrent
programming:

* Asking for stronger guarantees on strongly-ordered hardware may be cheap or
  even free because they already provide strong guarantees unconditionally.
  Weaker guarantees may only yield performance wins on weakly-ordered hardware.

* Asking for guarantees that are too weak on strongly-ordered hardware is
  more likely to *happen* to work, even though your program is strictly
  incorrect. If possible, concurrent algorithms should be tested on
  weakly-ordered hardware.
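
To see the kind of surprise weakly-ordered hardware permits (and why testing
there matters), consider this pair of threads, in the same pseudocode as
before:

```text
initial state: x = 0, y = 1

THREAD 1        THREAD 2
y = 3;          if x == 1 {
x = 1;              y *= 2;
                }
```

Ideally this program has only two possible final states: `y = 3` (thread 2 ran
its check before thread 1 finished) or `y = 6` (after). But if the hardware
lets thread 2 observe the write to `x` before the write to `y`, thread 2 can
read the stale `y = 1` and leave `y = 2`: an outcome that no interleaving of
the two threads allows.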


# Data Accesses

The C11 memory model attempts to bridge the gap by allowing us to talk about the
*causality* of our program. Generally, this is by establishing a *happens
before* relationship between parts of the program and the threads that are
running them. This gives the hardware and compiler room to optimize the program
more aggressively where a strict happens-before relationship isn't established,
but forces them to be more careful where one is established. The way we
communicate these relationships is through *data accesses* and *atomic
accesses*.

Data accesses are the bread-and-butter of the programming world. They are
fundamentally unsynchronized and compilers are free to aggressively optimize
them. In particular, data accesses are free to be reordered by the compiler on
the assumption that the program is single-threaded. The hardware is also free to
propagate the changes made in data accesses to other threads as lazily and
inconsistently as it wants. Most critically, data accesses are how data races
happen. Data accesses are very friendly to the hardware and compiler, but as
we've seen they offer *awful* semantics to try to write synchronized code with.
Actually, that's too weak.

**It is literally impossible to write correct synchronized code using only data
accesses.**
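
For example, this classic attempt at hand-rolled synchronization uses only data
accesses, and is therefore a data race, and Undefined Behavior, no matter what
the hardware does (a deliberately broken sketch):

```rust,ignore
static mut READY: bool = false;
static mut DATA: u32 = 0;

// Thread 1:
unsafe {
    DATA = 42;
    READY = true; // plain data access: no ordering or visibility guarantees
}

// Thread 2:
unsafe {
    while !READY {}       // races with thread 1's write: Undefined Behavior
    println!("{}", DATA); // may print 0, 42, or anything at all
}
```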

Atomic accesses are how we tell the hardware and compiler that our program is
multi-threaded. Each atomic access can be marked with an *ordering* that
specifies what relationship it establishes with other accesses. In practice,
this boils down to telling the compiler and hardware certain things they
*can't* do. For the compiler, this largely revolves around re-ordering of
instructions. For the hardware, this largely revolves around how writes are
propagated to other threads. The set of orderings Rust exposes are:

* Sequentially Consistent (SeqCst)
* Release
* Acquire
* Relaxed

(Note: We explicitly do not expose the C11 *consume* ordering)
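
In code, the ordering is passed as an explicit argument to every atomic
operation. A minimal sketch (the `FLAG` static is illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static FLAG: AtomicBool = AtomicBool::new(false);

fn main() {
    // A release store paired with an acquire load; any of the orderings
    // listed above could be named here instead.
    FLAG.store(true, Ordering::Release);
    let seen = FLAG.load(Ordering::Acquire);
    assert!(seen);
}
```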

# Sequentially Consistent

Sequentially Consistent is the most powerful of all, implying the restrictions
of all other orderings. Intuitively, a sequentially consistent operation
cannot be reordered: all accesses on one thread that happen before and after a
SeqCst access stay before and after it. A data-race-free program that uses
only sequentially consistent atomics and data accesses has the very nice
property that there is a single global execution of the program's instructions
that all threads agree on. This execution is also particularly nice to reason
about: it's just an interleaving of each thread's individual executions. This
does not hold if you start using the weaker atomic orderings.
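
As a sketch of what that global order buys you, consider the classic "store
buffering" test (the statics and structure here are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static X: AtomicBool = AtomicBool::new(false);
static Y: AtomicBool = AtomicBool::new(false);

fn main() {
    let t1 = thread::spawn(|| {
        X.store(true, Ordering::SeqCst);
        Y.load(Ordering::SeqCst)
    });
    let t2 = thread::spawn(|| {
        Y.store(true, Ordering::SeqCst);
        X.load(Ordering::SeqCst)
    });
    let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());
    // Under SeqCst, at least one load must observe the other thread's store,
    // because all four operations fall into a single global order.
    assert!(r1 || r2);
}
```

Under any weaker ordering, both loads returning `false` is a permitted outcome.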

The relative developer-friendliness of sequential consistency doesn't come for
free. Even on strongly-ordered platforms sequential consistency involves
emitting memory fences.

In practice, sequential consistency is rarely necessary for program correctness.
However sequential consistency is definitely the right choice if you're not
confident about the other memory orders. Having your program run a bit slower
than it needs to is certainly better than it running incorrectly! It's also
mechanically trivial to downgrade atomic operations to have a weaker
consistency later on. Just change `SeqCst` to `Relaxed` and you're done! Of
course, proving that this transformation is *correct* is a whole other matter.


# Acquire-Release

Acquire and Release are largely intended to be paired. Their names hint at their
use case: they're perfectly suited for acquiring and releasing locks, and
ensuring that critical sections don't overlap.

Intuitively, an acquire access ensures that every access after it stays after
it. However operations that occur before an acquire are free to be reordered to
occur after it. Similarly, a release access ensures that every access before it
stays before it. However operations that occur after a release are free to be
reordered to occur before it.

When thread A releases a location in memory and then thread B subsequently
acquires *the same* location in memory, causality is established. Every write
that happened before A's release will be observed by B after its acquire.
However no causality is established with any other threads. Similarly, no
causality is established if A and B access *different* locations in memory.
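
Basic use of release-acquire is therefore simple: you acquire a location of
memory to begin the critical section, and then release that location to end it.
As a minimal sketch (the `with_lock` helper is illustrative, not a production
lock), a spinlock might look like:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static LOCKED: AtomicBool = AtomicBool::new(false);

fn with_lock(critical_section: impl FnOnce()) {
    // Acquire: spin until we atomically flip `false` to `true`. Accesses in
    // the critical section can't be reordered to before this succeeds.
    while LOCKED
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {}

    critical_section(); // ... scary data accesses go here ...

    // Release: writes above can't be reordered to after this store, so the
    // next thread to acquire the lock observes all of them.
    LOCKED.store(false, Ordering::Release);
}
```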

On strongly-ordered platforms most accesses have release or acquire semantics,
making release and acquire often totally free. This is not the case on
weakly-ordered platforms.


# Relaxed

Relaxed accesses are the absolute weakest. They can be freely re-ordered and
provide no happens-before relationship. Still, relaxed operations are
atomic. That is, they don't count as data accesses and any read-modify-write
operations done to them occur atomically. Relaxed operations are appropriate for
things that you definitely want to happen, but don't particularly otherwise care
about. For example, incrementing a counter can be safely done by multiple
threads using a relaxed `fetch_add` if you're not using the counter to
synchronize any other accesses.
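
A minimal sketch of such a counter:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1000 {
                    // Atomic, so increments are never lost; Relaxed, because
                    // the counter doesn't order any other accesses.
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(COUNTER.load(Ordering::Relaxed), 4000);
}
```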