|
1 | 1 | # From MIR to Binaries |
2 | 2 |
|
3 | | -All of the preceding chapters of this guide have one thing in common: we never |
4 | | -generated any executable machine code at all! With this chapter, all of that |
5 | | -changes. |
| 3 | +All of the preceding chapters of this guide have one thing in common: |
| 4 | +we never generated any executable machine code at all! |
| 5 | +With this chapter, all of that changes. |
6 | 6 |
|
7 | | -So far, we've shown how the compiler can take raw source code in text format |
8 | | -and transform it into [MIR]. We have also shown how the compiler does various |
9 | | -analyses on the code to detect things like type or lifetime errors. Now, we |
10 | | -will finally take the MIR and produce some executable machine code. |
| 7 | +So far, |
| 8 | +we've shown how the compiler can take raw source code in text format |
| 9 | +and transform it into [MIR]. |
| 10 | +We have also shown how the compiler does various |
| 11 | +analyses on the code to detect things like type or lifetime errors. |
| 12 | +Now, we will finally take the MIR and produce some executable machine code. |
11 | 13 |
|
12 | 14 | [MIR]: ./mir/index.md |
13 | 15 |
|
14 | | -> NOTE: This part of a compiler is often called the _backend_. The term is a bit |
15 | | -> overloaded because in the compiler source, it usually refers to the "codegen |
16 | | -> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend" |
17 | | -> in this part, we are referring to the "codegen backend". |
| 16 | +> NOTE: This part of a compiler is often called the _backend_. |
| 17 | +> The term is a bit overloaded because in the compiler source, |
| 18 | +> it usually refers to the "codegen backend" (i.e. LLVM, Cranelift, or GCC). |
| 19 | +> Usually, when you see the word "backend" in this part, |
| 20 | +> we are referring to the "codegen backend". |
18 | 21 |
|
19 | 22 | So what do we need to do? |
20 | 23 |
|
21 | | -0. First, we need to collect the set of things to generate code for. In |
22 | | - particular, we need to find out which concrete types to substitute for |
23 | | - generic ones, since we need to generate code for the concrete types. |
24 | | - Generating code for the concrete types (i.e. emitting a copy of the code for |
25 | | - each concrete type) is called _monomorphization_, so the process of |
26 | | - collecting all the concrete types is called _monomorphization collection_. |
| 24 | +0. First, we need to collect the set of things to generate code for. |
| 25 | + In particular, |
| 26 | + we need to find out which concrete types to substitute for generic ones, |
| 27 | + since we need to generate code for the concrete types. |
| 28 | + Generating code for the concrete types |
| 29 | + (i.e. emitting a copy of the code for each concrete type) is called _monomorphization_, |
| 30 | + so the process of collecting all the concrete types is called _monomorphization collection_. |
27 | 31 | 1. Next, we need to actually lower the MIR to a codegen IR |
28 | 32 | (usually LLVM IR) for each concrete type we collected. |
29 | | -2. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of |
30 | | - optimization passes, generates executable code, and links together an |
31 | | - executable binary. |
| 33 | +2. Finally, we need to invoke the codegen backend, |
| 34 | + which runs a bunch of optimization passes, |
| 35 | + generates executable code, |
| 36 | + and links together an executable binary. |
32 | 37 |
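To make step 0 concrete, here is a minimal sketch of what monomorphization means from the user's side. The function names `describe` and `demo` are hypothetical and exist only for illustration; nothing here is rustc's internal API. Each distinct concrete type that `describe` is called with is an instance the collector would need to find, and the backend would then emit one copy of the function per instance.

```rust
use std::fmt::Debug;

// A generic function. No machine code can be emitted for `T` itself;
// code is only generated once `T` is replaced by a concrete type.
// (`describe` is a hypothetical name used for illustration.)
fn describe<T: Debug>(value: T) -> String {
    format!("{:?}", value)
}

// Uses `describe` with two distinct concrete types.
// Conceptually, monomorphization collection would record the instances
// `describe::<u32>` and `describe::<&str>`; the second `u32` call
// reuses the already-collected `describe::<u32>` instance.
fn demo() -> (String, String, String) {
    let a = describe(42_u32); // instance: describe::<u32>
    let b = describe("hi"); // instance: describe::<&str>
    let c = describe(7_u32); // same instance as `a`
    (a, b, c)
}
```

Calling `demo()` exercises both instances. A type that is never used with `describe` produces no code at all, which is why the set of concrete types has to be collected before any lowering happens.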
|
33 | 38 | [codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html |
34 | 39 |
|
35 | 40 | The code for codegen is actually a bit complex due to a few factors: |
36 | 41 |
|
37 | | -- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much |
38 | | - backend code between them as possible, so a lot of it is generic over the |
39 | | - codegen implementation. This means that there are often a lot of layers of |
40 | | - abstraction. |
| 42 | +- Support for multiple codegen backends (LLVM, Cranelift, and GCC). |
| 43 | + We try to share as much backend code between them as possible, |
| 44 | + so a lot of it is generic over the codegen implementation. |
| 45 | + This means that there are often a lot of layers of abstraction. |
41 | 46 | - Codegen happens asynchronously in another thread for performance. |
42 | | -- The actual codegen is done by a third-party library (either LLVM or Cranelift). |
| 47 | +- The actual codegen is done by a third-party library (one of the three backends). |
43 | 48 |
|
44 | | -Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code |
45 | | -(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm] |
46 | | -crate contains code specific to LLVM codegen. |
| 49 | +Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code, |
| 50 | +while the [`rustc_codegen_llvm`][llvm] crate contains code specific to LLVM codegen. |
47 | 51 |
|
48 | 52 | [ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html |
49 | 53 | [llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html |
50 | 54 |
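The split between backend-agnostic and backend-specific code can be pictured with a small sketch. Everything below is hypothetical: `ToyBackend`, `LlvmLike`, `CraneliftLike`, and `codegen_items` are invented names and do not correspond to the real traits or types in `rustc_codegen_ssa` or `rustc_codegen_llvm`. The sketch only shows the general shape, assuming the shared driver is written once against a trait and each backend supplies its own implementation.

```rust
// Hypothetical trait standing in for "what a codegen backend must provide".
// This is NOT rustc's actual interface.
trait ToyBackend {
    fn name(&self) -> &'static str;

    // Pretend to lower one item and return a description of the result.
    fn emit_function(&self, symbol: &str) -> String {
        format!("{}: {}", self.name(), symbol)
    }
}

// Two stand-in backends (hypothetical).
struct LlvmLike;
struct CraneliftLike;

impl ToyBackend for LlvmLike {
    fn name(&self) -> &'static str {
        "llvm-like"
    }
}

impl ToyBackend for CraneliftLike {
    fn name(&self) -> &'static str {
        "cranelift-like"
    }
}

// Backend-agnostic driver: written once, generic over the backend.
// This plays the role of the shared layer in the sketch.
fn codegen_items<B: ToyBackend>(backend: &B, items: &[&str]) -> Vec<String> {
    items.iter().map(|item| backend.emit_function(item)).collect()
}
```

With this shape, `codegen_items(&LlvmLike, &["foo"])` and `codegen_items(&CraneliftLike, &["foo"])` run the same driver code and differ only in the backend-specific part, which is the source of the extra layers of abstraction mentioned above.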
|
51 | 55 | At a very high level, the entry point is |
52 | | -[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the |
53 | | -process discussed in the rest of this chapter. |
54 | | - |
| 56 | +[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. |
| 57 | +This function starts the process discussed in the rest of this chapter. |