|
1 | 1 | # High-level overview of the compiler source |
2 | 2 |
|
3 | | -## Crate structure |
4 | | - |
5 | | -The main Rust repository consists of a `src` directory, under which |
6 | | -there live many crates. These crates contain the sources for the |
7 | | -standard library and the compiler. This document, of course, focuses |
8 | | -on the latter. |
9 | | - |
10 | | -Rustc consists of a number of crates, including `rustc_ast`, |
11 | | -`rustc`, `rustc_target`, `rustc_codegen`, `rustc_driver`, and |
12 | | -many more. The source for each crate can be found in a directory |
13 | | -like `src/libXXX`, where `XXX` is the crate name. |
14 | | - |
15 | | -(N.B. The names and divisions of these crates are not set in |
16 | | -stone and may change over time. For the time being, we tend towards a |
17 | | -finer-grained division to help with compilation time, though as incremental |
18 | | -compilation improves, that may change.) |
19 | | - |
20 | | -The dependency structure of these crates is roughly a diamond: |
21 | | - |
22 | | -```text |
23 | | - rustc_driver |
24 | | - / | \ |
25 | | - / | \ |
26 | | - / | \ |
27 | | - / v \ |
28 | | -rustc_codegen rustc_borrowck ... rustc_metadata |
29 | | - \ | / |
30 | | - \ | / |
31 | | - \ | / |
32 | | - \ v / |
33 | | - rustc_middle |
34 | | - | |
35 | | - v |
36 | | - rustc_ast |
37 | | - / \ |
38 | | - / \ |
39 | | - rustc_span rustc_builtin_macros |
40 | | -``` |
41 | | - |
42 | | -The `rustc_driver` crate, at the top of this lattice, is effectively |
43 | | -the "main" function for the rust compiler. It doesn't have much "real |
44 | | -code", but instead ties together all of the code defined in the other |
45 | | -crates and defines the overall flow of execution. (As we transition |
46 | | -more and more to the [query model], however, the |
47 | | -"flow" of compilation is becoming less centrally defined.) |
48 | | - |
49 | | -At the other extreme, the `rustc_middle` crate defines the common and |
50 | | -pervasive data structures that all the rest of the compiler uses |
51 | | -(e.g. how to represent types, traits, and the program itself). It |
52 | | -also contains some amount of the compiler itself, although that is |
53 | | -relatively limited. |
54 | | - |
55 | | -Finally, all the crates in the bulge in the middle define the bulk of |
56 | | -the compiler – they all depend on `rustc_middle`, so that they can make use |
57 | | -of the various types defined there, and they export public routines |
58 | | -that `rustc_driver` will invoke as needed (more and more, what these |
59 | | -crates export are "query definitions", but those are covered later |
60 | | -on). |
61 | | - |
62 | | -Below `rustc_middle` lie various crates that make up the parser and error |
63 | | -reporting mechanism. They are also an internal part |
64 | | -of the compiler and not intended to be stable (though they do wind up |
65 | | -getting used by some crates in the wild; a practice we hope to |
66 | | -gradually phase out). |
67 | | - |
68 | | -## The main stages of compilation |
69 | | - |
70 | | -The Rust compiler is in a bit of transition right now. It used to be a |
71 | | -purely "pass-based" compiler, where we ran a number of passes over the |
72 | | -entire program, and each did a particular check or transformation. We |
73 | | -are gradually replacing this pass-based code with an alternative setup |
74 | | -based on on-demand **queries**. In the query-model, we work backwards, |
75 | | -executing a *query* that expresses our ultimate goal (e.g. "compile |
76 | | -this crate"). This query in turn may make other queries (e.g. "get me |
77 | | -a list of all modules in the crate"). Those queries make other queries |
78 | | -that ultimately bottom out in the base operations, like parsing the |
79 | | -input, running the type-checker, and so forth. This on-demand model |
80 | | -permits us to do exciting things like only do the minimal amount of |
81 | | -work needed to type-check a single function. It also helps with |
82 | | -incremental compilation. (For details on defining queries, check out |
83 | | -the [query model].) |
84 | | - |
85 | | -Regardless of the general setup, the basic operations that the |
86 | | -compiler must perform are the same. The only thing that changes is |
87 | | -whether these operations are invoked front-to-back, or on demand. In |
88 | | -order to compile a Rust crate, these are the general steps that we |
89 | | -take: |
90 | | - |
91 | | -1. **Parsing input** |
92 | | - - this processes the `.rs` files and produces the AST |
93 | | - ("abstract syntax tree") |
94 | | - - the AST is defined in `src/librustc_ast/ast.rs`. It is intended to match the lexical |
95 | | - syntax of the Rust language quite closely. |
96 | | -2. **Name resolution, macro expansion, and configuration** |
97 | | - - once parsing is complete, we process the AST recursively, resolving |
98 | | - paths and expanding macros. This same process also processes `#[cfg]` |
99 | | - nodes, and hence may strip things out of the AST as well. |
100 | | -3. **Lowering to HIR** |
101 | | - - Once name resolution completes, we convert the AST into the HIR, |
102 | | - or "[high-level intermediate representation]". The HIR is defined in |
103 | | - `src/librustc_middle/hir/`; that module also includes the [lowering] code. |
104 | | - - The HIR is a lightly desugared variant of the AST. It is more processed |
105 | | - than the AST and more suitable for the analyses that follow. |
106 | | - It is **not** required to match the syntax of the Rust language. |
107 | | - - As a simple example, in the **AST**, we preserve the parentheses |
108 | | - that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse |
109 | | - into distinct trees, even though they are equivalent. In the |
110 | | - HIR, however, parentheses nodes are removed, and those two |
111 | | - expressions are represented in the same way. |
112 | | -3. **Type-checking and subsequent analyses** |
113 | | - - An important step in processing the HIR is to perform type |
114 | | - checking. This process assigns types to every HIR expression, |
115 | | - for example, and also is responsible for resolving some |
116 | | - "type-dependent" paths, such as field accesses (`x.f` – we |
117 | | - can't know what field `f` is being accessed until we know the |
118 | | - type of `x`) and associated type references (`T::Item` – we |
119 | | - can't know what type `Item` is until we know what `T` is). |
120 | | - - Type checking creates "side-tables" (`TypeckTables`) that include |
121 | | - the types of expressions, the way to resolve methods, and so forth. |
122 | | - - After type-checking, we can do other analyses, such as privacy checking. |
123 | | -4. **Lowering to MIR and post-processing** |
124 | | - - Once type-checking is done, we can lower the HIR into MIR ("middle IR"), |
125 | | - which is a **very** desugared version of Rust, well suited to borrowck |
126 | | - but also to certain high-level optimizations. |
127 | | -5. **Translation to LLVM and LLVM optimizations** |
128 | | - - From MIR, we can produce LLVM IR. |
129 | | - - LLVM then runs its various optimizations, which produces a number of |
130 | | - `.o` files (one for each "codegen unit"). |
131 | | -6. **Linking** |
132 | | - - Finally, those `.o` files are linked together. |
133 | | - |
134 | | - |
135 | | -[query model]: query.html |
136 | | -[high-level intermediate representation]: hir.html |
137 | | -[lowering]: lowering.html |
| 3 | +> **NOTE**: The structure of the repository is going through a lot of |
| 4 | +> transitions. In particular, we want to get to a point eventually where the |
| 5 | +> top-level directory has separate directories for the compiler, build-system, |
| 6 | +> std libs, etc, rather than one huge `src/` directory. |
| 7 | +
|
| 8 | +## Workspace structure |
| 9 | + |
| 10 | +The `rust-lang/rust` repository consists of a single large cargo workspace |
| 11 | +containing the compiler, the standard library (core, alloc, std, etc), and |
| 12 | +`rustdoc`, along with the build system and a bunch of tools and submodules for |
| 13 | +building a full Rust distribution. |
| 14 | + |
| 15 | +As of this writing, this structure is gradually undergoing some transformation |
| 16 | +to make it a bit less monolithic and more approachable, especially to |
| 17 | +newcomers. |
| 18 | + |
| 19 | +> Eventually, the hope is for the standard library to live in a `stdlib/` |
| 20 | +> directory, while the compiler lives in `compiler/`. However, as of this |
| 21 | +> writing, both live in `src/`. |
| 22 | +
|
| 23 | +The repository consists of a `src` directory, under which there live many |
| 24 | +crates, which are the source for the compiler, standard library, etc, as |
| 25 | +mentioned above. |
| 26 | + |
| 27 | +## Standard library |
| 28 | + |
| 29 | +The standard library crates are obviously named `libstd`, `libcore`, |
| 30 | +`liballoc`, etc. There is also `libproc_macro`, `libtest`, and other runtime |
| 31 | +libraries. |
| 32 | + |
| 33 | +This code is fairly similar to most other Rust crates except that it must be |
| 34 | +built in a special way because it can use unstable features. |
| 35 | + |
| 36 | +## Compiler |
| 37 | + |
| 38 | +The compiler crates all have names starting with `librustc_*`. These are a large |
| 39 | +collection of interdependent crates. There is also the `rustc` crate which is |
| 40 | +the actual binary. It does nothing except call the compiler's main function, |
| 41 | +which is defined elsewhere. |
| 42 | + |
| 43 | +The dependency structure of these crates is complex, but roughly it is |
| 44 | +something like this: |
| 45 | + |
| 46 | +- `rustc` (the binary) calls [`rustc_driver::main`][main]. |
| 47 | + - [`rustc_driver`] depends on a lot of other crates, but the main one is |
| 48 | + [`rustc_interface`]. |
| 49 | + - [`rustc_interface`] depends on most of the other compiler crates. It |
| 50 | + is a fairly generic interface for driving the whole compilation. |
| 51 | + - Most of the other `rustc_*` crates depend on [`rustc_middle`], |
| 52 | + which defines a lot of central data structures in the compiler. |
| 53 | + - [`rustc_middle`] and most of the other crates depend on a |
| 54 | + handful of crates representing the early parts of the |
| 55 | + compiler (e.g. the parser), fundamental data structures (e.g. |
| 56 | + [`Span`]), or error reporting: [`rustc_data_structures`], |
| 57 | + [`rustc_span`], [`rustc_errors`], etc. |
| 58 | + |
| 59 | +[main]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.main.html |
| 60 | +[`rustc_driver`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html |
| 61 | +[`rustc_interface`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/index.html |
| 62 | +[`rustc_middle`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/index.html |
| 63 | +[`rustc_data_structures`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/index.html |
| 64 | +[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html |
| 65 | +[`Span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html |
| 66 | +[`rustc_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html |
| 67 | + |
| 68 | +You can see the exact dependencies by reading the `Cargo.toml` for the various |
| 69 | +crates, just like a normal Rust crate. |
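As a rough illustration of reading dependencies out of a manifest, here is a minimal sketch that lists the keys under `[dependencies]` in a `Cargo.toml`-style string. The function name `dependency_names` and the sample manifest (including the crate name `rustc_example`) are hypothetical and not taken from the repository. The line-based parsing is a deliberate simplification: it does not handle dotted tables such as `[dependencies.foo]` or multi-line values, so a real tool would use a proper TOML parser.

```rust
/// Collects the keys listed under `[dependencies]` in a Cargo.toml-style string.
/// Naive sketch: only simple `name = ...` lines inside that one table are recognized.
fn dependency_names(manifest: &str) -> Vec<String> {
    let mut in_deps = false;
    let mut names = Vec::new();
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // A new table header starts; track whether it is `[dependencies]`.
            in_deps = line == "[dependencies]";
            continue;
        }
        if !in_deps || line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, _value)) = line.split_once('=') {
            names.push(key.trim().to_string());
        }
    }
    names
}

fn main() {
    // Hypothetical manifest, written for illustration only.
    let manifest = r#"
[package]
name = "rustc_example"

[dependencies]
rustc_middle = { path = "../librustc_middle" }
rustc_span = { path = "../librustc_span" }
"#;
    for name in dependency_names(manifest) {
        println!("{}", name);
    }
}
```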
| 70 | + |
| 71 | +You may ask why the compiler is broken into so many crates. There are two major reasons: |
| 72 | + |
| 73 | +1. Organization. The compiler is a _huge_ codebase; it would be an impossibly large crate. |
| 74 | +2. Compile time. By breaking the compiler into multiple crates, we can take |
| 75 | + better advantage of incremental/parallel compilation using cargo. In |
| 76 | + particular, we try to have as few dependencies between crates as possible so |
| 77 | + that we don't have to rebuild as many crates when one of them changes. |
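To make the compile-time argument concrete, the sketch below computes which crates would need rebuilding after one crate changes: the crate itself plus everything that depends on it, directly or transitively. The graph in `example_graph` is a heavily simplified assumption based on the rough description above, not the real dependency graph, and `rebuild_set` is a hypothetical helper, not something cargo or the build system exposes.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns every crate that must be rebuilt when `changed` is edited:
/// the crate itself plus all of its direct and transitive dependents.
fn rebuild_set<'a>(
    deps: &HashMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
) -> BTreeSet<&'a str> {
    let mut dirty = BTreeSet::new();
    dirty.insert(changed);
    let mut stack = vec![changed];
    while let Some(current) = stack.pop() {
        // Any crate that lists `current` as a dependency becomes dirty too.
        for (krate, its_deps) in deps {
            if its_deps.contains(&current) && dirty.insert(*krate) {
                stack.push(*krate);
            }
        }
    }
    dirty
}

/// Simplified, illustrative graph: each crate maps to the crates it depends on.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    let mut deps = HashMap::new();
    deps.insert("rustc_driver", vec!["rustc_interface"]);
    deps.insert("rustc_interface", vec!["rustc_middle"]);
    deps.insert("rustc_middle", vec!["rustc_span", "rustc_errors"]);
    deps.insert("rustc_errors", vec!["rustc_span"]);
    deps.insert("rustc_span", vec![]);
    deps
}

fn main() {
    let graph = example_graph();
    for changed in ["rustc_driver", "rustc_span"] {
        let dirty = rebuild_set(&graph, changed);
        println!("change {} -> rebuild {} crate(s)", changed, dirty.len());
    }
}
```

In this toy graph, a change near the top (`rustc_driver`) dirties only that crate, while a change near the bottom (`rustc_span`) dirties everything, which is why keeping the number of edges between crates small pays off.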
| 78 | + |
| 79 | +Most of this book is about the compiler, so we won't have any further |
| 80 | +explanation of these crates here. |
| 81 | + |
| 82 | +One final thing: [`src/llvm-project`] is a submodule for our fork of LLVM. |
| 83 | + |
| 84 | +[`src/llvm-project`]: https://github.com/rust-lang/rust/tree/master/src/llvm-project |
| 85 | + |
| 86 | +## rustdoc |
| 87 | + |
| 88 | +The bulk of `rustdoc` is in [`librustdoc`]. However, the `rustdoc` binary |
| 89 | +itself is [`src/tools/rustdoc`], which does nothing except call [`rustdoc::main`]. |
| 90 | + |
| 91 | +There is also JavaScript and CSS for the rustdocs in [`src/tools/rustdoc-js`] |
| 92 | +and [`src/tools/rustdoc-themes`]. |
| 93 | + |
| 94 | +You can read more about rustdoc in [this chapter][rustdocch]. |
| 95 | + |
| 96 | +[`librustdoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/index.html |
| 97 | +[`rustdoc::main`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/fn.main.html |
| 98 | +[`src/tools/rustdoc`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustdoc |
| 99 | +[`src/tools/rustdoc-js`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustdoc-js |
| 100 | +[`src/tools/rustdoc-themes`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustdoc-themes |
| 101 | + |
| 102 | +[rustdocch]: ./rustdoc-internals.md |
| 103 | + |
| 104 | +## Tests |
| 105 | + |
| 106 | +The test suite for all of the above is in [`src/test/`]. You can read more |
| 107 | +about the test suite [in this chapter][testsch]. |
| 108 | + |
| 109 | +The test harness itself is in [`src/tools/compiletest`]. |
| 110 | + |
| 111 | +[testsch]: ./tests/intro.md |
| 112 | + |
| 113 | +[`src/test/`]: https://github.com/rust-lang/rust/tree/master/src/test |
| 114 | +[`src/tools/compiletest`]: https://github.com/rust-lang/rust/tree/master/src/tools/compiletest |
| 115 | + |
| 116 | +## Build System |
| 117 | + |
| 118 | +There are a number of tools in the repository just for building the compiler, |
| 119 | +standard library, rustdoc, etc, along with testing, building a full Rust |
| 120 | +distribution, etc. |
| 121 | + |
| 122 | +One of the primary tools is [`src/bootstrap`]. You can read more about |
| 123 | +bootstrapping [in this chapter][bootstch]. The process may also use other tools |
| 124 | +from `src/tools/`, such as [`tidy`] or [`compiletest`]. |
| 125 | + |
| 126 | +[`src/bootstrap`]: https://github.com/rust-lang/rust/tree/master/src/bootstrap |
| 127 | +[`tidy`]: https://github.com/rust-lang/rust/tree/master/src/tools/tidy |
| 128 | +[`compiletest`]: https://github.com/rust-lang/rust/tree/master/src/tools/compiletest |
| 129 | + |
| 130 | +[bootstch]: ./building/bootstrapping.md |
| 131 | + |
| 132 | +## Other |
| 133 | + |
| 134 | +There are a lot of other things in the `rust-lang/rust` repo that are related |
| 135 | +to building a full Rust distribution. Most of the time you don't need to worry |
| 136 | +about them. |
| 137 | + |
| 138 | +These include: |
| 139 | +- [`src/ci`]: The CI configuration. This is actually quite extensive because we |
| 140 | + run a lot of tests on a lot of platforms. |
| 141 | +- [`src/doc`]: Various documentation, including submodules for a few books. |
| 142 | +- [`src/etc`]: Miscellaneous utilities. |
| 143 | +- [`src/tools/rustc-workspace-hack`], and others: Various workarounds to make cargo work with bootstrapping. |
| 144 | +- And more... |
| 145 | + |
| 146 | +[`src/ci`]: https://github.com/rust-lang/rust/tree/master/src/ci |
| 147 | +[`src/doc`]: https://github.com/rust-lang/rust/tree/master/src/doc |
| 148 | +[`src/etc`]: https://github.com/rust-lang/rust/tree/master/src/etc |
| 149 | +[`src/tools/rustc-workspace-hack`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustc-workspace-hack |