|
1 | 1 | # How the Rust CI works |
2 | 2 |
|
3 | | -Rust CI ensures that the master branch of rust-lang/rust is always in a valid state. |
| 3 | +Continuous integration (CI) workflows on the `rust-lang/rust` repository ensure that the `master` branch |
| 4 | +is always in a valid state. |
4 | 5 |
|
5 | | -A developer submitting a pull request to rust-lang/rust, experiences the following: |
| 6 | +The CI infrastructure is described in detail in the [rustc-dev-guide][rustc-dev-guide]. |
6 | 7 |
|
7 | | -- A small subset of tests and checks are run on each commit to catch common errors. |
8 | | -- When the PR is ready and approved, the "bors" tool enqueues a full CI run. |
9 | | -- The full run either queues the specific PR or the PR is "rolled up" with other changes. |
10 | | -- Eventually a CI run containing the changes from the PR is performed and either passes or fails with an error the developer must address. |
11 | | - |
12 | | -## Which jobs we run |
13 | | - |
14 | | -The `rust-lang/rust` repository uses GitHub Actions to test [all the |
15 | | -platforms][platforms] we support. We currently have two kinds of jobs running |
16 | | -for each commit we want to merge to master: |
17 | | - |
18 | | -- Dist jobs build a full release of the compiler for that platform, including |
19 | | - all the tools we ship through rustup; Those builds are then uploaded to the |
20 | | - `rust-lang-ci2` S3 bucket and are available to be locally installed with the |
21 | | - [rustup-toolchain-install-master] tool; The same builds are also used for |
22 | | - actual releases: our release process basically consists of copying those |
23 | | - artifacts from `rust-lang-ci2` to the production endpoint and signing them. |
24 | | -- Non-dist jobs run our full test suite on the platform, and the test suite of |
25 | | - all the tools we ship through rustup; The amount of stuff we test depends on |
26 | | - the platform (for example some tests are run only on Tier 1 platforms), and |
27 | | - some quicker platforms are grouped together on the same builder to avoid |
28 | | - wasting CI resources. |
29 | | - |
30 | | -All the builds except those on macOS and Windows are executed inside that |
31 | | -platform’s custom [Docker container]. This has a lot of advantages for us: |
32 | | - |
33 | | -- The build environment is consistent regardless of the changes of the |
34 | | - underlying image (switching from the trusty image to xenial was painless for |
35 | | - us). |
36 | | -- We can use ancient build environments to ensure maximum binary compatibility, |
37 | | - for example [using older CentOS releases][dist-x86_64-linux] on our Linux builders. |
38 | | -- We can avoid reinstalling tools (like QEMU or the Android emulator) every |
39 | | - time thanks to Docker image caching. |
40 | | -- Users can run the same tests in the same environment locally by just running |
41 | | - `src/ci/docker/run.sh image-name`, which is awesome to debug failures. |
42 | | - |
43 | | -The docker images prefixed with `dist-` are used for building artifacts while those without that prefix run tests and checks. |
44 | | - |
45 | | -We also run tests for less common architectures (mainly Tier 2 and Tier 3 |
46 | | -platforms) in CI. Since those platforms are not x86 we either run |
47 | | -everything inside QEMU or just cross-compile if we don’t want to run the tests |
48 | | -for that platform. |
49 | | - |
50 | | -These builders are running on a special pool of builders set up and maintained for us by GitHub. |
51 | | - |
52 | | -Almost all build steps shell out to separate scripts. This keeps the CI fairly platform independent (i.e., we are not |
53 | | -overly reliant on GitHub Actions). GitHub Actions is only relied on for bootstrapping the CI process and for orchestrating |
54 | | -the scripts that drive the process. |
55 | | - |
56 | | -[platforms]: https://doc.rust-lang.org/nightly/rustc/platform-support.html |
57 | | -[rustup-toolchain-install-master]: https://github.com/kennytm/rustup-toolchain-install-master |
58 | | -[Docker container]: https://github.com/rust-lang/rust/tree/master/src/ci/docker |
59 | | -[dist-x86_64-linux]: https://github.com/rust-lang/rust/blob/master/src/ci/docker/host-x86_64/dist-x86_64-linux/Dockerfile |
60 | | - |
61 | | -## Merging PRs serially with bors |
62 | | - |
63 | | -CI services usually test the last commit of a branch merged with the last |
64 | | -commit in master, and while that’s great to check if the feature works in |
65 | | -isolation it doesn’t provide any guarantee the code is going to work once it’s |
66 | | -merged. Breakages like these usually happen when another, incompatible PR is |
67 | | -merged after the build happened. |
68 | | - |
69 | | -To ensure a master that works all the time we forbid manual merges: instead all |
70 | | -PRs have to be approved through our bot, [bors] (the software behind it is |
71 | | -called [homu]). All the approved PRs are put [in a queue][homu-queue] (sorted |
72 | | -by priority and creation date) and are automatically tested one at the time. If |
73 | | -all the builders are green the PR is merged, otherwise the failure is recorded |
74 | | -and the PR will have to be re-approved again. |
75 | | - |
76 | | -Bors doesn’t interact with CI services directly, but it works by pushing the |
77 | | -merge commit it wants to test to a branch called `auto`, and detecting the |
78 | | -outcome of the build by listening for either Commit Statuses or Check Runs. |
79 | | -Since the merge commit is based on the latest master and only one can be tested |
80 | | -at the same time, when the results are green master is fast-forwarded to that |
81 | | -merge commit. |
82 | | - |
83 | | -The `auto` branch and other branches used by bors live on a fork of rust-lang/rust: |
84 | | -[rust-lang-ci/rust]. This was originally done due to some security limitations in GitHub |
85 | | -Actions. These limitations have been addressed, but we've not yet done the work of removing |
86 | | -the use of the fork. |
87 | | - |
88 | | -Unfortunately testing a single PR at the time, combined with our long CI (~3 |
89 | | -hours for a full run)[^1], means we can’t merge too many PRs in a single day, and a |
90 | | -single failure greatly impacts our throughput for the day. The maximum number |
91 | | -of PRs we can merge in a day is around 8. |
92 | | - |
93 | | -The large CI run times and requirement for a large builder pool is largely due to the |
94 | | -fact that full release artifacts are built in the `dist-` builders. This is worth it |
95 | | -because these release artifacts: |
96 | | - |
97 | | -- allow perf testing even at a later date |
98 | | -- allow bisection when bugs are discovered later |
99 | | -- ensure release quality since if we're always releasing, we can catch problems early |
100 | | - |
101 | | -Bors [runs on ecs](https://github.com/rust-lang/simpleinfra/blob/master/terraform/bors/app.tf) and uses a sqlite database running in a volume as storage. |
102 | | - |
103 | | -[^1]: As of January 2023, the bottleneck are the `dist-x86_64-linux` and `dist-x86_64-linux-alt` runners because of their usage of [BOLT] and [PGO] optimization tooling. |
104 | | - |
105 | | -[bors]: https://github.com/bors |
106 | | -[homu]: https://github.com/rust-lang/homu |
107 | | -[homu-queue]: https://bors.rust-lang.org/queue/rust |
108 | | -[rust-lang-ci/rust]: https://github.com/rust-lang-ci/rust |
109 | | -[BOLT]: https://github.com/facebookincubator/BOLT |
110 | | -[PGO]: https://en.wikipedia.org/wiki/Profile-guided_optimization |
111 | | - |
112 | | -### Rollups |
113 | | - |
114 | | -Some PRs don’t need the full test suite to be executed: trivial changes like |
115 | | -typo fixes or README improvements *shouldn’t* break the build, and testing |
116 | | -every single one of them for 2 to 3 hours is a big waste of time. To solve this |
117 | | -we do a "rollup", a PR where we merge all the trivial PRs so they can be tested |
118 | | -together. Rollups are created manually by a team member using the "create a rollup" button on the [bors queue]. The team member uses their |
119 | | -judgment to decide if a PR is risky or not, and are the best tool we have at |
120 | | -the moment to keep the queue in a manageable state. |
121 | | - |
122 | | -[bors queue]: https://bors.rust-lang.org/queue/rust |
123 | | - |
124 | | -### Try builds |
125 | | - |
126 | | -Sometimes we need a working compiler build before approving a PR, usually for |
127 | | -[benchmarking][perf] or [checking the impact of the PR across the |
128 | | -ecosystem][crater]. Bors supports creating them by pushing the merge commit on |
129 | | -a separate branch (`try`), and they basically work the same as normal builds, |
130 | | -without the actual merge at the end. Any number of try builds can happen at the |
131 | | -same time, even if there is a normal PR in progress. |
132 | | - |
133 | | -You can see the CI configuration for try builds [here](https://github.com/rust-lang/rust/blob/9d46c7a3e69966782e163877151c1f0cea8b630a/src/ci/github-actions/ci.yml#L728-L741). |
134 | | - |
135 | | -If you want to perform a try build with a different configuration (e.g. try to |
136 | | -perform a compiler build for a different architecture), you can temporarily change |
137 | | -the `try` CI job in your PR: |
138 | | - |
139 | | -1) Open `src/ci/github-actions/ci.yml` |
140 | | -2) Find the CI job that you want to run (e.g. `dist-aarch64-linux`) |
141 | | -3) Copy-paste the entry of the CI job |
142 | | -4) Find the `try:` job in the file |
143 | | -5) Replace the `dist-x86_64-linux` job in the matrix with the copied entry from step 3) |
144 | | -6) Run `python3 x.py run src/tools/expand-yaml-anchors` |
145 | | -7) Push your changes and start a try build with `@bors try` |
146 | | - |
147 | | -[perf]: https://perf.rust-lang.org |
148 | | -[crater]: https://github.com/rust-lang/crater |
149 | | - |
150 | | -## Which branches we test |
151 | | - |
152 | | -Our builders are defined in [`src/ci/github-actions/ci.yml`]. |
153 | | - |
154 | | -[`src/ci/github-actions/ci.yml`]: https://github.com/rust-lang/rust/blob/master/src/ci/github-actions/ci.yml |
155 | | - |
156 | | -### PR builds |
157 | | - |
158 | | -All the commits pushed in a PR run a limited set of tests: a job containing a |
159 | | -bunch of lints plus a cross-compile check build to Windows mingw (without |
160 | | -producing any artifacts) and the `x86_64-gnu-llvm-##` non-dist builder (where |
161 | | -`##` is the *system* LLVM version we are currently testing). Those two |
162 | | -builders are enough to catch most of the common errors introduced in a PR, but |
163 | | -they don’t cover other platforms at all. Unfortunately it would take too many |
164 | | -resources to run the full test suite for each commit on every PR. |
165 | | - |
166 | | -Additionally, if the PR changes certain tools (or certain platform-specific |
167 | | -parts of std to check for miri breakage), the `x86_64-gnu-tools` non-dist |
168 | | -builder is run. |
169 | | - |
170 | | -### The `try` branch |
171 | | - |
172 | | -On the main rust repo, `try` builds produce just a Linux toolchain using the |
173 | | -`dist-x86_64-linux` image. |
174 | | - |
175 | | -### The `auto` branch |
176 | | - |
177 | | -This branch is used by bors to run all the tests on a PR before merging it, so |
178 | | -all the builders are enabled for it. bors will repeatedly force-push on it |
179 | | -(every time a new commit is tested). |
180 | | - |
181 | | -### The `master` branch |
182 | | - |
183 | | -Since all the commits to `master` are fast-forwarded from the `auto` branch (if |
184 | | -they pass all the tests there) we don’t need to build or test anything. A quick |
185 | | -job is executed on each push to update toolstate (see the toolstate description |
186 | | -below). |
187 | | - |
188 | | -### Other branches |
189 | | - |
190 | | -Other branches are just disabled and don’t run any kind of builds, since all |
191 | | -the in-progress branches will eventually be tested in a PR. |
192 | | - |
193 | | -## Caching |
194 | | - |
195 | | -The main rust repository doesn’t use the native GitHub Actions caching tools. |
196 | | -All our caching is uploaded to an S3 bucket we control |
197 | | -(`rust-lang-ci-sccache2`), and it’s used mainly for two things: |
198 | | - |
199 | | -### Docker images caching |
200 | | - |
201 | | -The Docker images we use to run most of the Linux-based builders take a *long* |
202 | | -time to fully build. To speed up the build, we cache the exported images on the |
203 | | -S3 bucket (with `docker save`/`docker load`). |
204 | | - |
205 | | -Since we test multiple, diverged branches (`master`, `beta` and `stable`) we |
206 | | -can’t rely on a single cache for the images, otherwise builds on a branch would |
207 | | -override the cache for the others. Instead we store the images identifying them |
208 | | -with a custom hash, made from the host’s Docker version and the contents of all |
209 | | -the Dockerfiles and related scripts. |
210 | | - |
211 | | -### LLVM caching with sccache |
212 | | - |
213 | | -We build some C/C++ stuff during the build and we rely on [sccache] to cache |
214 | | -intermediate LLVM artifacts. Sccache is a distributed ccache developed by |
215 | | -Mozilla, and it can use an object storage bucket as the storage backend, like |
216 | | -we do with our S3 bucket. |
217 | | - |
218 | | -[sccache]: https://github.com/mozilla/sccache |
219 | | - |
220 | | -## Custom tooling around CI |
221 | | - |
222 | | -During the years we developed some custom tooling to improve our CI experience. |
223 | | - |
224 | | -### Rust Log Analyzer to show the error message in PRs |
225 | | - |
226 | | -The build logs for `rust-lang/rust` are huge, and it’s not practical to find |
227 | | -what caused the build to fail by looking at the logs. To improve the |
228 | | -developers’ experience we developed a bot called [Rust Log Analyzer][rla] (RLA) |
229 | | -that receives the build logs on failure and extracts the error message |
230 | | -automatically, posting it on the PR. |
231 | | - |
232 | | -The bot is not hardcoded to look for error strings, but was trained with a |
233 | | -bunch of build failures to recognize which lines are common between builds and |
234 | | -which are not. While the generated snippets can be weird sometimes, the bot is |
235 | | -pretty good at identifying the relevant lines even if it’s an error we've never |
236 | | -seen before. |
237 | | - |
238 | | -[rla]: https://github.com/rust-lang/rust-log-analyzer |
239 | | - |
240 | | -### Toolstate to support allowed failures |
241 | | - |
242 | | -The `rust-lang/rust` repo doesn’t only test the compiler on its CI, but also a |
243 | | -variety of tools and documentation. Some documentation is pulled in via git |
244 | | -submodules. If we blocked merging rustc PRs on the documentation being fixed, |
245 | | -we would be stuck in a chicken-and-egg problem, because the documentation's CI |
246 | | -would not pass since updating it would need the not-yet-merged version of |
247 | | -rustc to test against (and we usually require CI to be passing). |
248 | | - |
249 | | -To avoid the problem, submodules are allowed to fail, and their status is |
250 | | -recorded in [rust-toolstate]. When a submodule breaks, a bot automatically |
251 | | -pings the maintainers so they know about the breakage, and it records the |
252 | | -failure on the toolstate repository. The release process will then ignore |
253 | | -broken tools on nightly, removing them from the shipped nightlies. |
254 | | - |
255 | | -While tool failures are allowed most of the time, they’re automatically |
256 | | -forbidden a week before a release: we don’t care if tools are broken on nightly |
257 | | -but they must work on beta and stable, so they also need to work on nightly a |
258 | | -few days before we promote nightly to beta. |
259 | | - |
260 | | -More information is available in the [toolstate documentation]. |
261 | | - |
262 | | -### GitHub Actions Templating |
263 | | - |
264 | | -GitHub Actions does not natively support templating which can cause configurations to be large and difficult to change. We use YAML anchors for templating and a custom tool, [`expand-yaml-anchors`], to expand [the template] into the CI configuration that [GitHub uses][ci config]. |
265 | | - |
266 | | -This templating language is fairly straightforward: |
267 | | - |
268 | | -- `&` indicates a template section |
269 | | -- `*` expands the indicated template in place |
270 | | -- `<<` merges yaml dictionaries |
271 | | - |
272 | | -[rust-toolstate]: https://rust-lang-nursery.github.io/rust-toolstate |
273 | | -[toolstate documentation]: ../toolstate.md |
274 | | -[`expand-yaml-anchors`]: https://github.com/rust-lang/rust/tree/master/src/tools/expand-yaml-anchors |
275 | | -[the template]: https://github.com/rust-lang/rust/blob/736c675d2ab65bcde6554e1b73340c2dbc27c85a/src/ci/github-actions/ci.yml |
276 | | -[ci config]: https://github.com/rust-lang/rust/blob/master/.github/workflows/ci.yml |
| 8 | +[rustc-dev-guide]: https://rustc-dev-guide.rust-lang.org/tests/ci.html |