    See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/Readme.md) for more information.

- If you want function-level performance data or even just more details than the above approaches:
  - Consider using a native code profiler such as [perf](profiling/with_perf.html),
    or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision,
    full-featured graphical interface.

- If you want a nice visual representation of the compile times of your crate graph,
  you can use [cargo's `-Ztimings` flag](https://doc.rust-lang.org/cargo/reference/unstable.html#timings),
  e.g. `cargo -Ztimings build`.
  You can use this flag on the compiler itself with `CARGOFLAGS="-Ztimings" ./x.py build`.

## Optimizing rustc's self-compile-times with cargo-llvm-lines

Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
number of lines of LLVM IR across all instantiations of a generic function.
Since most of the time spent compiling rustc is in LLVM, the idea is that
reducing the amount of code passed to LLVM makes compiling rustc faster.
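
To make "across all instantiations" concrete: every monomorphized copy of a generic function is emitted as its own `define` in the `.ll` files, and the counts for those copies are summed under one name. Below is a minimal std-only Rust sketch of that idea, not cargo-llvm-lines' actual implementation. It assumes (hypothetically) that symbol names have already been demangled, that each legacy-mangled name ends in a `::h` plus 16-hex-digit hash, and that every function body ends with a line that is exactly `}`. `Stats`, `strip_hash`, `count_ir_lines` and `sorted_by_lines` are illustrative names.

```rust
use std::collections::HashMap;

/// Per-function totals, analogous to the "Lines" and "Copies" columns.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Stats {
    lines: usize,
    copies: usize,
}

/// Drop a trailing legacy-mangling hash such as `::h0123456789abcdef`, so
/// that every monomorphized copy of a generic function shares one key.
fn strip_hash(name: &str) -> &str {
    if let Some(idx) = name.rfind("::h") {
        let hash = &name[idx + 3..];
        if hash.len() == 16 && hash.chars().all(|c| c.is_ascii_hexdigit()) {
            return &name[..idx];
        }
    }
    name
}

/// Count lines of textual LLVM IR per function, grouping all copies by name.
/// Assumes demangled names and that each body runs from a `define ... @name(`
/// line to a line that is exactly `}`; both of those lines are counted.
fn count_ir_lines(ll: &str) -> HashMap<String, Stats> {
    let mut totals: HashMap<String, Stats> = HashMap::new();
    // Name and running line count of the function currently being read.
    let mut current: Option<(String, usize)> = None;

    for line in ll.lines() {
        if current.is_some() {
            if let Some((_, count)) = current.as_mut() {
                *count += 1;
            }
            if line == "}" {
                // End of the body: fold this copy into the per-name totals.
                if let Some((name, count)) = current.take() {
                    let entry = totals.entry(name).or_default();
                    entry.lines += count;
                    entry.copies += 1;
                }
            }
        } else if line.starts_with("define ") {
            // The symbol sits between the first '@' and the following '('.
            if let Some(at) = line.find('@') {
                let rest = &line[at + 1..];
                let end = rest.find('(').unwrap_or(rest.len());
                let name = strip_hash(rest[..end].trim_matches('"'));
                current = Some((name.to_string(), 1));
            }
        }
    }
    totals
}

/// Sort entries by total line count, largest first, like the llvm-lines table.
fn sorted_by_lines(totals: &HashMap<String, Stats>) -> Vec<(&str, Stats)> {
    let mut rows: Vec<(&str, Stats)> = totals.iter().map(|(k, v)| (k.as_str(), *v)).collect();
    rows.sort_by(|a, b| b.1.lines.cmp(&a.1.lines).then(a.0.cmp(b.0)));
    rows
}
```

To try it on real output you would pass in the text of a `.ll` file (e.g. from `std::fs::read_to_string`) after demangling its symbols; without demangling, each copy keeps its own mangled name and nothing gets grouped. The exact numbers will likely differ from cargo-llvm-lines, whose counting rules this sketch does not try to reproduce.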

Example usage:
```
cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc

# The paths below assume an x86_64-unknown-linux-gnu host; adjust the target triple for yours.
# Single crate, e.g. rustc_middle (relies on your shell's glob expansion)
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle-*.ll > llvm-lines-middle.txt
# Whole compiler at once
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt
```

Example output:
```
    Lines          Copies         Function name
    -----          ------         -------------
 11802479 (100%)    52848 (100%)  (TOTAL)
  1663902 (14.1%)     400 (0.8%)  rustc_query_system::query::plumbing::get_query_impl::{{closure}}
   683526 (5.8%)    10579 (20.0%) core::ptr::drop_in_place
   568523 (4.8%)      528 (1.0%)  rustc_query_system::query::plumbing::get_query_impl
   472715 (4.0%)     1134 (2.1%)  hashbrown::raw::RawTable<T>::reserve_rehash
   306782 (2.6%)     1320 (2.5%)  rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
   212800 (1.8%)      514 (1.0%)  rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
   194813 (1.7%)      124 (0.2%)  rustc_query_system::query::plumbing::force_query_impl
   158488 (1.3%)        1 (0.0%)  rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
   119768 (1.0%)      418 (0.8%)  core::ops::function::FnOnce::call_once
   119644 (1.0%)        1 (0.0%)  rustc_target::spec::load_specific
   104153 (0.9%)        7 (0.0%)  rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
    81173 (0.7%)        1 (0.0%)  rustc_middle::ty::query::stats::query_stats
    80306 (0.7%)     2029 (3.8%)  core::ops::function::FnOnce::call_once{{vtable.shim}}
    78019 (0.7%)     1611 (3.0%)  stacker::grow::{{closure}}
    69720 (0.6%)     3286 (6.2%)  <&T as core::fmt::Debug>::fmt
    56327 (0.5%)      186 (0.4%)  rustc_query_system::query::plumbing::incremental_verify_ich
    49714 (0.4%)       14 (0.0%)  rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label
```
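
The `Lines` and `Copies` columns are worth reading together: dividing one by the other separates functions that are big per copy from ones that are small but instantiated thousands of times. In the table above, `get_query_impl::{{closure}}` averages roughly 4,160 lines per copy, while `core::ptr::drop_in_place` averages about 65 lines over 10,579 copies. The following Rust sketch ranks rows of an `llvm-lines.txt` file that way. It assumes (hypothetically) the whitespace-separated row layout shown above; `Row`, `parse_row`, `lines_per_copy` and `rank_by_size_per_copy` are illustrative names, not part of cargo-llvm-lines.

```rust
/// One row of cargo-llvm-lines output: total IR lines, number of copies, name.
#[derive(Debug, PartialEq)]
struct Row {
    lines: u64,
    copies: u64,
    name: String,
}

/// Parse a row like `1663902 (14.1%) 400 (0.8%) some::function`.
/// Header, separator and malformed lines return `None`.
fn parse_row(line: &str) -> Option<Row> {
    let mut rest = line.trim_start();
    let mut fields = Vec::with_capacity(4);
    // The first four fields (lines, %, copies, %) never contain spaces;
    // everything after them is the function name, which may.
    for _ in 0..4 {
        let end = rest.find(char::is_whitespace)?;
        fields.push(&rest[..end]);
        rest = rest[end..].trim_start();
    }
    Some(Row {
        lines: fields[0].parse().ok()?,
        copies: fields[2].parse().ok()?,
        name: rest.trim_end().to_string(),
    })
}

/// Average IR lines per copy (integer division, so rounded down).
fn lines_per_copy(row: &Row) -> u64 {
    if row.copies == 0 { 0 } else { row.lines / row.copies }
}

/// Rank functions by average size per copy, largest first, skipping (TOTAL).
fn rank_by_size_per_copy(report: &str) -> Vec<(String, u64)> {
    let mut ranked: Vec<(String, u64)> = report
        .lines()
        .filter_map(parse_row)
        .filter(|row| row.name != "(TOTAL)")
        .map(|row| {
            let avg = lines_per_copy(&row);
            (row.name, avg)
        })
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked
}
```

A high average with few copies points at one large function body; a low average with many copies points at something generic that is cheap per instantiation but monomorphized very often.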

Since this doesn't seem to work with incremental compilation or `x.py check`,
you will be compiling rustc _a lot_.
I recommend changing a few settings in `config.toml` to make it bearable:
```
[rust]
# A debug build takes _a fourth_ as long on my machine,
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so there's no point in compiling debug checks.
debug = false

# Caution: This changes the output of llvm-lines.
# Using a single codegen unit gives more accurate output, but is slower to compile.
# Changing it to the number of cores on my machine increased the output
# from 3.5GB to 4.1GB and decreased compile times from 5½ min to 4 min.
codegen-units = 1
#codegen-units = 0 # num_cpus
```

What I'm still not sure about is whether inlining in MIR optimizations affects llvm-lines.
The output with `-Zmir-opt-level=0` and `-Zmir-opt-level=1` is the same,
but it feels like some functions that show up at the top should be too small
to have such a high impact. Inlining should only happen in LLVM, though.