Experiment: New fmt::Arguments implementation (another one) #148789

m-ou-se · 2025-11-10T14:30:15Z

Alternative to #148529

This is yet another experimental new implementation of fmt::Arguments. In this implementation, fmt::Arguments is only two pointers in size (instead of the current six). This makes it the same size as a &str and lets it fit in a register pair.
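
The size claim can be checked locally with a small program. This is only a sketch: the number printed for fmt::Arguments depends on the toolchain used to build it, so only the &str size is treated as fixed here.

```rust
use std::fmt;
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // A &str is a (pointer, length) pair, i.e. two pointer-sized words.
    println!("&str:           {} words", size_of::<&str>() / word);

    // The result here depends on the compiler: the description above says
    // six words currently and two words with this PR.
    println!(
        "fmt::Arguments: {} words",
        size_of::<fmt::Arguments<'static>>() / word
    );
}
```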

Unlike #148529, this implementation stores all static information as just a single (byte) string, without any indirection:

code:

format_args!("Hello, {name:-^20}!\n")

lowering before:

fmt::Arguments::new_v1_formatted(
    &["Hello, ", "!\n"],
    &args,
    &[
        Placeholder {
            position: 0usize,
            flags: 3355443245u32,
            precision: format_count::Implied,
            width: format_count::Is(20u16),
        },
    ],
)

lowering in #148529:

format_arguments::new(
    &[
        Piece::num(7usize),
        Piece::str("Hello, "),
        Piece::num(14411519000859136000usize),
        Piece::num(2usize),
        Piece::str("!\n"),
        Piece::num(0usize),
    ],
    &args,
)

lowering in this PR:

fmt::Arguments::new(
    b"\x07Hello, \x83-\x00\x00\xc8\x14\x00\x02!\n\x00",
    &args,
)
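
To make the byte string easier to read, here is a minimal decoder sketch. The layout it assumes is inferred from this single example and is not taken from the PR's code: a byte in 1..=0x7f is taken to be the length of a literal piece that follows, 0x83 is taken to introduce a placeholder followed by a little-endian u32 of flags and a little-endian u16 width, and 0x00 is taken to end the template. The names `Piece` and `decode` are hypothetical.

```rust
/// Hypothetical view of one decoded piece; not a type from the PR.
#[derive(Debug, PartialEq)]
enum Piece<'a> {
    Literal(&'a [u8]),
    Placeholder { flags: u32, width: u16 },
}

/// Decodes a template under the assumed layout described above.
/// Returns None for anything this sketch does not understand.
fn decode(mut bytes: &[u8]) -> Option<Vec<Piece<'_>>> {
    let mut pieces = Vec::new();
    loop {
        let (&tag, rest) = bytes.split_first()?;
        bytes = rest;
        match tag {
            // Assumed: a zero byte terminates the template.
            0x00 => return Some(pieces),
            // Assumed: a small tag is the length of the literal that follows.
            1..=0x7f => {
                let len = tag as usize;
                if bytes.len() < len {
                    return None;
                }
                let (literal, rest) = bytes.split_at(len);
                pieces.push(Piece::Literal(literal));
                bytes = rest;
            }
            // Assumed: placeholder with explicit flags (u32) and width (u16).
            0x83 => {
                if bytes.len() < 6 {
                    return None;
                }
                let flags = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
                let width = u16::from_le_bytes([bytes[4], bytes[5]]);
                pieces.push(Piece::Placeholder { flags, width });
                bytes = &bytes[6..];
            }
            // Any other tag is outside what the example shows.
            _ => return None,
        }
    }
}

fn main() {
    let template = b"\x07Hello, \x83-\x00\x00\xc8\x14\x00\x02!\n\x00";
    let pieces = decode(template).expect("template matches the assumed layout");
    println!("{pieces:?}");
}
```

Under these assumptions the placeholder decodes to flags 3355443245 and width 20, which matches the Placeholder in the "lowering before" listing.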

This saves a ton of pointers and simplifies the expansion significantly, but does mean that individual pieces (e.g. "Hello, " and "!\n") cannot be reused.

Like #148529, this fmt::Arguments can store a &'static str without any indirection or additional storage. This means that simple cases like print_fmt(format_args!("hello")) are now just as efficient for the caller as print_str("hello"), as shown by this example:

code:

fn main() {
    println!("Hello, world!");
}

before:

main:
 sub     rsp, 56
 lea     rax, [rip + .Lanon_hello_world]
 mov     qword ptr [rsp + 8], rax
 mov     qword ptr [rsp + 16], 1
 mov     qword ptr [rsp + 24], 8
 xorps   xmm0, xmm0
 movups  xmmword ptr [rsp + 32], xmm0
 lea     rdi, [rsp + 8]
 call    qword ptr [rip + std::io::stdio::_print]
 add     rsp, 56
 ret

after:

main:
 lea     rsi, [rip + .Lanon_hello_world]
 mov     edi, 29
 jmp     qword ptr [rip + std::io::stdio::_print]

Similarly, panic!("Hello, world!"); shows the same change.
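
One possible reading of the constant 29 in the "after" listing, offered as a guess rather than something stated above: "Hello, world!\n" is 14 bytes long, and 29 equals (14 << 1) | 1, so the second word might hold the string length shifted left by one with the low bit set as a "plain string" tag. The helper names below are hypothetical.

```rust
/// Hypothetical encoding: the low bit marks "this is a plain string",
/// the remaining bits hold its length.
fn encode_str_len(len: usize) -> usize {
    (len << 1) | 1
}

/// Inverse of the hypothetical encoding; None if the tag bit is not set.
fn decode_str_len(word: usize) -> Option<usize> {
    if word & 1 == 1 {
        Some(word >> 1)
    } else {
        None
    }
}

fn main() {
    let s = "Hello, world!\n";
    let word = encode_str_len(s.len());
    println!("len = {}, encoded word = {}", s.len(), word);
    println!("decoded len = {:?}", decode_str_len(word));
}
```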

To do:

Performance testing
Documentation / comments

m-ou-se · 2025-11-10T14:33:10Z

@bors try @rust-timer queue

rust-log-analyzer · 2025-11-10T15:14:23Z

The job x86_64-gnu-gcc failed!

Possible cause of the failure (guessed by this bot):

---- [run-make] tests/run-make/symbol-mangling-hashed stdout ----

error: rmake recipe failed to complete
status: exit status: 101
command: cd "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/symbol-mangling-hashed/rmake_out" && env -u RUSTFLAGS -u __RUSTC_DEBUG_ASSERTIONS_ENABLED -u __STD_DEBUG_ASSERTIONS_ENABLED AR="ar" BUILD_ROOT="/checkout/obj/build/x86_64-unknown-linux-gnu" CC="cc" CC_DEFAULT_FLAGS="-ffunction-sections -fdata-sections -fPIC -m64" CXX="c++" CXX_DEFAULT_FLAGS="-ffunction-sections -fdata-sections -fPIC -m64" HOST_RUSTC_DYLIB_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" LD_LIBRARY_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib" LD_LIB_PATH_ENVVAR="LD_LIBRARY_PATH" LLVM_BIN_DIR="/checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm/bin" LLVM_COMPONENTS="aarch64 aarch64asmparser aarch64codegen aarch64desc aarch64disassembler aarch64info aarch64utils aggressiveinstcombine all all-targets amdgpu amdgpuasmparser amdgpucodegen amdgpudesc amdgpudisassembler amdgpuinfo amdgputargetmca amdgpuutils analysis arm armasmparser armcodegen armdesc armdisassembler arminfo armutils asmparser asmprinter avr avrasmparser avrcodegen avrdesc avrdisassembler avrinfo binaryformat bitreader bitstreamreader bitwriter bpf bpfasmparser bpfcodegen bpfdesc bpfdisassembler bpfinfo cfguard cgdata codegen codegentypes core coroutines coverage csky cskyasmparser cskycodegen cskydesc cskydisassembler cskyinfo debuginfobtf debuginfocodeview debuginfodwarf debuginfodwarflowlevel debuginfogsym debuginfologicalview debuginfomsf debuginfopdb demangle dlltooldriver dwarfcfichecker dwarflinker dwarflinkerclassic dwarflinkerparallel dwp engine executionengine extensions filecheck frontendatomic frontenddirective frontenddriver frontendhlsl frontendoffloading frontendopenacc frontendopenmp fuzzercli fuzzmutate globalisel hexagon hexagonasmparser hexagoncodegen hexagondesc hexagondisassembler hexagoninfo hipstdpar instcombine instrumentation 
interfacestub interpreter ipo irprinter irreader jitlink libdriver lineeditor linker loongarch loongarchasmparser loongarchcodegen loongarchdesc loongarchdisassembler loongarchinfo lto m68k m68kasmparser m68kcodegen m68kdesc m68kdisassembler m68kinfo mc mca mcdisassembler mcjit mcparser mips mipsasmparser mipscodegen mipsdesc mipsdisassembler mipsinfo mirparser msp430 msp430asmparser msp430codegen msp430desc msp430disassembler msp430info native nativecodegen nvptx nvptxcodegen nvptxdesc nvptxinfo objcarcopts objcopy object objectyaml option orcdebugging orcjit orcshared orctargetprocess passes powerpc powerpcasmparser powerpccodegen powerpcdesc powerpcdisassembler powerpcinfo profiledata remarks riscv riscvasmparser riscvcodegen riscvdesc riscvdisassembler riscvinfo riscvtargetmca runtimedyld sandboxir scalaropts selectiondag sparc sparcasmparser sparccodegen sparcdesc sparcdisassembler sparcinfo support symbolize systemz systemzasmparser systemzcodegen systemzdesc systemzdisassembler systemzinfo tablegen target targetparser telemetry textapi textapibinaryreader transformutils vectorize webassembly webassemblyasmparser webassemblycodegen webassemblydesc webassemblydisassembler webassemblyinfo webassemblyutils windowsdriver windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86targetmca xray xtensa xtensaasmparser xtensacodegen xtensadesc xtensadisassembler xtensainfo" LLVM_FILECHECK="/checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm/bin/FileCheck" PYTHON="/usr/bin/python3" RUSTC="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" RUSTDOC="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustdoc" SOURCE_ROOT="/checkout" TARGET="x86_64-unknown-linux-gnu" TARGET_EXE_DYLIB_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/symbol-mangling-hashed/rmake"
--- stdout -------------------------------
checking dylib `libhashed_dylib.so`
------------------------------------------
--- stderr -------------------------------
exported dynamic symbols: [
    "_RNxC12hashed_dylib12HcerJokNWwiK",
    "rust_metadata_hashed_dylib_a4dc49c23f628cc7",
]

thread 'main' (57383) panicked at /checkout/tests/run-make/symbol-mangling-hashed/rmake.rs:66:13:
expected two dynamic symbols starting with `_RNxC12hashed_dylib`
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/3b4dd9bf1410f8da6329baa36ce5e37673cbbd1f/library/core/src/panicking.rs:80:14

For more information how to resolve CI failures of this job, visit this link.

rust-bors · 2025-11-10T16:53:36Z

☀️ Try build successful (CI)
Build commit: 6e6ba94 (6e6ba949d24fbfbd9cd48ca4c98adf59fbd04482, parent: a7b3715826827677ca8769eb88dc8052f43e734b)

rust-timer · 2025-11-10T18:13:06Z

Finished benchmarking commit (6e6ba94): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.1%, 5.8%]	26
Regressions ❌ (secondary)	0.6%	[0.1%, 1.3%]	44
Improvements ✅ (primary)	-0.7%	[-4.3%, -0.1%]	109
Improvements ✅ (secondary)	-1.7%	[-38.2%, -0.0%]	93
All ❌✅ (primary)	-0.5%	[-4.3%, 5.8%]	135

Max RSS (memory usage)

Results (primary -1.5%, secondary -0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.2%	[2.2%, 2.2%]	1
Regressions ❌ (secondary)	3.7%	[1.0%, 6.7%]	12
Improvements ✅ (primary)	-1.6%	[-6.0%, -0.5%]	31
Improvements ✅ (secondary)	-2.6%	[-7.9%, -0.7%]	25
All ❌✅ (primary)	-1.5%	[-6.0%, 2.2%]	32

Cycles

Results (primary -0.5%, secondary -4.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	4.8%	[3.4%, 6.2%]	2
Regressions ❌ (secondary)	8.8%	[2.6%, 18.8%]	6
Improvements ✅ (primary)	-3.1%	[-5.0%, -2.1%]	4
Improvements ✅ (secondary)	-10.3%	[-39.4%, -2.1%]	13
All ❌✅ (primary)	-0.5%	[-5.0%, 6.2%]	6

Binary size

Results (primary -0.7%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.0%, 1.4%]	4
Regressions ❌ (secondary)	3.2%	[0.0%, 7.5%]	12
Improvements ✅ (primary)	-0.8%	[-3.3%, -0.0%]	129
Improvements ✅ (secondary)	-1.7%	[-23.6%, -0.0%]	123
All ❌✅ (primary)	-0.7%	[-3.3%, 1.4%]	133

Bootstrap: 476.631s -> 471.922s (-0.99%)
Artifact size: 391.32 MiB -> 388.56 MiB (-0.70%)

m-ou-se · 2025-11-10T18:19:30Z

Ooh that's pretty good :D

m-ou-se · 2025-11-10T19:55:06Z

Pretty much everything looks like a great improvement. Not only number of instructions executed, but also memory usage and binary size. 🎉

Only two significant negative results:

1. "image-0.25.6 opt incr-patched:println" with almost +6% instructions:u.

Looking at the detailed results, it looks like that's all in LLVM, probably because LLVM got more optimization opportunities. That's not necessarily a bad thing.

2. The `fmt-write-str` runtime benchmark with over +12% instructions:u.

This could be concerning, but I can't seem to fully replicate it locally.

If I recompile and run this benchmark 100 times in both nightly and with this PR, I do get this interesting result though:

With the nightly compiler, the results vary, with many measurements clustered close to 25ms but also many around 40ms. With this PR, the results are very consistent, all clustered around 27ms.

So, the median result is worse, but the average is better.

My guess is that the indirection (a slice of string slices) can make things unpredictable, as the strings aren't always in the optimal place for caching. The lack of indirection in the new version then makes it much more predictable. This is just a guess though.
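
The median-versus-average observation above can be illustrated with synthetic numbers. The timings below are invented to mirror the described shape (two clusters near 25ms and 40ms versus one cluster near 27ms); they are not the measured data, and the 55/45 split is an assumption.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Median of a non-empty slice (average of the two middle values for even lengths).
fn median(xs: &[f64]) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

fn main() {
    // Synthetic "nightly" runs: 55 near 25ms, 45 near 40ms (assumed split).
    let mut nightly = vec![25.0; 55];
    nightly.extend(vec![40.0; 45]);

    // Synthetic runs "with this PR": all near 27ms.
    let with_pr = vec![27.0; 100];

    println!("nightly: median {} ms, mean {} ms", median(&nightly), mean(&nightly));
    println!("with PR: median {} ms, mean {} ms", median(&with_pr), mean(&with_pr));
}
```

With this made-up split, the median moves from 25ms to 27ms while the mean drops from 31.75ms to 27ms, which is the pattern described: a worse median and a better average.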

m-ou-se added 4 commits November 10, 2025 15:08

565aec6 Expose expr_unsafe in LoweringContext.
ccb9972 New format_args!()+fmt::Arguments implementation.
cf43ea3 Bless tests.
9f41692 Make clippy happy.

m-ou-se self-assigned this Nov 10, 2025

m-ou-se added the A-fmt Area: `core::fmt` label Nov 10, 2025

rust-bors bot added a commit that referenced this pull request Nov 10, 2025

Auto merge of #148789 - m-ou-se:new-fmt-args-alt, r=<try>

6e6ba94

Experiment: New fmt::Arguments implementation (another one)

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2025

m-ou-se mentioned this pull request Nov 10, 2025

Tracking issue for improving std::fmt::Arguments and format_args!() #99012

Open

57 tasks

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 10, 2025
