perf: human log sink performance bump via reduction of allocs/total allocated memory #220

cstyan · 2025-11-08T02:01:33Z

I was poking around in profiling data for our dogfood instance and I noticed that over a 24h period debug logging through the human readable log sink was producing 10-12% of our memory allocation. Garbage collection is only ~7.5% of our CPU time, but hopefully this has a nice reduction for allocs/memory allocated and small reduction for GC related CPU usage.

Benchmark data:

goos: darwin
goarch: arm64
pkg: cdr.dev/slog/sloggers/sloghuman
cpu: Apple M4 Pro
                                          │   original   │                 new                 │
                                          │    sec/op    │   sec/op     vs base                │
HumanSinkLogEntry_SingleLine-12              843.2n ± 2%   446.4n ± 1%  -47.06% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Small-12   1291.0n ± 7%   675.3n ± 2%  -47.69% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Large-12    48.05µ ± 2%   27.36µ ± 0%  -43.07% (p=0.000 n=10)
HumanSinkLogEntry_MultilineMessage-12        833.5n ± 1%   390.3n ± 2%  -53.18% (p=0.000 n=10)
Fmt_SingleLine-12                            540.8n ± 1%   438.9n ± 2%  -18.84% (p=0.000 n=10)
Fmt_MultilineField_Small-12                  763.7n ± 2%   552.0n ± 2%  -27.72% (p=0.000 n=10)
Fmt_MultilineField_Large-12                  35.38µ ± 1%   17.17µ ± 1%  -51.48% (p=0.000 n=10)
Fmt_MultilineMessage-12                      465.2n ± 1%   322.8n ± 1%  -30.60% (p=0.000 n=10)
Fmt_WithNames-12                             515.3n ± 1%   524.1n ± 2%   +1.70% (p=0.006 n=10)
Fmt_WithSpan-12                              342.4n ± 1%   302.6n ± 2%  -11.64% (p=0.000 n=10)
Fmt_WithValidSpan-12                         907.3n ± 1%   559.1n ± 3%  -38.38% (p=0.000 n=10)
geomean                                      1.420µ        916.5n       -35.47%

                                          │   original    │                 new                  │
                                          │     B/op      │     B/op      vs base                │
HumanSinkLogEntry_SingleLine-12               4253.0 ± 0%     104.0 ± 0%  -97.55% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Small-12     4662.0 ± 0%     152.0 ± 0%  -96.74% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Large-12   51.782Ki ± 0%   9.332Ki ± 0%  -81.98% (p=0.000 n=10)
HumanSinkLogEntry_MultilineMessage-12        4387.00 ± 0%     80.00 ± 0%  -98.18% (p=0.000 n=10)
Fmt_SingleLine-12                              264.0 ± 0%     216.0 ± 0%  -18.18% (p=0.000 n=10)
Fmt_MultilineField_Small-12                    864.0 ± 0%     648.0 ± 0%  -25.00% (p=0.000 n=10)
Fmt_MultilineField_Large-12                  63.87Ki ± 0%   41.33Ki ± 0%  -35.28% (p=0.000 n=10)
Fmt_MultilineMessage-12                        528.0 ± 0%     320.0 ± 0%  -39.39% (p=0.000 n=10)
Fmt_WithNames-12                               312.0 ± 0%     240.0 ± 0%  -23.08% (p=0.000 n=10)
Fmt_WithSpan-12                                208.0 ± 0%     168.0 ± 0%  -19.23% (p=0.000 n=10)
Fmt_WithValidSpan-12                           816.0 ± 0%     424.0 ± 0%  -48.04% (p=0.000 n=10)
geomean                                      1.942Ki          490.1       -75.36%

                                          │   original    │                new                 │
                                          │   allocs/op   │ allocs/op   vs base                │
HumanSinkLogEntry_SingleLine-12                19.00 ± 0%   12.00 ± 0%  -36.84% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Small-12     23.000 ± 0%   9.000 ± 0%  -60.87% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Large-12   1013.000 ± 0%   9.000 ± 0%  -99.11% (p=0.000 n=10)
HumanSinkLogEntry_MultilineMessage-12         14.000 ± 0%   5.000 ± 0%  -64.29% (p=0.000 n=10)
Fmt_SingleLine-12                              20.00 ± 0%   14.00 ± 0%  -30.00% (p=0.000 n=10)
Fmt_MultilineField_Small-12                    25.00 ± 0%   13.00 ± 0%  -48.00% (p=0.000 n=10)
Fmt_MultilineField_Large-12                  1015.00 ± 0%   19.00 ± 0%  -98.13% (p=0.000 n=10)
Fmt_MultilineMessage-12                       16.000 ± 0%   8.000 ± 0%  -50.00% (p=0.000 n=10)
Fmt_WithNames-12                               16.00 ± 0%   17.00 ± 0%   +6.25% (p=0.000 n=10)
Fmt_WithSpan-12                               12.000 ± 0%   9.000 ± 0%  -25.00% (p=0.000 n=10)
Fmt_WithValidSpan-12                           27.00 ± 0%   18.00 ± 0%  -33.33% (p=0.000 n=10)
geomean                                        38.31        11.26       -70.60%

Note that the existing benchmark didn't cover many use cases, so the changes are not as obvious there:

goos: darwin
goarch: arm64
pkg: cdr.dev/slog/internal/entryhuman
cpu: Apple M4 Pro
                     │ orig_entry  │              new_entry              │
                     │   sec/op    │   sec/op     vs base                │
Fmt/Colored-nf=1-12    4.462µ ± 1%   4.421µ ± 1%   -0.93% (p=0.003 n=10)
Fmt/Colored-nf=4-12    10.87µ ± 1%   10.92µ ± 1%        ~ (p=0.255 n=10)
Fmt/Colored-nf=16-12   36.66µ ± 2%   36.80µ ± 1%        ~ (p=0.529 n=10)
Fmt/nf=1-12            370.2n ± 1%   321.4n ± 0%  -13.19% (p=0.000 n=10)
Fmt/nf=4-12            897.8n ± 1%   836.4n ± 1%   -6.83% (p=0.000 n=10)
Fmt/nf=16-12           2.871µ ± 2%   2.823µ ± 1%   -1.67% (p=0.009 n=10)
geomean                3.453µ        3.324µ        -3.75%

                     │  orig_entry  │              new_entry               │
                     │     B/op     │     B/op      vs base                │
Fmt/Colored-nf=1-12    1.986Ki ± 0%   1.947Ki ± 0%   -1.97% (p=0.000 n=10)
Fmt/Colored-nf=4-12    4.599Ki ± 0%   4.560Ki ± 0%   -0.85% (p=0.000 n=10)
Fmt/Colored-nf=16-12   14.05Ki ± 0%   14.01Ki ± 0%   -0.28% (p=0.000 n=10)
Fmt/nf=1-12              224.0 ± 0%     184.0 ± 0%  -17.86% (p=0.000 n=10)
Fmt/nf=4-12              496.0 ± 0%     456.0 ± 0%   -8.06% (p=0.000 n=10)
Fmt/nf=16-12           1.297Ki ± 0%   1.258Ki ± 0%   -3.01% (p=0.000 n=10)
geomean                1.613Ki        1.524Ki        -5.55%

                     │ orig_entry  │              new_entry              │
                     │  allocs/op  │  allocs/op   vs base                │
Fmt/Colored-nf=1-12     122.0 ± 0%    119.0 ± 0%   -2.46% (p=0.000 n=10)
Fmt/Colored-nf=4-12     303.0 ± 0%    300.0 ± 0%   -0.99% (p=0.000 n=10)
Fmt/Colored-nf=16-12   1.024k ± 0%   1.021k ± 0%   -0.29% (p=0.000 n=10)
Fmt/nf=1-12            12.000 ± 0%    9.000 ± 0%  -25.00% (p=0.000 n=10)
Fmt/nf=4-12             31.00 ± 0%    28.00 ± 0%   -9.68% (p=0.000 n=10)
Fmt/nf=16-12            104.0 ± 0%    101.0 ± 0%   -2.88% (p=0.000 n=10)
geomean                 106.6         98.76        -7.33%

Current alloc space:

Current alloc object:

Alloc objects path:

Alloc space path:

GC CPU time:

Signed-off-by: Callum Styan <callumstyan@gmail.com>

…e the optimized version should return an error Signed-off-by: Callum Styan <callumstyan@gmail.com>

coveralls · 2025-11-08T02:04:03Z

Pull Request Test Coverage Report for Build bb87268ba9bc796f570dc37547e5f0281129ddf5-PR-220

Details

144 of 162 (88.89%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-1.1%) to 95.699%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
sloggers/sloghuman/sloghuman.go	26	29	89.66%
internal/entryhuman/entry.go	117	132	88.64%

Files with Coverage Reduction	New Missed Lines	%
internal/entryhuman/entry.go	1	88.28%

Totals
Change from base Build 9df5e0a6c14572480711c6c06dd26ceabe32ad72:	-1.1%
Covered Lines:	890
Relevant Lines:	930

💛 - Coveralls

Signed-off-by: Callum Styan <callumstyan@gmail.com>

…ne very specific case that should never happen if we're actually going to OOM via logging it would panic Signed-off-by: Callum Styan <callumstyan@gmail.com>

… files for new test cases Signed-off-by: Callum Styan <callumstyan@gmail.com>

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Emyrk · 2025-11-14T21:31:44Z

internal/entryhuman/entry.go

-	level := ent.Level.String()
-	level = strings.ToLower(level)
-	if len(level) > 4 {
-		level = level[:4]
-	}
-	level = "[" + level + "]"


wow this old code was kinda dumb in comparison.

Emyrk · 2025-11-14T21:31:59Z

internal/entryhuman/entry.go

+func bracketedLevel(l slog.Level) string {
+	switch l {
+	case slog.LevelDebug:
+		return "[debu]"
+	case slog.LevelInfo:
+		return "[info]"
+	case slog.LevelWarn:
+		return "[warn]"
+	case slog.LevelError:
+		return "[erro]"
+	case slog.LevelCritical:
+		return "[crit]"
+	case slog.LevelFatal:
+		return "[fata]"
+	default:
+		return "[unkn]"
+	}
+}


internal/entryhuman/entry.go

internal/entryhuman/out.txt

Signed-off-by: Callum Styan <callumstyan@gmail.com>

…only saves ~1 byte per op) Signed-off-by: Callum Styan <callumstyan@gmail.com>

dannykopping

Awesome result 💪
Only nits

dannykopping · 2025-11-20T08:35:30Z

internal/entryhuman/entry.go

+	case int:
+		var a [20]byte
+		_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
+		return true, err
+	case int8:
+		var a [20]byte
+		_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
+		return true, err
+	case int16:
+		var a [20]byte
+		_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
+		return true, err
+	case int32:
+		var a [20]byte
+		_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
+		return true, err


Nit: DRY

Suggested change

case int:

var a [20]byte

_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))

return true, err

case int8:

var a [20]byte

_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))

return true, err

case int16:

var a [20]byte

_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))

return true, err

case int32:

var a [20]byte

_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))

return true, err

case int, int8, int16, int32:

var a [20]byte

_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))

return true, err

unfortunately we can't do it quite like this, the compiler still treats each type as distinct even if they're all in the same case like this, which means we'd either have to type assert OR reflect within the case itself

we can at least move the byte slice declaration, write call, and return, into reusable writeSignedInt and writeUnsignedInt functions

internal/entryhuman/entry.go

internal/entryhuman/entry_test.go

Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan added 3 commits November 7, 2025 16:44

add benchmarks

b05c3fd

Signed-off-by: Callum Styan <callumstyan@gmail.com>

optimizations to entryhuman Fmt and sloghuman LogEntry

873ed35

Signed-off-by: Callum Styan <callumstyan@gmail.com>

cleanup + leave option of keeping existing non-optimized version sinc…

5816218

…e the optimized version should return an error Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan changed the title ~~Callum/human perf~~ perf: human log sink performance bump via reduction of allocs/total allocated memory Nov 10, 2025

cstyan added 5 commits November 12, 2025 16:10

fix/update tests

6d17803

Signed-off-by: Callum Styan <callumstyan@gmail.com>

2 more test cases

0ca41cc

Signed-off-by: Callum Styan <callumstyan@gmail.com>

simplify OptimizedFmt, in reality a bytes.Buffer will only error in o…

bbd432e

…ne very specific case that should never happen if we're actually going to OOM via logging it would panic Signed-off-by: Callum Styan <callumstyan@gmail.com>

clean up; don't keep original Fmt, consolidate tests, generate golden…

ab9bf95

… files for new test cases Signed-off-by: Callum Styan <callumstyan@gmail.com>

fix ci fmt issue

5838eb4

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Emyrk reviewed Nov 14, 2025

View reviewed changes

cstyan added 2 commits November 17, 2025 10:44

remove mistakenly added files related to benchmark outputs

1212760

Signed-off-by: Callum Styan <callumstyan@gmail.com>

add benchmark for many strings case + string case in writeValueFast (…

76ecec2

…only saves ~1 byte per op) Signed-off-by: Callum Styan <callumstyan@gmail.com>

Emyrk approved these changes Nov 19, 2025

View reviewed changes

dannykopping approved these changes Nov 20, 2025

View reviewed changes

address review feedback

034384c

Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan merged commit 40ff199 into main Nov 20, 2025
1 check passed

cstyan deleted the callum/human-perf branch November 20, 2025 22:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: human log sink performance bump via reduction of allocs/total allocated memory #220

perf: human log sink performance bump via reduction of allocs/total allocated memory #220

Uh oh!

cstyan commented Nov 8, 2025

Uh oh!

coveralls commented Nov 8, 2025 •

edited

Loading

Uh oh!

Emyrk Nov 14, 2025

Uh oh!

Emyrk Nov 14, 2025

Uh oh!

Uh oh!

Uh oh!

dannykopping left a comment

Uh oh!

dannykopping Nov 20, 2025

Uh oh!

cstyan Nov 20, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

perf: human log sink performance bump via reduction of allocs/total allocated memory #220

perf: human log sink performance bump via reduction of allocs/total allocated memory #220

Uh oh!

Conversation

cstyan commented Nov 8, 2025

Uh oh!

coveralls commented Nov 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build bb87268ba9bc796f570dc37547e5f0281129ddf5-PR-220

Details

💛 - Coveralls

Uh oh!

Emyrk Nov 14, 2025

Choose a reason for hiding this comment

Uh oh!

Emyrk Nov 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dannykopping left a comment

Choose a reason for hiding this comment

Uh oh!

dannykopping Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

cstyan Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

coveralls commented Nov 8, 2025 •

edited

Loading