Skip to content

Conversation

@cstyan
Copy link
Contributor

@cstyan cstyan commented Nov 8, 2025

I was poking around in profiling data for our dogfood instance and I noticed that over a 24h period debug logging through the human readable log sink was producing 10-12% of our memory allocation. Garbage collection is only ~7.5% of our CPU time, but hopefully this has a nice reduction for allocs/memory allocated and small reduction for GC related CPU usage.

Benchmark data:

goos: darwin
goarch: arm64
pkg: cdr.dev/slog/sloggers/sloghuman
cpu: Apple M4 Pro
                                          │   original   │                 new                 │
                                          │    sec/op    │   sec/op     vs base                │
HumanSinkLogEntry_SingleLine-12              843.2n ± 2%   446.4n ± 1%  -47.06% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Small-12   1291.0n ± 7%   675.3n ± 2%  -47.69% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Large-12    48.05µ ± 2%   27.36µ ± 0%  -43.07% (p=0.000 n=10)
HumanSinkLogEntry_MultilineMessage-12        833.5n ± 1%   390.3n ± 2%  -53.18% (p=0.000 n=10)
Fmt_SingleLine-12                            540.8n ± 1%   438.9n ± 2%  -18.84% (p=0.000 n=10)
Fmt_MultilineField_Small-12                  763.7n ± 2%   552.0n ± 2%  -27.72% (p=0.000 n=10)
Fmt_MultilineField_Large-12                  35.38µ ± 1%   17.17µ ± 1%  -51.48% (p=0.000 n=10)
Fmt_MultilineMessage-12                      465.2n ± 1%   322.8n ± 1%  -30.60% (p=0.000 n=10)
Fmt_WithNames-12                             515.3n ± 1%   524.1n ± 2%   +1.70% (p=0.006 n=10)
Fmt_WithSpan-12                              342.4n ± 1%   302.6n ± 2%  -11.64% (p=0.000 n=10)
Fmt_WithValidSpan-12                         907.3n ± 1%   559.1n ± 3%  -38.38% (p=0.000 n=10)
geomean                                      1.420µ        916.5n       -35.47%

                                          │   original    │                 new                  │
                                          │     B/op      │     B/op      vs base                │
HumanSinkLogEntry_SingleLine-12               4253.0 ± 0%     104.0 ± 0%  -97.55% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Small-12     4662.0 ± 0%     152.0 ± 0%  -96.74% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Large-12   51.782Ki ± 0%   9.332Ki ± 0%  -81.98% (p=0.000 n=10)
HumanSinkLogEntry_MultilineMessage-12        4387.00 ± 0%     80.00 ± 0%  -98.18% (p=0.000 n=10)
Fmt_SingleLine-12                              264.0 ± 0%     216.0 ± 0%  -18.18% (p=0.000 n=10)
Fmt_MultilineField_Small-12                    864.0 ± 0%     648.0 ± 0%  -25.00% (p=0.000 n=10)
Fmt_MultilineField_Large-12                  63.87Ki ± 0%   41.33Ki ± 0%  -35.28% (p=0.000 n=10)
Fmt_MultilineMessage-12                        528.0 ± 0%     320.0 ± 0%  -39.39% (p=0.000 n=10)
Fmt_WithNames-12                               312.0 ± 0%     240.0 ± 0%  -23.08% (p=0.000 n=10)
Fmt_WithSpan-12                                208.0 ± 0%     168.0 ± 0%  -19.23% (p=0.000 n=10)
Fmt_WithValidSpan-12                           816.0 ± 0%     424.0 ± 0%  -48.04% (p=0.000 n=10)
geomean                                      1.942Ki          490.1       -75.36%

                                          │   original    │                new                 │
                                          │   allocs/op   │ allocs/op   vs base                │
HumanSinkLogEntry_SingleLine-12                19.00 ± 0%   12.00 ± 0%  -36.84% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Small-12     23.000 ± 0%   9.000 ± 0%  -60.87% (p=0.000 n=10)
HumanSinkLogEntry_MultilineField_Large-12   1013.000 ± 0%   9.000 ± 0%  -99.11% (p=0.000 n=10)
HumanSinkLogEntry_MultilineMessage-12         14.000 ± 0%   5.000 ± 0%  -64.29% (p=0.000 n=10)
Fmt_SingleLine-12                              20.00 ± 0%   14.00 ± 0%  -30.00% (p=0.000 n=10)
Fmt_MultilineField_Small-12                    25.00 ± 0%   13.00 ± 0%  -48.00% (p=0.000 n=10)
Fmt_MultilineField_Large-12                  1015.00 ± 0%   19.00 ± 0%  -98.13% (p=0.000 n=10)
Fmt_MultilineMessage-12                       16.000 ± 0%   8.000 ± 0%  -50.00% (p=0.000 n=10)
Fmt_WithNames-12                               16.00 ± 0%   17.00 ± 0%   +6.25% (p=0.000 n=10)
Fmt_WithSpan-12                               12.000 ± 0%   9.000 ± 0%  -25.00% (p=0.000 n=10)
Fmt_WithValidSpan-12                           27.00 ± 0%   18.00 ± 0%  -33.33% (p=0.000 n=10)
geomean                                        38.31        11.26       -70.60%

Note that the existing benchmark didn't cover many use cases, so the changes are not as obvious there:

goos: darwin
goarch: arm64
pkg: cdr.dev/slog/internal/entryhuman
cpu: Apple M4 Pro
                     │ orig_entry  │              new_entry              │
                     │   sec/op    │   sec/op     vs base                │
Fmt/Colored-nf=1-12    4.462µ ± 1%   4.421µ ± 1%   -0.93% (p=0.003 n=10)
Fmt/Colored-nf=4-12    10.87µ ± 1%   10.92µ ± 1%        ~ (p=0.255 n=10)
Fmt/Colored-nf=16-12   36.66µ ± 2%   36.80µ ± 1%        ~ (p=0.529 n=10)
Fmt/nf=1-12            370.2n ± 1%   321.4n ± 0%  -13.19% (p=0.000 n=10)
Fmt/nf=4-12            897.8n ± 1%   836.4n ± 1%   -6.83% (p=0.000 n=10)
Fmt/nf=16-12           2.871µ ± 2%   2.823µ ± 1%   -1.67% (p=0.009 n=10)
geomean                3.453µ        3.324µ        -3.75%

                     │  orig_entry  │              new_entry               │
                     │     B/op     │     B/op      vs base                │
Fmt/Colored-nf=1-12    1.986Ki ± 0%   1.947Ki ± 0%   -1.97% (p=0.000 n=10)
Fmt/Colored-nf=4-12    4.599Ki ± 0%   4.560Ki ± 0%   -0.85% (p=0.000 n=10)
Fmt/Colored-nf=16-12   14.05Ki ± 0%   14.01Ki ± 0%   -0.28% (p=0.000 n=10)
Fmt/nf=1-12              224.0 ± 0%     184.0 ± 0%  -17.86% (p=0.000 n=10)
Fmt/nf=4-12              496.0 ± 0%     456.0 ± 0%   -8.06% (p=0.000 n=10)
Fmt/nf=16-12           1.297Ki ± 0%   1.258Ki ± 0%   -3.01% (p=0.000 n=10)
geomean                1.613Ki        1.524Ki        -5.55%

                     │ orig_entry  │              new_entry              │
                     │  allocs/op  │  allocs/op   vs base                │
Fmt/Colored-nf=1-12     122.0 ± 0%    119.0 ± 0%   -2.46% (p=0.000 n=10)
Fmt/Colored-nf=4-12     303.0 ± 0%    300.0 ± 0%   -0.99% (p=0.000 n=10)
Fmt/Colored-nf=16-12   1.024k ± 0%   1.021k ± 0%   -0.29% (p=0.000 n=10)
Fmt/nf=1-12            12.000 ± 0%    9.000 ± 0%  -25.00% (p=0.000 n=10)
Fmt/nf=4-12             31.00 ± 0%    28.00 ± 0%   -9.68% (p=0.000 n=10)
Fmt/nf=16-12            104.0 ± 0%    101.0 ± 0%   -2.88% (p=0.000 n=10)
geomean                 106.6         98.76        -7.33%

Current alloc space:
Screenshot 2025-11-07 at 4 36 02 PM

Current alloc object:
Screenshot 2025-11-07 at 4 36 13 PM

Alloc objects path:
Screenshot 2025-11-07 at 4 36 54 PM

Alloc space path:
Screenshot 2025-11-07 at 4 37 22 PM

GC CPU time:
Screenshot 2025-11-07 at 4 57 30 PM

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
…e the optimized version should return an error

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@coveralls
Copy link

coveralls commented Nov 8, 2025

Pull Request Test Coverage Report for Build bb87268ba9bc796f570dc37547e5f0281129ddf5-PR-220

Details

  • 144 of 162 (88.89%) changed or added relevant lines in 3 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-1.1%) to 95.699%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sloggers/sloghuman/sloghuman.go 26 29 89.66%
internal/entryhuman/entry.go 117 132 88.64%
Files with Coverage Reduction New Missed Lines %
internal/entryhuman/entry.go 1 88.28%
Totals Coverage Status
Change from base Build 9df5e0a6c14572480711c6c06dd26ceabe32ad72: -1.1%
Covered Lines: 890
Relevant Lines: 930

💛 - Coveralls

@cstyan cstyan changed the title Callum/human perf perf: human log sink performance bump via reduction of allocs/total allocated memory Nov 10, 2025
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
…ne very specific case that should never happen

if we're actually going to OOM via logging it would panic

Signed-off-by: Callum Styan <callumstyan@gmail.com>
… files for new test cases

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Comment on lines -108 to -113
level := ent.Level.String()
level = strings.ToLower(level)
if len(level) > 4 {
level = level[:4]
}
level = "[" + level + "]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow this old code was kinda dumb in comparison.

Comment on lines +93 to +110
func bracketedLevel(l slog.Level) string {
switch l {
case slog.LevelDebug:
return "[debu]"
case slog.LevelInfo:
return "[info]"
case slog.LevelWarn:
return "[warn]"
case slog.LevelError:
return "[erro]"
case slog.LevelCritical:
return "[crit]"
case slog.LevelFatal:
return "[fata]"
default:
return "[unkn]"
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Signed-off-by: Callum Styan <callumstyan@gmail.com>
…only saves ~1 byte per op)

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Copy link

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome result 💪
Only nits

Comment on lines 128 to 143
case int:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int8:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int16:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int32:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: DRY

Suggested change
case int:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int8:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int16:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int32:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err
case int, int8, int16, int32:
var a [20]byte
_, err := w.Write(strconv.AppendInt(a[:0], int64(x), 10))
return true, err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately we can't do it quite like this, the compiler still treats each type as distinct even if they're all in the same case like this, which means we'd either have to type assert OR reflect within the case itself

we can at least move the byte slice declaration, write call, and return, into reusable writeSignedInt and writeUnsignedInt functions

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan cstyan merged commit 40ff199 into main Nov 20, 2025
1 check passed
@cstyan cstyan deleted the callum/human-perf branch November 20, 2025 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants