-
Notifications
You must be signed in to change notification settings - Fork 315
Reduce PendingTrace Lock Contention #9932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi! 👋 Thanks for your pull request! 🎉 To help us review it, please make sure to:
If you need help, please check our contributing guidelines. |
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, I'm curious what others think of this potential change.
I'm intending to write a microbenchmark to see if I can verify that this change is profitable.
I also think I can write a test verifies the PendingTrace behavior by using a custom writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me like clever trick. This changes a bit the write dynamic, where the next chance to write is when a new span is added, or when the root span is finished (and the other queueing states). I believe this is good. I've seen some instrumentations like aerospike that explicitly cancel the "continuation", but I don't think this is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure if this addresses the reported issue.
However, it does cut my macrobenchmark by 2-3%. Given that my macrobenchmark uses @Trace annotations which are rather heavy, I suspect the gains might be larger with typical auto-instrumentation.
|
🎯 Code Coverage 🔗 Commit SHA: 34a6e6b | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
mcculls
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good optimization.
Looking forward to seeing the microbenchmark results, I suspect it will show a positive improvement when there are a lot of context migrations.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 54 metrics, 11 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.051 s) : 0, 1051435
Total [baseline] (8.647 s) : 0, 8647421
Agent [candidate] (1.051 s) : 0, 1051338
Total [candidate] (8.654 s) : 0, 8653531
section iast
Agent [baseline] (1.179 s) : 0, 1178910
Total [baseline] (9.242 s) : 0, 9241889
Agent [candidate] (1.184 s) : 0, 1184126
Total [candidate] (9.264 s) : 0, 9264463
gantt
title insecure-bank - break down per module: candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.469 ms) : 0, 1469
crashtracking [candidate] (1.489 ms) : 0, 1489
BytebuddyAgent [baseline] (707.684 ms) : 0, 707684
BytebuddyAgent [candidate] (707.182 ms) : 0, 707182
GlobalTracer [baseline] (246.576 ms) : 0, 246576
GlobalTracer [candidate] (247.091 ms) : 0, 247091
AppSec [baseline] (32.456 ms) : 0, 32456
AppSec [candidate] (32.53 ms) : 0, 32530
Debugger [baseline] (6.421 ms) : 0, 6421
Debugger [candidate] (6.387 ms) : 0, 6387
Remote Config [baseline] (733.97 µs) : 0, 734
Remote Config [candidate] (716.449 µs) : 0, 716
Telemetry [baseline] (13.887 ms) : 0, 13887
Telemetry [candidate] (15.396 ms) : 0, 15396
Flare Poller [baseline] (7.368 ms) : 0, 7368
Flare Poller [candidate] (5.808 ms) : 0, 5808
section iast
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.487 ms) : 0, 1487
BytebuddyAgent [baseline] (827.925 ms) : 0, 827925
BytebuddyAgent [candidate] (830.608 ms) : 0, 830608
GlobalTracer [baseline] (234.368 ms) : 0, 234368
GlobalTracer [candidate] (235.836 ms) : 0, 235836
IAST [baseline] (33.209 ms) : 0, 33209
IAST [candidate] (32.695 ms) : 0, 32695
AppSec [baseline] (28.068 ms) : 0, 28068
AppSec [candidate] (29.138 ms) : 0, 29138
Debugger [baseline] (6.002 ms) : 0, 6002
Debugger [candidate] (6.092 ms) : 0, 6092
Remote Config [baseline] (604.401 µs) : 0, 604
Remote Config [candidate] (611.474 µs) : 0, 611
Telemetry [baseline] (8.403 ms) : 0, 8403
Telemetry [candidate] (8.635 ms) : 0, 8635
Flare Poller [baseline] (4.105 ms) : 0, 4105
Flare Poller [candidate] (4.134 ms) : 0, 4134
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1049545
Total [baseline] (10.754 s) : 0, 10753829
Agent [candidate] (1.05 s) : 0, 1050413
Total [candidate] (10.811 s) : 0, 10811482
section appsec
Agent [baseline] (1.225 s) : 0, 1224572
Total [baseline] (10.893 s) : 0, 10893117
Agent [candidate] (1.232 s) : 0, 1232414
Total [candidate] (10.93 s) : 0, 10929541
section iast
Agent [baseline] (1.186 s) : 0, 1185735
Total [baseline] (11.187 s) : 0, 11186874
Agent [candidate] (1.18 s) : 0, 1180018
Total [candidate] (11.115 s) : 0, 11115021
section profiling
Agent [baseline] (1.202 s) : 0, 1201995
Total [baseline] (11.013 s) : 0, 11012999
Agent [candidate] (1.194 s) : 0, 1194461
Total [candidate] (10.884 s) : 0, 10883734
gantt
title petclinic - break down per module: candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.459 ms) : 0, 1459
crashtracking [candidate] (1.464 ms) : 0, 1464
BytebuddyAgent [baseline] (706.218 ms) : 0, 706218
BytebuddyAgent [candidate] (707.451 ms) : 0, 707451
GlobalTracer [baseline] (246.565 ms) : 0, 246565
GlobalTracer [candidate] (246.384 ms) : 0, 246384
AppSec [baseline] (32.351 ms) : 0, 32351
AppSec [candidate] (32.408 ms) : 0, 32408
Debugger [baseline] (6.398 ms) : 0, 6398
Debugger [candidate] (6.462 ms) : 0, 6462
Remote Config [baseline] (729.641 µs) : 0, 730
Remote Config [candidate] (707.913 µs) : 0, 708
Telemetry [baseline] (12.97 ms) : 0, 12970
Telemetry [candidate] (14.98 ms) : 0, 14980
Flare Poller [baseline] (8.063 ms) : 0, 8063
Flare Poller [candidate] (5.725 ms) : 0, 5725
section appsec
crashtracking [baseline] (1.461 ms) : 0, 1461
crashtracking [candidate] (1.48 ms) : 0, 1480
BytebuddyAgent [baseline] (730.872 ms) : 0, 730872
BytebuddyAgent [candidate] (735.7 ms) : 0, 735700
GlobalTracer [baseline] (238.227 ms) : 0, 238227
GlobalTracer [candidate] (239.942 ms) : 0, 239942
AppSec [baseline] (175.019 ms) : 0, 175019
AppSec [candidate] (175.572 ms) : 0, 175572
Debugger [baseline] (5.998 ms) : 0, 5998
Debugger [candidate] (6.088 ms) : 0, 6088
Remote Config [baseline] (666.004 µs) : 0, 666
Remote Config [candidate] (658.835 µs) : 0, 659
Telemetry [baseline] (8.48 ms) : 0, 8480
Telemetry [candidate] (8.64 ms) : 0, 8640
Flare Poller [baseline] (3.989 ms) : 0, 3989
Flare Poller [candidate] (4.081 ms) : 0, 4081
IAST [baseline] (24.911 ms) : 0, 24911
IAST [candidate] (25.111 ms) : 0, 25111
section iast
crashtracking [baseline] (1.467 ms) : 0, 1467
crashtracking [candidate] (1.454 ms) : 0, 1454
BytebuddyAgent [baseline] (832.899 ms) : 0, 832899
BytebuddyAgent [candidate] (828.225 ms) : 0, 828225
GlobalTracer [baseline] (235.485 ms) : 0, 235485
GlobalTracer [candidate] (234.989 ms) : 0, 234989
AppSec [baseline] (29.042 ms) : 0, 29042
AppSec [candidate] (29.582 ms) : 0, 29582
Debugger [baseline] (6.058 ms) : 0, 6058
Debugger [candidate] (5.987 ms) : 0, 5987
Remote Config [baseline] (604.672 µs) : 0, 605
Remote Config [candidate] (601.47 µs) : 0, 601
Telemetry [baseline] (8.506 ms) : 0, 8506
Telemetry [candidate] (8.431 ms) : 0, 8431
Flare Poller [baseline] (4.177 ms) : 0, 4177
Flare Poller [candidate] (4.058 ms) : 0, 4058
IAST [baseline] (32.55 ms) : 0, 32550
IAST [candidate] (31.847 ms) : 0, 31847
section profiling
ProfilingAgent [baseline] (112.969 ms) : 0, 112969
ProfilingAgent [candidate] (110.715 ms) : 0, 110715
crashtracking [baseline] (1.44 ms) : 0, 1440
crashtracking [candidate] (1.454 ms) : 0, 1454
BytebuddyAgent [baseline] (733.27 ms) : 0, 733270
BytebuddyAgent [candidate] (730.516 ms) : 0, 730516
GlobalTracer [baseline] (223.55 ms) : 0, 223550
GlobalTracer [candidate] (222.256 ms) : 0, 222256
AppSec [baseline] (32.8 ms) : 0, 32800
AppSec [candidate] (32.014 ms) : 0, 32014
Debugger [baseline] (8.41 ms) : 0, 8410
Debugger [candidate] (9.064 ms) : 0, 9064
Remote Config [baseline] (721.739 µs) : 0, 722
Remote Config [candidate] (682.254 µs) : 0, 682
Telemetry [baseline] (14.709 ms) : 0, 14709
Telemetry [candidate] (13.88 ms) : 0, 13880
Flare Poller [baseline] (4.181 ms) : 0, 4181
Flare Poller [candidate] (4.087 ms) : 0, 4087
Profiling [baseline] (113.65 ms) : 0, 113650
Profiling [candidate] (111.383 ms) : 0, 111383
LoadParameters
See matching parameters
SummaryFound 5 performance improvements and 2 performance regressions! Performance is the same for 13 metrics, 16 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section baseline
no_agent (1.18 ms) : 1168, 1191
. : milestone, 1180,
iast (3.253 ms) : 3206, 3300
. : milestone, 3253,
iast_FULL (5.751 ms) : 5693, 5808
. : milestone, 5751,
iast_GLOBAL (3.566 ms) : 3517, 3616
. : milestone, 3566,
profiling (2.066 ms) : 2045, 2086
. : milestone, 2066,
tracing (1.853 ms) : 1836, 1869
. : milestone, 1853,
section candidate
no_agent (1.176 ms) : 1164, 1187
. : milestone, 1176,
iast (3.159 ms) : 3122, 3196
. : milestone, 3159,
iast_FULL (5.763 ms) : 5706, 5820
. : milestone, 5763,
iast_GLOBAL (3.511 ms) : 3462, 3559
. : milestone, 3511,
profiling (2.005 ms) : 1986, 2025
. : milestone, 2005,
tracing (1.789 ms) : 1774, 1804
. : milestone, 1789,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section baseline
no_agent (16.92 ms) : 16754, 17085
. : milestone, 16920,
appsec (18.608 ms) : 18420, 18796
. : milestone, 18608,
code_origins (18.543 ms) : 18357, 18728
. : milestone, 18543,
iast (17.773 ms) : 17598, 17949
. : milestone, 17773,
profiling (19.325 ms) : 19131, 19520
. : milestone, 19325,
tracing (18.894 ms) : 18700, 19089
. : milestone, 18894,
section candidate
no_agent (19.187 ms) : 18995, 19380
. : milestone, 19187,
appsec (18.479 ms) : 18293, 18666
. : milestone, 18479,
code_origins (17.527 ms) : 17351, 17703
. : milestone, 17527,
iast (17.95 ms) : 17772, 18128
. : milestone, 17950,
profiling (18.571 ms) : 18382, 18760
. : milestone, 18571,
tracing (17.593 ms) : 17418, 17769
. : milestone, 17593,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section baseline
no_agent (15.442 s) : 15442000, 15442000
. : milestone, 15442000,
appsec (14.747 s) : 14747000, 14747000
. : milestone, 14747000,
iast (18.96 s) : 18960000, 18960000
. : milestone, 18960000,
iast_GLOBAL (18.127 s) : 18127000, 18127000
. : milestone, 18127000,
profiling (15.542 s) : 15542000, 15542000
. : milestone, 15542000,
tracing (14.833 s) : 14833000, 14833000
. : milestone, 14833000,
section candidate
no_agent (14.924 s) : 14924000, 14924000
. : milestone, 14924000,
appsec (15.132 s) : 15132000, 15132000
. : milestone, 15132000,
iast (18.339 s) : 18339000, 18339000
. : milestone, 18339000,
iast_GLOBAL (17.876 s) : 17876000, 17876000
. : milestone, 17876000,
profiling (14.856 s) : 14856000, 14856000
. : milestone, 14856000,
tracing (14.673 s) : 14673000, 14673000
. : milestone, 14673000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.56.0-SNAPSHOT~34a6e6ba5a, baseline=1.56.0-SNAPSHOT~5db793a092
dateFormat X
axisFormat %s
section baseline
no_agent (1.472 ms) : 1461, 1484
. : milestone, 1472,
appsec (3.652 ms) : 3439, 3866
. : milestone, 3652,
iast (2.212 ms) : 2148, 2275
. : milestone, 2212,
iast_GLOBAL (2.253 ms) : 2189, 2317
. : milestone, 2253,
profiling (2.067 ms) : 2015, 2119
. : milestone, 2067,
tracing (2.025 ms) : 1976, 2075
. : milestone, 2025,
section candidate
no_agent (1.47 ms) : 1458, 1481
. : milestone, 1470,
appsec (3.699 ms) : 3479, 3919
. : milestone, 3699,
iast (2.201 ms) : 2138, 2264
. : milestone, 2201,
iast_GLOBAL (2.257 ms) : 2193, 2321
. : milestone, 2257,
profiling (2.048 ms) : 1997, 2100
. : milestone, 2048,
tracing (2.012 ms) : 1963, 2061
. : milestone, 2012,
|
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Maybe rename addedSpan for allowPartialWrite
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me like clever trick. This changes a bit the write dynamic, where the next chance to write is when a new span is added, or when the root span is finished (and the other queueing states). I believe this is good. I've seen some instrumentations like aerospike that explicitly cancel the "continuation", but I don't think this is an issue.
| // DQH - We only trigger a partial flush, when a span has just been added | ||
| // This prevents a bunch of threads which are only performing scope/context operations | ||
| // from all fighting to perform the partialFlush after the threshold is crossed. | ||
|
|
||
| // This is an important optimization for virtual threads where a continuation might | ||
| // be created even though no span is created. In that situation, virtual threads | ||
| // can end up fighting to perform the partialFlush. And even trying to perform a | ||
| // partialFlush requires taking the PendingTrace lock which can lead to unmounting | ||
| // the virtual thread from its carrier thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know whole picture, but just from my experience, what if we should not fight for flush at all?
Maybe we can refactor logic that all threads that interested in flushing would just set some flag to true and some background tread would check it and periodically flush data?
Does it make sense at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of boolean flag, counter would be better solution to implement. Also it would be useful info to know how many times flush was requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm inclined to agree. This was mostly intended as a quick fix / experiment to see if we could improve the reported issue. In my macrobenchmark, I did see a 2% throughput improvement but haven't yet replicated the stall that was reported.
I do like the idea of flipping a boolean. I also don't like that we're taking a long held lock in the application critical path, so there's definitely still a lot of room for improvement.
I did a quick stand-alone macrobenchmark. The macrobenchmark shows a modest but consistent 2% reduction in execution time. |
What Does This Do
Aims to reduce lock contention in PendingTrace by only attempting partialFlush when a span has just been added to PendingTrace.
Prior to this change, we would also attempt a partialFlush after scope/context close as well, but closing a scope cannot cause us to cross the partialFlush threshold.
The theory is that this will improve our lock contention with virtual threads.
The concern is that virtual threads are often only restoring context, but then not creating a span.
That can lead the virtual thread to attempt a partialFlush which requires taking the PendingTrace lock.
If the PendingTrace lock cannot be acquired, then the virtual thread will be unmounted from its carrier thread.
Motivation
Report of high overhead and lock contention when using virtual threads
Additional Notes
Contributor Checklist
type:and (comp:orinst:) labels in addition to any useful labelsclose,fixor any linking keywords when referencing an issue.Use
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]