-
Notifications
You must be signed in to change notification settings - Fork 315
Reduce PendingTrace Lock Contention #9932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -234,7 +234,7 @@ PublishState onPublish(final DDSpan span) { | |
| if (span == rootSpan) { | ||
| tracer.onRootSpanPublished(rootSpan); | ||
| } | ||
| return decrementRefAndMaybeWrite(span == rootSpan); | ||
| return decrementRefAndMaybeWrite(span == rootSpan, true); | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -265,10 +265,10 @@ public void registerContinuation(final AgentScope.Continuation continuation) { | |
|
|
||
| @Override | ||
| public void removeContinuation(final AgentScope.Continuation continuation) { | ||
| decrementRefAndMaybeWrite(false); | ||
| decrementRefAndMaybeWrite(false, false); | ||
| } | ||
|
|
||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | ||
| private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nitpick: Maybe rename |
||
| final int count = PENDING_REFERENCE_COUNT.decrementAndGet(this); | ||
| if (strictTraceWrites && count < 0) { | ||
| throw new IllegalStateException("Pending reference count " + count + " is negative"); | ||
|
|
@@ -283,8 +283,19 @@ private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) { | |
| // Finished root with pending work ... delay write | ||
| pendingTraceBuffer.enqueue(this); | ||
| return PublishState.ROOT_BUFFERED; | ||
| } else if (partialFlushMinSpans > 0 && size() >= partialFlushMinSpans) { | ||
| } else if (addedSpan && partialFlushMinSpans > 0 && size() >= partialFlushMinSpans) { | ||
| // Trace is getting too big, write anything completed. | ||
|
|
||
| // DQH - We only trigger a partial flush, when a span has just been added | ||
| // This prevents a bunch of threads which are only performing scope/context operations | ||
| // from all fighting to perform the partialFlush after the threshold is crossed. | ||
|
|
||
| // This is an important optimization for virtual threads where a continuation might | ||
| // be created even though no span is created. In that situation, virtual threads | ||
| // can end up fighting to perform the partialFlush. And even trying to perform a | ||
| // partialFlush requires taking the PendingTrace lock which can lead to unmounting | ||
| // the virtual thread from its carrier thread. | ||
|
Comment on lines
+289
to
+297
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not know whole picture, but just from my experience, what if we should not fight for flush at all?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe instead of boolean flag, counter would be better solution to implement. Also it would be useful info to know how many times flush was requested.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I'm inclined to agree. This was mostly intended as a quick fix / experiment to see if we could improve the reported issue. In my macrobenchmark, I did see a 2% throughput improvement but haven't yet replicated the stall that was reported. I do like the idea of flipping a boolean. I also don't like that we're taking a long held lock in the application critical path, so there's definitely still a lot of room for improvement. |
||
|
|
||
| partialFlush(); | ||
| return PublishState.PARTIAL_FLUSH; | ||
| } else if (rootSpanWritten) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, I'm curious what others think of this potential change.
I'm intending to write a microbenchmark to see if I can verify that this change is profitable.
I also think I can write a test verifies the PendingTrace behavior by using a custom writer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me like clever trick. This changes a bit the write dynamic, where the next chance to write is when a new span is added, or when the root span is finished (and the other queueing states). I believe this is good. I've seen some instrumentations like aerospike that explicitly cancel the "continuation", but I don't think this is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure if this addresses the reported issue.
However, it does cut my macrobenchmark by 2-3%. Given that my macrobenchmark uses
@Traceannotations which are rather heavy, I suspect the gains might be larger with typical auto-instrumentation.