@@ -234,7 +234,7 @@ PublishState onPublish(final DDSpan span) {
if (span == rootSpan) {
tracer.onRootSpanPublished(rootSpan);
}
-    return decrementRefAndMaybeWrite(span == rootSpan);
+    return decrementRefAndMaybeWrite(span == rootSpan, true);
}

@Override
@@ -265,10 +265,10 @@ public void registerContinuation(final AgentScope.Continuation continuation) {

@Override
public void removeContinuation(final AgentScope.Continuation continuation) {
-    decrementRefAndMaybeWrite(false);
+    decrementRefAndMaybeWrite(false, false);
}

-  private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) {
+  private PublishState decrementRefAndMaybeWrite(boolean isRootSpan, boolean addedSpan) {
Contributor Author:

Right now, I'm curious what others think of this potential change.
I'm intending to write a microbenchmark to see if I can verify that this change is profitable.
I also think I can write a test that verifies the PendingTrace behavior by using a custom writer.

Contributor:

This looks to me like a clever trick. It changes the write dynamics a bit: the next chance to write is now when a new span is added or when the root span is finished (plus the other queueing states). I believe this is good. I've seen some instrumentations, like aerospike, that explicitly cancel the "continuation", but I don't think this is an issue.

Contributor Author:

I'm still not sure if this addresses the reported issue.
However, it does cut my macrobenchmark time by 2-3%. Given that my macrobenchmark uses @Trace annotations, which are rather heavy, I suspect the gains might be larger with typical auto-instrumentation.

Contributor:

nitpick: Maybe rename addedSpan to allowPartialWrite.

final int count = PENDING_REFERENCE_COUNT.decrementAndGet(this);
if (strictTraceWrites && count < 0) {
throw new IllegalStateException("Pending reference count " + count + " is negative");
@@ -283,8 +283,19 @@ private PublishState decrementRefAndMaybeWrite(boolean isRootSpan) {
// Finished root with pending work ... delay write
pendingTraceBuffer.enqueue(this);
return PublishState.ROOT_BUFFERED;
-    } else if (partialFlushMinSpans > 0 && size() >= partialFlushMinSpans) {
+    } else if (addedSpan && partialFlushMinSpans > 0 && size() >= partialFlushMinSpans) {
// Trace is getting too big, write anything completed.

+      // DQH - We only trigger a partial flush, when a span has just been added
+      // This prevents a bunch of threads which are only performing scope/context operations
+      // from all fighting to perform the partialFlush after the threshold is crossed.
+
+      // This is an important optimization for virtual threads where a continuation might
+      // be created even though no span is created. In that situation, virtual threads
+      // can end up fighting to perform the partialFlush. And even trying to perform a
+      // partialFlush requires taking the PendingTrace lock which can lead to unmounting
+      // the virtual thread from its carrier thread.
Comment on lines +289 to +297
Contributor:

I don't know the whole picture, but just from my experience, what if we shouldn't fight over the flush at all?
Maybe we could refactor the logic so that all threads interested in flushing just set a flag to true, and a background thread would periodically check it and flush the data (roughly like the sketch below).
Does that make sense?
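
To make the idea concrete, here is a rough, hypothetical sketch of that hand-off (none of these names come from the PR; it just illustrates application threads flipping a flag while a single background thread does the actual partial flush):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: application threads only set a flag; one background
// thread owns the partial flush, so the hot path never contends for the
// PendingTrace lock.
final class DeferredPartialFlush {
  private final AtomicBoolean flushRequested = new AtomicBoolean(false);

  // Hot path (e.g. when a finished span crosses the partial-flush threshold):
  // a single volatile write, no locking, no flushing on the caller's thread.
  void requestFlush() {
    flushRequested.set(true);
  }

  // Run by one background thread (or an existing worker such as the
  // PendingTraceBuffer thread), so only this thread ever takes the lock
  // needed to perform the partial flush.
  void backgroundLoop(Runnable partialFlush) throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      if (flushRequested.compareAndSet(true, false)) {
        partialFlush.run();
      }
      TimeUnit.MILLISECONDS.sleep(100); // flush cadence is an arbitrary assumption
    }
  }
}
```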

Contributor:

Maybe a counter would be a better solution than a boolean flag. It would also be useful to know how many times a flush was requested (see the variant sketched below).
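
A hypothetical counter-based variant of the same idea (again, these names are not from the PR); the drained count doubles as a cheap metric for how many flush requests were coalesced:

```java
import java.util.concurrent.atomic.AtomicLong;

final class CountedPartialFlush {
  private final AtomicLong flushRequests = new AtomicLong();

  // Hot path: record that a flush is wanted with a single atomic increment.
  void requestFlush() {
    flushRequests.incrementAndGet();
  }

  // Background thread: drain the counter, flush once, and report how many
  // requests were coalesced into this flush (useful as telemetry).
  long drainAndFlush(Runnable partialFlush) {
    long pending = flushRequests.getAndSet(0);
    if (pending > 0) {
      partialFlush.run();
    }
    return pending;
  }
}
```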

Contributor Author:

Yes, I'm inclined to agree. This was mostly intended as a quick fix / experiment to see if we could improve the reported issue. In my macrobenchmark, I did see a 2% throughput improvement but haven't yet replicated the stall that was reported.

I do like the idea of flipping a boolean. I also don't like that we're taking a long-held lock in the application critical path, so there's definitely still a lot of room for improvement.


partialFlush();
return PublishState.PARTIAL_FLUSH;
} else if (rootSpanWritten) {