-
Notifications
You must be signed in to change notification settings - Fork 398
Fix losing original karafka trace after iterating through the messages with distributed_tracing on #4876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Err thanks for the patience, I've asked the team to take a look at this asap :) |
|
@Drowze I took a first pass and I need to allocate proper time to understand how we best want to present this kind of workflow (span links, directly linking, something else). I have scheduled time to take a good look at this next week. |
|
Regarding span links: I read https://docs.datadoghq.com/tracing/trace_collection/span_links/ to try to understand their usage. The documentation page says the span links are for relationships other than parent/child. In this PR they are used for parent/child relationship? What is the actual gain/benefit? |
|
Looking at this PR again I don't understand the fix. The problem is the parent trace gets reset somehow, therefore I would expect the fix to repair the reset. How is adding links fixing the reset? |
54c704d to
4f1847d
Compare
|
Coming back to this now - apologies for the delay 🙇
Sure - sounds good to me. I understand that span links are not widely used at the moment and using them here just made the PR a bit confusing to review, so I'll move that part into a separate pull request - and we can continue discussing span links there. @p-datadog I just rebased the PR with latest master and removed the code referring to span links - can you have another look please? 🙇 |
48c1d1f to
1d49933
Compare
p-datadog
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Drowze for splitting the PR.
I think this PR look good but I am not an expert in the tracing side, I'll try to find another reviewer.
|
|
||
| # assert that the message span is not continuation of the producer span | ||
| expect(span.parent_id).to eq(consumer_span.id) | ||
| expect(span.trace_id).to eq(consumer_trace.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other test, is consumer (the middle span, if I am understanding the logic correctly) expected to be a child of producer? Is that asserted somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these tests, the consumer span is never expected to be a child of any span. Referring to each test individually:
-
when the message has tracing headers, when distributed tracing is enabled, it continues the span that produced the messageconsumeris a root span and it doesn't have any child spansproduceris a root span and it has one child span from iterating the messages(*)
-
when the message has tracing headers, when distributed tracing is not enabled, it does not continue the span that produced the messageconsumeris a root span and it has one child span from iterating the messages(*)produceris a root span and it doesn't have any child spans
Hope that makes sense! 🙇
(*) we create spans from iterating the messages as the karafka integration patches ::Karafka::Messages::Messages#each
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(*) we create spans from iterating the messages as the karafka integration patches ::Karafka::Messages::Messages#each
I think that this might not be clear enough from our tests... The messages are being iterated here:
# this creates a new span
expect(messages).to all(be_a(::Karafka::Messages::Message))
# ...
# in these expectations, `span` refers to the span created from iterating the messages
expect(span.parent_id).to eq(consumer_span.id)
expect(span.trace_id).to eq(consumer_trace.id)
# because of these `let`s (from the beginning of the file)
let(:span) do
spans.find { |s| s.name == span_name }
end
let(:span_name) { Datadog::Tracing::Contrib::Karafka::Ext::SPAN_MESSAGE_CONSUME }I'll update the test now to use an explicit each and add a comment about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just updated the test with comments better expliciting the test intentions - please let me know if it's more clear now 🙇
note: I didn't add an explicit each because this same pattern of relying on expect(...).to all(...) to create a span is used across pretty much all the tests in that file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think I understand what you are saying, but it's strange to me that the iterating span changed from being under the producer to being under the consumer but there is no relationship between the producer and the consumer. Might be a question outside the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating the message does not really change any spans - it's continuing a trace that changes the parent/child relationship of spans.
In other words:
- when
distributed_tracing=true, the message span is a continuation of the span that created it (if any) (i.e.: the message span turns into a child of the span who produced the message) - when
distributed_tracing=false, the message span is just a regular span (which may have a parent - which in the case of theses tests is theconsumerspan who's iterating the messages)
Distributed tracing gets confusing quickly 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just fyi: I found this issue describing a similar use-case, so I've included our own use-case there as well (along with an example & screenshots)
f8a9e45 to
c32d6a6
Compare
|
Hey Datadog team 👋 @vpellan @p-datadog @marcotc @ivoanjo As of now we're having to use our own dd-trace-rb fork due to some issues/missing features around the Karafka/WaterDrop integration (our fork is the latest dd-trace-rb release (v2.22.0), plus this PR, #5120 and #5107). |
What does this PR do?
On the Karafka integration, when
distributed_tracingis on, the original trace was lost after iterating through the messages. This PR aims to fix thatand also add span links linking the message traces with the parent consumer trace (so it's easier to find each other using the Datadog APM UI).Fixes #4873
Motivation:
Fix and improve the Datadog Karafka support.
Change log entry
Additional Notes:
How to test the change?
See #4873 for a full testing snippet.