CASSJAVA-116: Retry or Speculative Execution with RequestIdGenerator throws "Duplicate Key" #2060

SiyaoIsHiding · 2025-10-31T18:41:26Z

No description provided.

lukasz-antoniak · 2025-11-03T10:23:15Z

core/src/main/java/com/datastax/oss/driver/api/core/tracker/RequestIdGenerator.java

  default Statement<?> getDecoratedStatement(
      @NonNull Statement<?> statement, @NonNull String requestId) {
+    // in case of retry or speculative execution
+    if (statement.getCustomPayload().containsKey(getCustomPayloadKey())) {


Should we not override the value instead of early-exiting? traceparent key will contain node request ID and IMHO it should be unique per attempt.

Retries or speculative execution attempts are ideally supposed to be the same (unmodified) content that went out the first time... kinda makes sense given the role those two features play in the driver. With that in mind I assume you'd want to have the same ID here so that the retry or speculative execution attempt could be more easily identified as what they are and not mistaken for other follow-up requests from the driver.

@SiyaoIsHiding can confirm if that's what she was after here or not but that seems right to me.

I think I agree, I would want to retain the same request ID as the fact that the query was speculated/retried is useful context when tracing a request.

We should override instead of using the same ID for the node requests under the same session request.

TL;DR: In all of our ID generator implementations, the node request ID already contains the session request ID in it.

So let's say the session request ID is "4bf92f3577b34da6a3ce929d0e0e4736", if we search nodeRequestID.contains("4bf92f3577b34da6a3ce929d0e0e4736"), we will get all the retry/speculative executions for this session request, e.g. "00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01" or 00-4bf92f3577b34da6a3ce929d0e0e4736-0e4736a3ce929d0e-01"

This aligns with W3C context , that the trace ID is the same across the transasction, but span ID should be always unique.

This is also documented in our implementations.

cassandra-java-driver/core/src/main/java/com/datastax/oss/driver/internal/core/tracker/UuidRequestIdGenerator.java

Lines 35 to 40 in f631081

/**

* {session-request-id}-{random-uuid} All node requests for a session request will have the same

* session request id

*/

@Override

public String getNodeRequestId(@NonNull Request statement, @NonNull String parentId) {

cassandra-java-driver/core/src/main/java/com/datastax/oss/driver/internal/core/tracker/W3CContextRequestIdGenerator.java

Lines 50 to 57 in f631081

/**

* Following the format of W3C "traceparent" spec,

* https://www.w3.org/TR/trace-context/#traceparent-header-field-values e.g.

* "00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01" All node requests in the same session

* request share the same "trace-id" field value

*/

@Override

public String getNodeRequestId(@NonNull Request statement, @NonNull String parentId) {

Good catch from @lukasz-antoniak 🙏

Maybe I'm missing something but as I understand it the question isn't whether node request ID contains session request ID or not. The question is whether we should update the node request ID (regardless of it's structure) when we need to send retries or speculative executions. In other contexts we go to considerable lengths to make sure we're sending the exact same messages we sent earlier... so it seems a bit weird not to do so here.

The W3C docs linked above indicate that trace-id SHOULD be unique but they don't require it. Furthermore they're talking about uniqueness across individual requests; this spec is silent on things like retrying a request. And at least some sources I'm finding seem to indicate that the same trace-id should be used if a transaction is retried.

This value in the custom payload, what we call "request-id", is not the trace ID, it's the "traceparent" for context propagation.

Driver concept W3C concept Example

Request ID value in the custom payload traceparent (for context propagation) 00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01

Session request ID trace ID 4bf92f3577b34da6a3ce929d0e0e4736

The 8 bytes that we add to a session request ID to form the node request ID span ID a3ce929d0e0e4736

And the traceparent context propagation format is:
version "-" trace-id "-" span-id "-" trace-flags

Trace ID needs to be the same across all spans in a transaction.

Span ID is always unique. Otherwise, let's say if the driver sends out 2 requests, including a retry, both with the same node request ID. Then on the server side, you see two identical requests, how can you tell which one is the first, which one is the retry?

Span ID is an identifier that always need to be unique to identify each individual operation. Trace ID is how we correlate retries and speculative executions under one session request. The value in the custom payload needs both, so that we can both correlate and separate different operations in one transaction.

To close the loop on this: the level 2 trace-context spec is pretty clear on this point:

The value of span-id SHOULD be unique within a distributed trace. If the value of span-id is not unique within a distributed trace, parent-child relationships between spans within the distributed trace may be ambiguous.

…ng entry

tolbertam · 2025-11-04T15:21:25Z

core/src/main/java/com/datastax/oss/driver/api/core/tracker/RequestIdGenerator.java

+    Map<String, ByteBuffer> existing = statement.getCustomPayload();
+    String key = getCustomPayloadKey();
+
+    NullAllowingImmutableMap.Builder<String, ByteBuffer> builder =


Just wondering, at this point what value is NullAllowingImmutableMap providing?

Since it is what has the duplicate key restriction, and it's main value is allowing null keys and values, but we would never set a null key/value in this context (correct me if I'm wrong?).

As far as I understand it, it's just ImmutableMap that has the null key/value restriction. Why wouldn't we just use a regular Map implementation and wrap it in Collections.unmodifiableMap?

I did some spelunking and it looks like @emerkle826 made that mod way back in the day. I've already pinged him ~~demanding that he explain himself~~ asking very politely if we could provide the relevant context 🧌

re: what map to use.

It seems like the protocol allows null values in the custom payload (source).

So the custom payload provided by the user can already have null values in it, and we need to support that.

But a normal ImmutableMap does not allow a null value. I'm guessing that's why we were using a NullAllowingImmutableMap. Pending for @olim7t and @emerkle826 to confirm.

However, Collections.unmodifiableMap is also immutable and allows null values. I cannot quite see why we don't use that.

The code here will be cleaner if we use Collections.unmodifiableMap. If no objections, I think we can go that way.

Just chiming in that when I added the NullAllowingImmutableMap, I believe I did it simply because it was the Map type we were using around changes made in Native Protocol related to #1087. I don't recall any comments or real reasons around choosing that Map implementation as opposed to some other immutable map that allowed nulls other than we were already using it in native protocol so why not. :)

tolbertam

👍 this looks like it should address the issue to me!

lukasz-antoniak · 2025-11-06T06:39:35Z

integration-tests/src/test/java/com/datastax/oss/driver/core/tracker/RequestIdGeneratorIT.java

    try (CqlSession session = SessionUtils.newSession(ccmRule, loader)) {
      String query = "SELECT * FROM system.local";
      ResultSet rs = session.execute(query);
      assertThat(rs.getExecutionInfo().getRequest().getCustomPayload().get("trace_key")).isNull();


I think we should check here for absence of request-id (default), not trace_key (copied from upper test).

lukasz-antoniak · 2025-11-06T06:40:19Z

integration-tests/src/test/java/com/datastax/oss/driver/core/tracker/RequestIdGeneratorIT.java

+          SimpleStatement.newInstance(query).setCustomPayload(customPayload);
+      assertThatStage(session.executeAsync(statement)).isSuccess();
+      ResultSet rs = session.execute(query);
+      assertThat(rs.getExecutionInfo().getRequest().getCustomPayload().get("trace_key")).isNull();


IMO this assertion is not needed in this test.

lukasz-antoniak

Left two NIT comments. PR looks good!

tolbertam · 2025-11-07T01:25:17Z

@SiyaoIsHiding looks like we got 2 +1s on the review 🎉

Mind implementing Lukasz's suggestions and then squashing into one commit and adding:

"patch by Jane He; reviewed by Andy Tolbert and Lukasz Atoniak for CASSJAVA-116" to the last line of the commit message?

absurdfarce · 2025-11-07T05:52:16Z

core/src/main/java/com/datastax/oss/driver/api/core/tracker/RequestIdGenerator.java

+
+    Map<String, ByteBuffer> unmodifiableMap = Collections.unmodifiableMap(existing);
+
+    return statement.setCustomPayload(unmodifiableMap);


I don't know that I want to hold things up any but I'd argue ImmutableMap is more concise (and avoids the need to create at least one of the intermediate maps here):

default Statement<?> getDecoratedStatement( @NonNull Statement<?> statement, @NonNull String requestId) { return statement.setCustomPayload( ImmutableMap.<String,ByteBuffer>builder() .putAll(statement.getCustomPayload()) .put(getCustomPayloadKey(), ByteBuffer.wrap(requestId.getBytes(StandardCharsets.UTF_8))) .buildKeepingLast()); }

The original map implementation that was used that caused this issue NullAllowingImmutableMap was presumably being used to allow null keys/values, which ImmutableMap doesn't allow, so if we use that it's possible any existing custom payload map with null keys and values will cause an exception to be thrown here (discussion here: #2060 (comment))

I'm gonna make some notes here 'cause in all honesty this was pretty perplexing to me... but I think I finally got to something resembling an answer. I don't know how useful all of this will be but brain dump follows.

The doc referenced by @tolbertam above is from the 3.6 Java driver, which was an entirely different code base. The constant Statement.NULL_PAYLOAD_VALUE is indeed present in that version... so far so good. Thing is... there's no corresponding entry in the 4.x manual and the equivalent 4.x class doesn't have such a constant. So... what's going on here?

Let's take a look at what the v4 native protocol spec actually says here. A bytes map is understood to have a string for a key and bytes for a map. A string is understood to be "A [short] n, followed by n bytes representing an UTF-8 string". There's no obvious way for that to be null, which is consistent with the verbiage in the 3.6 manual page. bytes on the other hand is defined as follows: "A [int] n, followed by n bytes if n >= 0. If n < 0, no byte should follow and the value represented is null" That's presumably the point of the marker referenced in the 3.6 manual page.

To confirm all of this I ran a simple test client against C* 5.0.6 using Java driver 4.19.1. In each case a custom payload was set on a simple statement and executed. Results seem to follow the pattern described above: when using a null key I get a weird native-protocol exception (an issue has been created for that elsewhere) while a null value seems to work just fine.

So where does this leave us? In order to leverage ImmutableMap you need to translate null values into zero-length ByteBuffers... but do so without acquiring intermediate state. You can do it with something like the following:

public class KeyspaceCount { static class ImmutableEntry<K,V> implements Map.Entry<K,V> { private final K key; private final V val; public ImmutableEntry(K key, V val) { this.key = key; this.val = val; } @Override public K getKey() { return key; } @Override public V getValue() { return val; } @Override public V setValue(Object value) { throw new UnsupportedOperationException("You can't do that"); } } public static void main(String[] args) { try (CqlSession session = CqlSession.builder().build()) { SimpleStatement stmt = SimpleStatement.builder("select release_version from system.local").build(); HashMap<String, ByteBuffer> map = new HashMap<>(); map.put("key", ByteBuffer.allocate(0)); ImmutableMap.Builder<String,ByteBuffer> builder = ImmutableMap.builder(); map.entrySet() .stream() .map((Map.Entry<String,ByteBuffer> entry) -> { return entry.getValue() != null ? entry : new ImmutableEntry<>(entry.getKey(), ByteBuffer.allocate(0)); }) .forEach(builder::put); ResultSet rs = session.execute(stmt.setCustomPayload(builder.buildKeepingLast())); Row row = rs.one(); assert row != null; String releaseVersion = row.getString("release_version"); System.out.printf("Cassandra version is: %s%n", releaseVersion); } } }

That actually does work, and it's all stream-based so that the only map you create is the one you're actually add to the Statement object. You could pull the immutable Map.Entry impl into a utility class somewhere to help keep the change down but in the end yeah, I agree it's not quite as clean as I'd like. I'd still argue it's an improvement because (a) you avoid the intermediate maps and (b) you get the immutable part reflected in the type system but these maps are pretty short-lived anyway... so it probably doesn't matter that much.

fix CASSJAVA-116

7fd929c

lukasz-antoniak reviewed Nov 3, 2025

View reviewed changes

decorated statement() to overwrite instead of doing nothing on existi…

63eb118

…ng entry

SiyaoIsHiding requested review from absurdfarce, lukasz-antoniak and tolbertam November 4, 2025 06:02

This was referenced Nov 4, 2025

Downgrade to Cassandra Driver 4.19.0 spring-projects/spring-data-cassandra#1617

Closed

Downgrade to Cassandra Driver 4.19.0 spring-projects/spring-boot#47926

Closed

tolbertam reviewed Nov 4, 2025

View reviewed changes

Use Collections.unmodifiableMap

eaa9173

SiyaoIsHiding requested a review from tolbertam November 5, 2025 02:54

tolbertam approved these changes Nov 5, 2025

View reviewed changes

lukasz-antoniak reviewed Nov 6, 2025

View reviewed changes

lukasz-antoniak approved these changes Nov 6, 2025

View reviewed changes

absurdfarce reviewed Nov 7, 2025

View reviewed changes

	/**
	* {session-request-id}-{random-uuid} All node requests for a session request will have the same
	* session request id
	*/
	@Override
	public String getNodeRequestId(@NonNull Request statement, @NonNull String parentId) {

	/**
	* Following the format of W3C "traceparent" spec,
	* https://www.w3.org/TR/trace-context/#traceparent-header-field-values e.g.
	* "00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01" All node requests in the same session
	* request share the same "trace-id" field value
	*/
	@Override
	public String getNodeRequestId(@NonNull Request statement, @NonNull String parentId) {

Driver concept	W3C concept	Example
Request ID value in the custom payload	traceparent (for context propagation)	00-4bf92f3577b34da6a3ce929d0e0e4736-a3ce929d0e0e4736-01
Session request ID	trace ID	4bf92f3577b34da6a3ce929d0e0e4736
The 8 bytes that we add to a session request ID to form the node request ID	span ID	a3ce929d0e0e4736


		Map<String, ByteBuffer> unmodifiableMap = Collections.unmodifiableMap(existing);

		return statement.setCustomPayload(unmodifiableMap);

CASSJAVA-116: Retry or Speculative Execution with RequestIdGenerator throws "Duplicate Key" #2060

Are you sure you want to change the base?

CASSJAVA-116: Retry or Speculative Execution with RequestIdGenerator throws "Duplicate Key" #2060

Conversation

SiyaoIsHiding commented Oct 31, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tolbertam left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukasz-antoniak left a comment

Choose a reason for hiding this comment

Uh oh!

tolbertam commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tolbertam commented Nov 7, 2025 •

edited

Loading