Skip to content

Conversation

@ericm-db
Copy link
Contributor

@ericm-db ericm-db commented Nov 18, 2025

What changes were proposed in this pull request?

Introducing the OffsetMap format to key source progress by source name, as opposed to ordinal in the logical plan

Why are the changes needed?

These changes are needed in order to enable source evolution on a streaming query (adding, removing, reordering sources) without requiring the user to set a new checkpoint directory

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@ericm-db ericm-db changed the title [WIP] Introducing OffsetMap to enable Named Streaming Sources [SPARK-54423] Introducing OffsetMap to enable Named Streaming Sources Nov 19, 2025
@HyukjinKwon HyukjinKwon changed the title [SPARK-54423] Introducing OffsetMap to enable Named Streaming Sources [SPARK-54423][SS] Introducing OffsetMap to enable Named Streaming Sources Nov 19, 2025
// Using index as sourceId initially, can be extended to support user-provided names
// This is initialized in the same path as the sources Seq (defined above) and is used
// in the same way, when OffsetLog v2 is used.
@volatile protected var sourceIdMap: Map[String, SparkDataStream] = Map.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we mark this as volatile ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants