Skip to content

Conversation

@sumeerbhola
Copy link
Collaborator

The existing PendingRangeChange is now used in all the exported Allocator methods. This makes it clear that the set of changes are being treated as an atomic unit in the interface. The MMA code also reduces the switching between pendingReplicaChange, ChangeID and ReplicaChange which was confusing. It now uses pendingReplicaChange in more cases.

Since we use PendingRangeChange to describe the change, and then later annotate it with the change ids, start time etc., there is some refactoring which involves producing a change using MakePendingRangeChange, and later changing its state using clusterState.addPendingRangeChange.

Minor cleanup: The Allocator interface now includes integration methods implemented by allocatorState, as originally intended.

Informs #157049

Epic: CRDB-55052

Release note: None

@sumeerbhola sumeerbhola requested review from tbg and wenyihu6 November 7, 2025 15:58
@sumeerbhola sumeerbhola requested review from a team as code owners November 7, 2025 15:58
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@wenyihu6 please take a look as well.

@tbg reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 549 at r1 (raw file):

				replicaChanges := MakeLeaseTransferChanges(
					rangeID, rstate.replicas, rstate.load, addTarget, removeTarget)
				leaseChange := MakePendingRangeChange(rangeID, replicaChanges[:])

not new, but why the [:]?


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 844 at r1 (raw file):

		// Check that we can undo these changes.
		if err := a.cs.preCheckOnUndoReplicaChanges(changes); err != nil {
			panic(err)

thinking out loud: this is required to pass pre-check because whatever the partial set of changes is, if we're rolling them all back, we are moving to a valid state. (even though it may have regressed compared to the latest leaseholder msg, see your example about "s4 should get added with lease but we only see it without lease and eventually the pending change reverts").


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 398 at r1 (raw file):

// MakePendingRangeChange creates a PendingRangeChange for the given rangeID
// and changes. It is later fully initialized by MMA.

This kind of comment makes me uneasy. If we're creating something half-initialized, we need to be very explicit about which parts of initialization are missing. Ideally we would avoid such a situation altogether. Is that within reach here?

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 549 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

not new, but why the [:]?

for the trivial reason that the MakeLeaseTransferChanges and other related methods return arrays and we need a slice here.


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 844 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

thinking out loud: this is required to pass pre-check because whatever the partial set of changes is, if we're rolling them all back, we are moving to a valid state. (even though it may have regressed compared to the latest leaseholder msg, see your example about "s4 should get added with lease but we only see it without lease and eventually the pending change reverts").

Correct. These are all the changes that are left on the rangeState so if !no-rollback, we have to be able to undo them.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 398 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This kind of comment makes me uneasy. If we're creating something half-initialized, we need to be very explicit about which parts of initialization are missing. Ideally we would avoid such a situation altogether. Is that within reach here?

I've expanded the comment.
I intend to return to this in the next PR since the ownership situation is not where I need it to be. I've tweaked some of the ownership comments in this PR.

Copy link
Contributor

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)


pkg/kv/kvserver/mmaintegration/allocator_sync.go line 126 at r1 (raw file):

	as.mu.changeSeqGen++
	syncChangeID := as.mu.changeSeqGen
	as.mu.trackedChanges[syncChangeID] = change

nit: could we add an assertion here that isMMARegistered == (change.mmaChange.pendingReplicaChanges != nil)


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 820 at r1 (raw file):

	if !ok {
		// Range no longer exists.
		return

What you’re doing here makes sense, but wanted to confirm it’s an intentional change - the range might no longer exist if the leaseholder has been transferred elsewhere, and the new store leaseholder message no longer includes this range, deleting it before we receive the result of this pending change. So the panic could have been triggered.

The existing PendingRangeChange is now used in all the exported Allocator
methods. This makes it clear that the set of changes are being treated as
an atomic unit in the interface. The MMA code also reduces the switching
between pendingReplicaChange, ChangeID and ReplicaChange which was
confusing. It now uses pendingReplicaChange in more cases.

Since we use PendingRangeChange to describe the change, and then later
annotate it with the change ids, start time etc., there is some refactoring
which involves producing a change using MakePendingRangeChange, and later
changing its state using clusterState.addPendingRangeChange.

Minor cleanup: The Allocator interface now includes integration methods
implemented by allocatorState, as originally intended.

Informs cockroachdb#157049

Epic: CRDB-55052

Release note: None
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @tbg and @wenyihu6)


pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 820 at r1 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

What you’re doing here makes sense, but wanted to confirm it’s an intentional change - the range might no longer exist if the leaseholder has been transferred elsewhere, and the new store leaseholder message no longer includes this range, deleting it before we receive the result of this pending change. So the panic could have been triggered.

Correct. I've added a comment.


pkg/kv/kvserver/mmaintegration/allocator_sync.go line 126 at r1 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

nit: could we add an assertion here that isMMARegistered == (change.mmaChange.pendingReplicaChanges != nil)

I didn't make this change. pendingReplicaChanges is not visible in this package. I could have asserted on RangeID, but we don't even explicitly document that 0 is not a valid RangeID, so I did't bother.

@sumeerbhola
Copy link
Collaborator Author

bors r=tbg,wenyihu6

@craig
Copy link
Contributor

craig bot commented Nov 9, 2025

@craig craig bot merged commit df1585e into cockroachdb:master Nov 9, 2025
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants