-
Notifications
You must be signed in to change notification settings - Fork 4k
mma: more cleanup of pending changes #157066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
55198e6 to
7cb5609
Compare
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wenyihu6 please take a look as well.
@tbg reviewed 8 of 8 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 549 at r1 (raw file):
replicaChanges := MakeLeaseTransferChanges( rangeID, rstate.replicas, rstate.load, addTarget, removeTarget) leaseChange := MakePendingRangeChange(rangeID, replicaChanges[:])
not new, but why the [:]?
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 844 at r1 (raw file):
// Check that we can undo these changes. if err := a.cs.preCheckOnUndoReplicaChanges(changes); err != nil { panic(err)
thinking out loud: this is required to pass pre-check because whatever the partial set of changes is, if we're rolling them all back, we are moving to a valid state. (even though it may have regressed compared to the latest leaseholder msg, see your example about "s4 should get added with lease but we only see it without lease and eventually the pending change reverts").
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 398 at r1 (raw file):
// MakePendingRangeChange creates a PendingRangeChange for the given rangeID // and changes. It is later fully initialized by MMA.
This kind of comment makes me uneasy. If we're creating something half-initialized, we need to be very explicit about which parts of initialization are missing. Ideally we would avoid such a situation altogether. Is that within reach here?
7cb5609 to
37a0df5
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 549 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
not new, but why the
[:]?
for the trivial reason that the MakeLeaseTransferChanges and other related methods return arrays and we need a slice here.
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 844 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
thinking out loud: this is required to pass pre-check because whatever the partial set of changes is, if we're rolling them all back, we are moving to a valid state. (even though it may have regressed compared to the latest leaseholder msg, see your example about "s4 should get added with lease but we only see it without lease and eventually the pending change reverts").
Correct. These are all the changes that are left on the rangeState so if !no-rollback, we have to be able to undo them.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 398 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This kind of comment makes me uneasy. If we're creating something half-initialized, we need to be very explicit about which parts of initialization are missing. Ideally we would avoid such a situation altogether. Is that within reach here?
I've expanded the comment.
I intend to return to this in the next PR since the ownership situation is not where I need it to be. I've tweaked some of the ownership comments in this PR.
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/kv/kvserver/mmaintegration/allocator_sync.go line 126 at r1 (raw file):
as.mu.changeSeqGen++ syncChangeID := as.mu.changeSeqGen as.mu.trackedChanges[syncChangeID] = change
nit: could we add an assertion here that isMMARegistered == (change.mmaChange.pendingReplicaChanges != nil)
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 820 at r1 (raw file):
if !ok { // Range no longer exists. return
What you’re doing here makes sense, but wanted to confirm it’s an intentional change - the range might no longer exist if the leaseholder has been transferred elsewhere, and the new store leaseholder message no longer includes this range, deleting it before we receive the result of this pending change. So the panic could have been triggered.
The existing PendingRangeChange is now used in all the exported Allocator methods. This makes it clear that the set of changes are being treated as an atomic unit in the interface. The MMA code also reduces the switching between pendingReplicaChange, ChangeID and ReplicaChange which was confusing. It now uses pendingReplicaChange in more cases. Since we use PendingRangeChange to describe the change, and then later annotate it with the change ids, start time etc., there is some refactoring which involves producing a change using MakePendingRangeChange, and later changing its state using clusterState.addPendingRangeChange. Minor cleanup: The Allocator interface now includes integration methods implemented by allocatorState, as originally intended. Informs cockroachdb#157049 Epic: CRDB-55052 Release note: None
37a0df5 to
972d746
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @tbg and @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 820 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
What you’re doing here makes sense, but wanted to confirm it’s an intentional change - the range might no longer exist if the leaseholder has been transferred elsewhere, and the new store leaseholder message no longer includes this range, deleting it before we receive the result of this pending change. So the panic could have been triggered.
Correct. I've added a comment.
pkg/kv/kvserver/mmaintegration/allocator_sync.go line 126 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
nit: could we add an assertion here that isMMARegistered == (change.mmaChange.pendingReplicaChanges != nil)
I didn't make this change. pendingReplicaChanges is not visible in this package. I could have asserted on RangeID, but we don't even explicitly document that 0 is not a valid RangeID, so I did't bother.
|
bors r=tbg,wenyihu6 |
The existing PendingRangeChange is now used in all the exported Allocator methods. This makes it clear that the set of changes are being treated as an atomic unit in the interface. The MMA code also reduces the switching between pendingReplicaChange, ChangeID and ReplicaChange which was confusing. It now uses pendingReplicaChange in more cases.
Since we use PendingRangeChange to describe the change, and then later annotate it with the change ids, start time etc., there is some refactoring which involves producing a change using MakePendingRangeChange, and later changing its state using clusterState.addPendingRangeChange.
Minor cleanup: The Allocator interface now includes integration methods implemented by allocatorState, as originally intended.
Informs #157049
Epic: CRDB-55052
Release note: None