✨ Multi cluster controller sharding #74
Conversation
Co-authored-by: Nelo-T. Wallus <10514301+ntnn@users.noreply.github.com>
Co-authored-by: Marvin Beckers <mail@embik.me>
feat: controller sharding
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: zachsmith1. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Welcome @zachsmith1!
This looks amazing 🚀 With this implementation, is it possible to shard individual clusters too, or is it per cluster?
This is just for clusters, but I think there is another community project we could leverage for the intra-cluster sharding.
@sttts I fixed a startup ordering bug: providers were starting a remote cluster’s cache before registering controller watches and before publishing the cluster to the provider map. That meant the informer’s initial list/sync happened with no handlers attached (so initial “add” events were missed), and early reconciles could fail because mgr.GetCluster(req.ClusterName) wasn’t resolvable yet. We now publish the cluster first (so GetCluster works), then Engage (which registers watches), and only then start the cache and wait for sync. This mirrors controller-runtime’s contract (start event sources before cache sync), ensuring initial add events reach handlers and “existing objects” are reconciled immediately—making the test "runs the reconciler for existing objects" pass.
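For readers following along, a minimal sketch of that ordering, assuming controller-runtime's cluster.Cluster; the local `aware` interface, `engageThenStart`, and the error handling are illustrative, not the provider code in this PR, and publishing the cluster to the provider map is assumed to happen in the caller.

```go
package provider

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// aware is the subset of the multicluster "Aware" contract this sketch needs;
// in the real code this is the engaged multicluster manager.
type aware interface {
	Engage(ctx context.Context, clusterName string, cl cluster.Cluster) error
}

// engageThenStart sketches the ordering described above: the caller has already
// published the cluster (so GetCluster resolves), then we Engage so controller
// watches are registered, and only then start the cache and wait for it to sync.
func engageThenStart(ctx context.Context, name string, cl cluster.Cluster, a aware) error {
	// Engage first: event handlers get attached to the cluster's informers
	// before the initial list, so initial "add" events are not missed.
	if err := a.Engage(ctx, name, cl); err != nil {
		return err
	}

	// Start the cache only after handlers are in place, then block until it
	// has synced so existing objects are reconciled immediately.
	go func() {
		// Error handling/logging elided in this sketch.
		_ = cl.GetCache().Start(ctx)
	}()
	if !cl.GetCache().WaitForCacheSync(ctx) {
		return fmt.Errorf("cache for cluster %q did not sync", name)
	}
	return nil
}
```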
mjudeikis left a comment:
Just a few comments from last week's debugging session. I haven't looked at the other code yet.
```go
// Engage before starting cache so handlers are registered for initial adds.
if aware != nil {
	if err := aware.Engage(ctx, clusterName, cl); err != nil {
```
Will this not start the provider via the runnable and hence trigger the reconciler? So basically triggering the reconciler with informers that have not started? When adding the 2nd or 3rd cluster, you will trigger the reconciler before the cache is started?
We had these issues last week with @ntnn. How are you testing this? Meaning, which providers are you running so that it works?
In general this flow is a bit different for the 1st cluster, where the reconcile loop might not be running yet, and for the second, where the reconciler is already there. So once you engage the cluster it already pops up in the queue. And if you have non-shared informers and caches (not started before), this might fail?
I agree. Swapping this around feels wrong. I originally had it the other way around in clusters.Clusters with a similar intent of "first register handlers and reconcilers, then start" - but given that this applies to all Awares, it can lead to opaque errors. In our case the opaque error was that the mcmanager was waiting on the cache sync when engaging the cluster.
My guess is that a similar thing is happening here now, and swapping this around in the providers is just masking the actual error.
With the sync engine taking charge of the runnables, maybe this is a problem of the providers being part of the runnables?
Also, it looks a bit off that the engine is added as a runnable to the manager while the engine simultaneously takes charge of the runnables.
```go
p.log.Info("Added new cluster")

// ...

// engage manager.
// engage before starting cache so handlers are registered for initial adds
```
Same comment as above. I feel this will not work for follow-up clusters? It might work for the first one, but not the second or third.
```go
if err := mgr.Add(eng.Runnable()); err != nil {
	return nil, err
}
```
This should probably be mgr.GetLocalManager().Add so the engine isn't added as a runnable to itself?
Haven't looked closer, but that's what it looks like at a cursory glance.
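For illustration, a minimal sketch of the change this comment suggests, assuming the multicluster manager exposes its embedded controller-runtime manager via GetLocalManager(); this mirrors the review suggestion, not the code currently in the PR.

```go
// Register the engine's runnable with the embedded local manager rather than
// the multicluster manager, so the engine is not added as a runnable to itself
// (sketch of the review suggestion).
if err := mgr.GetLocalManager().Add(eng.Runnable()); err != nil {
	return nil, err
}
```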
What
- Deterministic sharding via HRW (rendezvous) hashing over live peers (see the sketch after this list).
- Peer leases mcr-peer-* (membership/weights) + shard leases mcr-shard- (per-cluster fencing, single writer).
- Clean handoff: watches detach/re-attach on ownership change.
- Rebalance on scale up/down.
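For illustration, a minimal sketch of HRW (rendezvous) hashing over live peers; the FNV hash and the `owner` helper are assumptions for the sketch, not necessarily what this PR implements.

```go
package sharding

import (
	"fmt"
	"hash/fnv"
)

// owner returns the peer with the highest rendezvous score for the given
// cluster. Every peer computes this independently over the same live-peer
// set, so ownership is deterministic without coordination. Returns "" if
// there are no peers.
func owner(clusterName string, peers []string) string {
	var best string
	var bestScore uint64
	for _, peer := range peers {
		h := fnv.New64a()
		fmt.Fprintf(h, "%s/%s", peer, clusterName)
		if score := h.Sum64(); best == "" || score > bestScore {
			best, bestScore = peer, score
		}
	}
	return best
}
```

Each peer runs the same computation over the live-peer set derived from the mcr-peer-* leases and engages a cluster only when it is the owner; when the peer set changes, only clusters whose owner changes are handed off.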
Why
- Prevent double reconciles and a cold-start stampede.
- Balanced, deterministic ownership with fast failover.
How to try
- Follow the README (examples/sharded-namespace) for build/deploy/observe steps.
Changes
- Manager: HRW + per-cluster Lease fencing, peer/fence prefix split (see the Lease sketch after this list).
- Controller: per-cluster engagement context, re-engage fix.
- Source: removable handler, re-register on new ctx.
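For context on the fencing piece, a minimal sketch of acquiring a per-cluster shard Lease with coordination.k8s.io/v1; the function name, lease naming, and 30-second duration are assumptions for illustration, and renewal and takeover of expired leases are omitted.

```go
package sharding

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// tryAcquireShardLease attempts to create the per-cluster shard Lease so that
// only a single peer (the HRW owner) engages the cluster. An AlreadyExists
// conflict means another peer holds the lease and this peer must not engage.
// Sketch only; renewal and expiry handling are left out.
func tryAcquireShardLease(ctx context.Context, c client.Client, namespace, leaseName, holder string) (bool, error) {
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: leaseName, Namespace: namespace},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       ptr.To(holder),
			LeaseDurationSeconds: ptr.To(int32(30)),
			AcquireTime:          &metav1.MicroTime{Time: time.Now()},
			RenewTime:            &metav1.MicroTime{Time: time.Now()},
		},
	}
	if err := c.Create(ctx, lease); err != nil {
		if apierrors.IsAlreadyExists(err) {
			return false, nil // another peer already fences this cluster
		}
		return false, err
	}
	return true, nil
}
```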