Conversation

@zachsmith1
Contributor

What

Deterministic sharding via HRW (rendezvous hashing) over live peers; see the sketch below.
Peer leases mcr-peer-* (membership/weights) + shard leases mcr-shard- (per-cluster fencing, single writer).
Clean handoff: watches detach/re-attach on ownership change.
Rebalance on scale up/down.
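
For readers new to rendezvous hashing: every peer scores each (peer, cluster) pair with the same hash function, and the highest-scoring peer owns the cluster, so all peers agree on ownership from the same membership view and only a minimal set of clusters moves when a peer joins or leaves. A minimal, unweighted sketch with hypothetical names (the PR additionally tracks peer weights via the mcr-peer-* leases, which this omits):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score hashes a (peer, cluster) pair; the peer with the highest score owns the cluster.
func score(peer, cluster string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(peer))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(cluster))
	return h.Sum64()
}

// owner returns the peer that should reconcile the given cluster.
func owner(peers []string, cluster string) string {
	var best string
	var bestScore uint64
	for _, p := range peers {
		if s := score(p, cluster); best == "" || s > bestScore {
			best, bestScore = p, s
		}
	}
	return best
}

func main() {
	peers := []string{"peer-a", "peer-b", "peer-c"}
	// Every peer that sees the same live-peer set computes the same owner.
	fmt.Println(owner(peers, "cluster-1"))
}
```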

Why

Prevent double reconciles & cold-start stampede.
Balanced, deterministic ownership with fast failover.

How to try

Follow the README (examples/sharded-namespace) for build/deploy/observe steps.

Changes

Manager: HRW + per-cluster Lease fencing (sketched below), peer/fence prefix split.
Controller: per-cluster engagement context, re-engage fix.
Source: removable handler, re-register on new ctx.
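
For the Lease fencing piece, here is a hedged sketch of how single-writer fencing per cluster could be wired with a coordination.k8s.io Lease through client-go's leader election; the lease name, namespace, and callback wiring below are illustrative assumptions, not this PR's actual code:

```go
package fencing

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// fenceCluster competes for the per-cluster shard Lease. Only the holder runs
// engage; losing or releasing the Lease triggers disengage for a clean handoff.
func fenceCluster(ctx context.Context, cs kubernetes.Interface, clusterName, peerID string,
	engage func(context.Context), disengage func()) error {
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "mcr-shard-" + clusterName, // one Lease per cluster => single writer
			Namespace: "kube-system",              // assumption: the namespace would be configurable
		},
		Client:     cs.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: peerID},
	}
	le, err := leaderelection.NewLeaderElector(leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true, // give the Lease up promptly when ownership moves elsewhere
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: engage,    // attach watches / start reconciling this cluster
			OnStoppedLeading: disengage, // detach watches when fencing is lost
		},
	})
	if err != nil {
		return err
	}
	le.Run(ctx) // blocks until ctx is cancelled or leadership is lost
	return nil
}
```

Fencing per cluster, rather than holding one global leader Lease, is what lets ownership stay spread across peers while still guaranteeing a single writer per cluster.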

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zachsmith1
Once this PR has been reviewed and has the lgtm label, please assign jeremyot for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from skitt and sttts September 18, 2025 17:28
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2025
@linux-foundation-easycla

linux-foundation-easycla bot commented Sep 18, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @zachsmith1!

It looks like this is your first PR to kubernetes-sigs/multicluster-runtime 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/multicluster-runtime has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 18, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 18, 2025
@embik embik self-requested a review September 19, 2025 07:31
@FourFifthsCode
Contributor

this looks amazing 🚀

with this implementation, is it possible to shard individual clusters too or is it per cluster?

@zachsmith1
Contributor Author

zachsmith1 commented Sep 19, 2025

this looks amazing 🚀

with this implementation, is it possible to shard individual clusters too or is it per cluster?

This is just for clusters, but I think there is another community project we could leverage for the intra-cluster sharding.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2025
@zachsmith1
Contributor Author

@sttts I fixed a startup ordering bug: providers were starting a remote cluster's cache before registering controller watches and before publishing the cluster to the provider map. That meant the informer's initial list/sync happened with no handlers attached, so the initial "add" events were missed, and early reconciles could fail because mgr.GetCluster(req.ClusterName) wasn't resolvable yet. We now publish the cluster first (so GetCluster works), then Engage (which registers watches), and only then start the cache and wait for sync. This mirrors controller-runtime's contract of starting event sources before the cache sync, ensures initial add events reach handlers, and reconciles existing objects immediately, which makes the test "runs the reconciler for existing objects" pass.
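
A hedged sketch of that ordering, using hypothetical provider names (the actual providers in this PR are structured differently):

```go
package provider

import (
	"context"
	"fmt"
	"sync"

	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

// engager stands in for the cluster-aware manager the providers call Engage on.
type engager interface {
	Engage(ctx context.Context, name string, cl cluster.Cluster) error
}

type provider struct {
	lock     sync.Mutex
	clusters map[string]cluster.Cluster
	aware    engager
	log      logr.Logger
}

func (p *provider) addCluster(ctx context.Context, name string, cl cluster.Cluster) error {
	// 1. Publish first so mgr.GetCluster(req.ClusterName) resolves for early reconciles.
	p.lock.Lock()
	p.clusters[name] = cl
	p.lock.Unlock()

	// 2. Engage before starting the cache so watch handlers are registered and the
	//    informer's initial list delivers its "add" events to them.
	if err := p.aware.Engage(ctx, name, cl); err != nil {
		return err
	}

	// 3. Only now start the cache and wait for the initial sync.
	go func() {
		if err := cl.Start(ctx); err != nil {
			p.log.Error(err, "cluster stopped", "cluster", name)
		}
	}()
	if !cl.GetCache().WaitForCacheSync(ctx) {
		return fmt.Errorf("cache for cluster %q did not sync", name)
	}
	return nil
}
```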

Contributor

@mjudeikis mjudeikis left a comment

Just a few comments from last week's debugging session. I didn't look at the other code yet.


// Engage before starting cache so handlers are registered for initial adds.
if aware != nil {
	if err := aware.Engage(ctx, clusterName, cl); err != nil {
Contributor

Will this not start the provider via a runnable and hence trigger the reconciler? So basically triggering the reconciler with informers that haven't started yet? When adding a 2nd or 3rd cluster, you will trigger the reconciler before the cache is started?

We had these issues last week with @ntnn. How are you testing this? Meaning, which providers are you running so that it works?

Contributor

In general this flow is a bit different for the 1st cluster, where the reconcile loop might not be running yet, and for the 2nd, where the reconciler is already there. So once you engage the cluster it already pops up in the queue. And if you have non-shared informers and caches (not started before), this might fail?

Member

I agree. Swapping this around feels wrong. I had it originally the other way around in clusters.Clusters with a similar intent of "first register handlers and reconcilers, then start", but given that this applies to all Awares it can lead to opaque errors. In our case the opaque error was that the mcmanager was waiting on the cache sync when engaging the cluster.

My guess is that a similar thing is happening here now, and swapping this around in the providers is just masking the actual error.
With the sync engine taking charge of the runnables, maybe this is a problem of the providers being part of the runnables?
It also looks a bit off that the engine is added as a runnable to the manager while simultaneously taking charge of the runnables?

p.log.Info("Added new cluster")

// engage manager.
// engage before starting cache so handlers are registered for initial adds
Contributor

Same comment as above. I feel this will not work for follow-up clusters? It might work for the first one, but not the second or third?

Comment on lines +174 to +176
if err := mgr.Add(eng.Runnable()); err != nil {
	return nil, err
}
Member

Should probably be mgr.GetLocalManager().Add so the engine isn't added as a runnable to itself? I haven't looked closer, but that's what it looks like at a cursory glance.
