-
Notifications
You must be signed in to change notification settings - Fork 17
Implement Cluster Scaling and StatefulSet Management #270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes in this pull request involve modifications to several files, primarily focusing on enhancing the Dockerfile and improving the functionality of the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (7)
cmd/app/commandline.go (1)
51-51: Enhance flag description and add domain validationWhile the flag is correctly defined, consider:
- Expanding the description to explain how it affects etcd client endpoint construction
- Adding validation for the domain format to prevent configuration issues
- pflag.String("cluster-domain", "cluster.local", "The cluster domain configured in kube-dns") + pflag.String("cluster-domain", "cluster.local", "The cluster domain configured in kube-dns. Used for constructing etcd client endpoints (e.g., service.namespace.svc.<cluster-domain>)")Also, consider adding domain validation in the ParseCmdLine function:
// Add after viper.GetString("cluster-domain") domain := viper.GetString("cluster-domain") if !isDomainValid(domain) { exitUsage(fmt.Errorf("invalid cluster domain format: %s", domain)) } // Helper function func isDomainValid(domain string) bool { // RFC 1123 subdomain regex pattern := `^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` matched, _ := regexp.MatchString(pattern, domain) return matched }internal/controller/observables.go (2)
137-141: Address the TODO: Implement proper replica determination logic indesiredReplicas.The function
desiredReplicascontains a TODO comment indicating the need for a robust logic to determine the correct number of replicas based on the cluster state. Implementing this logic is important for accurate scaling behavior.Would you like assistance in developing the replica determination algorithm?
177-181: Implement thestatefulSetPodSpecCorrectfunction.The
statefulSetPodSpecCorrectfunction currently always returnstrueand includes a TODO comment. Implementing this function is crucial to ensure that the StatefulSet's pod specification matches the desired configuration, facilitating proper reconciliation.Would you like help in implementing this function to compare the desired and existing StatefulSet specifications?
internal/controller/etcdcluster_controller.go (4)
149-149: Remove outdated TODO comment forscaleUpFromZeromethod.The
scaleUpFromZerofunction is marked with a TODO comment indicating it needs implementing, but it appears to be implemented later in the code (lines 737–749). Consider removing the outdated TODO comment to avoid confusion.- return ctrl.Result{}, r.scaleUpFromZero(ctx, state) // TODO: needs implementing + return ctrl.Result{}, r.scaleUpFromZero(ctx, state)
Line range hint
634-682: Prevent potential panic due to premature channel closure in goroutines.The error channel
cis closed withdefer close(c)at the beginning of the function. If any goroutine attempts to send tocafter it has been closed, it will cause a panic.Move
close(c)to after all goroutines have completed to ensure they do not send to a closed channel:func (r *EtcdClusterReconciler) ensureUnconditionalObjects(ctx context.Context, state *observables) error { - c := make(chan error) - defer close(c) ctx, cancel := context.WithCancel(ctx) defer cancel() + c := make(chan error) var wg sync.WaitGroup wg.Add(concurrentOperations) wrapWithMsg := func(err error, msg string) error { if err != nil { return fmt.Errorf(msg+": %w", err) } return nil } // Goroutine definitions... for i := 0; i < concurrentOperations; i++ { if err := <-c; err != nil { cancel() wg.Wait() + close(c) return err } } + wg.Wait() + close(c) return nil }
736-749: Implement thepatchOrCreateObjectmethod.The
patchOrCreateObjectmethod is marked with a TODO and currently lacks a complete implementation. This function is critical for applying or creating Kubernetes objects.Would you like assistance in implementing
patchOrCreateObject? Proper implementation should handle both patching existing objects and creating new ones if they do not exist.
149-149: Remove redundant code comments and keep code clean.There are several comments like
// needs implementingthat may no longer be relevant. Keeping code comments up-to-date helps maintain code readability.Review and remove any outdated or unnecessary comments to keep the codebase clean.
🛑 Comments failed to post (8)
internal/controller/observables.go (4)
184-185:
⚠️ Potential issueAdd nil check for
statefulSet.Spec.ReplicasinstatefulSetReady.Since
o.statefulSet.Spec.Replicasis a pointer, it may benil. Add a nil check to avoid potential nil pointer dereferences when comparing replica counts.Apply this diff to add the nil check:
func (o *observables) statefulSetReady() bool { + if o.statefulSet.Spec.Replicas == nil { + return false + } return o.statefulSet.Status.ReadyReplicas == *o.statefulSet.Spec.Replicas }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.func (o *observables) statefulSetReady() bool { if o.statefulSet.Spec.Replicas == nil { return false } return o.statefulSet.Status.ReadyReplicas == *o.statefulSet.Spec.Replicas }
83-88:
⚠️ Potential issueFix off-by-one error in
memberListsAllEqualfunction.The loop currently starts from index
0, comparingmemberLists[0]to itself, which is unnecessary. To correctly compare each member list to the first one, the loop should start fromi = 1.Apply this diff to fix the off-by-one error:
func (o *observables) memberListsAllEqual() bool { // ... - for i := range memberLists { + for i := 1; i < len(memberLists); i++ { if !memberLists[0].Equals(memberLists[i]) { return false } } return true }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.for i := 1; i < len(memberLists); i++ { if !memberLists[0].Equals(memberLists[i]) { return false } } return true
104-117:
⚠️ Potential issueAdd checks to prevent index out of range errors in
pvcMaxIndex.The code assumes that splitting the PVC name by
"-"will result in a slice with at least one element. If the PVC name does not contain"-", accessingtokens[len(tokens)-1]could cause an index out of range error. Add a check to ensurelen(tokens) > 0before accessing the last element.Apply this diff to add the necessary checks:
func (o *observables) pvcMaxIndex() (max int) { max = -1 for i := range o.pvcs { tokens := strings.Split(o.pvcs[i].Name, "-") + if len(tokens) == 0 { + continue + } index, err := strconv.Atoi(tokens[len(tokens)-1]) if err != nil { continue } if index > max { max = index } } return max }Committable suggestion skipped: line range outside the PR's diff.
119-135:
⚠️ Potential issuePrevent index out of range errors in
endpointMaxIndex.The function assumes that the endpoint string is formatted to allow safe access to
tokens[len(tokens)-2]and subsequent indices. If the endpoint format is unexpected, this could result in index out of range errors. Add checks to ensure the slices have sufficient length before accessing their elements.Apply this diff to add the necessary checks:
func (o *observables) endpointMaxIndex() (max int) { for i := range o.endpoints { tokens := strings.Split(o.endpoints[i], ":") if len(tokens) < 2 { continue } tokens = strings.Split(tokens[len(tokens)-2], "-") + if len(tokens) == 0 { + continue + } index, err := strconv.Atoi(tokens[len(tokens)-1]) if err != nil { continue } if index > max { max = index } } return max }Committable suggestion skipped: line range outside the PR's diff.
internal/controller/etcdcluster_controller.go (4)
130-134:
⚠️ Potential issueImplement missing methods
state.statefulSetPodSpecCorrect()andfactory.TemplateStatefulSet().The code references
state.statefulSetPodSpecCorrect()andfactory.TemplateStatefulSet(), which are marked with TODO comments indicating they need to be implemented. Without these implementations, the reconciliation logic may not function as intended.Please implement these methods to ensure that the stateful set pod specifications are correctly validated and templated.
729-733:
⚠️ Potential issueAddress incomplete
createClusterFromScratchmethod containing a panic.The
createClusterFromScratchmethod ends withpanic("not yet implemented"), which will crash the controller if this code path is executed.Consider implementing this method or adding proper error handling to prevent unexpected panics. If the method is not ready for use, ensure that it's not called until fully implemented.
784-784:
⚠️ Potential issueAddress potential nil panic detected by static analysis.
Static analysis indicates a potential nil panic at line 784. Ensure that
memberListRespis not nil before accessing itsMembersfield.Add a check after
MemberListto verifymemberListRespis not nil:memberListResp, err := clusterClient.MemberList(ctx) if err != nil { return fmt.Errorf("failed to get member list: %w", err) } + if memberListResp == nil { + return fmt.Errorf("failed to get member list: response is nil") + }Alternatively, investigate why
memberListRespcould be nil even whenerris nil.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.memberListResp, err := clusterClient.MemberList(ctx) if err != nil { return fmt.Errorf("failed to get member list: %w", err) } if memberListResp == nil { return fmt.Errorf("failed to get member list: response is nil") }🧰 Tools
🪛 GitHub Check: nilaway-lint
[failure] 784-784:
�[31merror: �[0mPotential nil panic detected. Observed nil flow from source to dereference point:
777-786:
⚠️ Potential issueAdd nil check for
clusterClientto prevent nil pointer dereference.In the
promoteLearnersfunction,clusterClientmay benileven iferrisnilafter callingfactory.NewEtcdClientSet. UsingclusterClient.MemberList(ctx)without checking could cause a nil pointer dereference.Add a nil check for
clusterClientbefore using it:clusterClient, _, err := factory.NewEtcdClientSet(ctx, state.instance, r.Client) if err != nil { return fmt.Errorf("failed to create etcd client: %w", err) } + if clusterClient == nil { + return fmt.Errorf("failed to create etcd client: client is nil") + } defer clusterClient.Close()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.clusterClient, _, err := factory.NewEtcdClientSet(ctx, state.instance, r.Client) if err != nil { return fmt.Errorf("failed to create etcd client: %w", err) } if clusterClient == nil { return fmt.Errorf("failed to create etcd client: client is nil") } defer clusterClient.Close() // Retrieve the current member list memberListResp, err := clusterClient.MemberList(ctx) if err != nil { return fmt.Errorf("failed to get member list: %w", err)🧰 Tools
🪛 GitHub Check: nilaway-lint
[failure] 784-784:
�[31merror: �[0mPotential nil panic detected. Observed nil flow from source to dereference point:
a5725af to
0987b97
Compare
Summary
This PR implements key functions in the
EtcdClusterReconcilerfor the etcd-operator controller. These functions add critical functionality for managing cluster scaling, cluster state ConfigMap updates, StatefulSet management, and learner promotion within an etcd cluster.Key Changes
Cluster Scaling from Zero:
scaleUpFromZeroto handle the case where the etcd cluster is scaled up from zero replicas.Cluster State ConfigMap Management:
createOrUpdateClusterStateConfigMapfunction added to manage the cluster state ConfigMap, updating it based on the current state of the cluster.StatefulSet Creation and Update:
createOrUpdateStatefulSetfunction added to manage StatefulSet creation and updates, ensuring consistency between the desired and actual StatefulSet configurations.Learner Promotion:
promoteLearnersto promote any learner members to full voting members in the etcd cluster.Implementation Details
Reconcileto pass thestateparameter, ensuring alignment with updated function definitions.Testing
Future Improvements
This PR significantly enhances the functionality of the etcd-operator by providing robust cluster scaling and management capabilities.
Summary by CodeRabbit
New Features
--cluster-domainfor specifying the cluster domain, defaulting to"cluster.local".Improvements
observablesstruct.Bug Fixes
Chores