test: expand LRP test to include lifecycle events #4086
base: master
Conversation
Pull Request Overview
Adds negative Local Redirect Policy (LRP) test scenarios and parameterizes Prometheus endpoint usage in existing LRP tests.
- Extends testLRPCase to accept a Prometheus address for flexible metric validation (see the sketch after this list).
- Introduces comprehensive negative and resilience LRP tests (resource recreation, pod restarts, cilium validation).
- Adds helper functions for validating Cilium LRP state and recreating test resources.
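For illustration only, a hypothetical call shape for the updated helper. The argument order, the extra parameters, and the `promAddress` value are assumptions rather than the PR's exact signature; the diff only shows `testLRPCase(t, ctx, clientPod, []string{`.

```go
// Sketch only: everything after the command slice is assumed, not taken from the PR.
promAddress := "http://localhost:9253/metrics" // assumed node-local-dns metrics endpoint

testLRPCase(t, ctx, clientPod,
	[]string{"nslookup", "kubernetes.default"}, // command run inside the client pod
	promAddress, // new parameter: Prometheus endpoint scraped for DNS request counters
	true,        // whether the DNS metric is expected to increase after the command
)
```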
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/integration/lrp/lrp_test.go | Adds negative LRP test flow, Prometheus address parameter, and Cilium/LRP validation helpers. |
| test/integration/lrp/lrp_fqdn_test.go | Updates calls to testLRPCase to include the new Prometheus address parameter. |
```go
	require.Greater(t, afterValue, beforeValue, "dns metric count did not increase after command - before: %.0f, after: %.0f", beforeValue, afterValue)
} else {
	require.Equal(t, afterMetric.GetCounter().GetValue(), beforeMetric.GetCounter().GetValue(), "dns metric count increased after command")
	require.Equal(t, afterValue, beforeValue, "dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
```
Copilot AI commented on Oct 17, 2025:
The testify assertions use formatted messages with placeholders but require.Greater and require.Equal do not perform format substitution; use require.Greaterf / require.Equalf or pre-format the message with fmt.Sprintf to ensure values appear in failure output.
Suggested change:

```diff
-require.Equal(t, afterValue, beforeValue, "dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
+require.Equalf(t, afterValue, beforeValue, "dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
```
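A minimal sketch of both assertions using the explicit `*f` variants, wrapped in a hypothetical helper; the helper name and parameters are illustrative, not part of the PR:

```go
import (
	"testing"

	"github.com/stretchr/testify/require"
)

// assertDNSMetricDelta is a hypothetical helper: it fails the test if the DNS
// request counter did not move in the expected direction.
func assertDNSMetricDelta(t *testing.T, beforeValue, afterValue float64, expectIncrease bool) {
	if expectIncrease {
		require.Greaterf(t, afterValue, beforeValue,
			"dns metric count did not increase after command - before: %.0f, after: %.0f", beforeValue, afterValue)
	} else {
		require.Equalf(t, afterValue, beforeValue,
			"dns metric count increased after command - before: %.0f, after: %.0f", beforeValue, afterValue)
	}
}
```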
```go
// Wait for deletion to complete
time.Sleep(10 * time.Second)
```
Copilot AI commented on Oct 17, 2025:
Fixed sleep introduces unnecessary delay and potential flakiness; replace with a poll-based wait (e.g., repeatedly checking for resource absence/recreation with a timeout) to reduce test duration and improve reliability.
Suggested change:

```diff
-// Wait for deletion to complete
-time.Sleep(10 * time.Second)
+// Wait for deletion to complete (poll for absence of client DaemonSet and LRP)
+retry.DoWithTimeout(ctx, "wait for client DaemonSet deletion", 30*time.Second, func(ctx context.Context) (bool, error) {
+	_, err := dsClient.Get(ctx, clientDS.Name, metav1.GetOptions{})
+	if err != nil {
+		// DaemonSet not found
+		return true, nil
+	}
+	// Still exists
+	return false, nil
+})
+retry.DoWithTimeout(ctx, "wait for LRP deletion", 30*time.Second, func(ctx context.Context) (bool, error) {
+	_, err := lrpClient.Get(ctx, lrp.Name, metav1.GetOptions{})
+	if err != nil {
+		// LRP not found
+		return true, nil
+	}
+	// Still exists
+	return false, nil
+})
```
Force-pushed: 321a3ea to ba2b602, ba2b602 to b947a40, 49f04a2 to 1b852ce, dfb53d7 to 4e6b260.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
test/integration/lrp/lrp_test.go:517
- The `rand.Seed()` call is made every time `TakeOne()` is invoked, which can cause issues in concurrent tests or when called multiple times in quick succession (as the seed would be very similar). Additionally, `rand.Seed()` has been deprecated since Go 1.20 in favor of the global generator, which is automatically seeded.
Recommendation: Remove the `rand.Seed()` call and use the global random generator directly, which is automatically seeded in Go 1.20+:

```go
func TakeOne[T any](slice []T) T {
	if len(slice) == 0 {
		var zero T
		return zero
	}
	return slice[rand.Intn(len(slice))]
}
```

Alternatively, if you need better randomness, use crypto/rand or math/rand/v2 for Go 1.22+.
Original code at test/integration/lrp/lrp_test.go:517:

```go
// TakeOne takes one item from the slice randomly; if empty, it returns the empty value for the type
// Use in testing only
func TakeOne[T any](slice []T) T {
	if len(slice) == 0 {
		var zero T
		return zero
	}
	rand.Seed(uint64(time.Now().UnixNano()))
```
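For comparison, a minimal sketch of the `math/rand/v2` route mentioned in the comment above (Go 1.22+); the package's top-level generator needs no explicit seeding:

```go
import "math/rand/v2"

// TakeOne returns one random element of the slice, or the zero value if the
// slice is empty. Intended for tests only.
func TakeOne[T any](slice []T) T {
	if len(slice) == 0 {
		var zero T
		return zero
	}
	return slice[rand.IntN(len(slice))] // math/rand/v2 renames Intn to IntN
}
```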
Two review comments on test/integration/manifests/cilium/v1.13/cilium-config/cilium-config.yaml were marked as resolved.
```go
// Wait for deletion to complete
time.Sleep(10 * time.Second)
```
Copilot AI commented on Nov 12, 2025:
[nitpick] Using a hardcoded 10-second sleep to wait for resource deletion is fragile and can lead to flaky tests. The sleep might be too short in slow environments or unnecessarily long in fast environments.
Recommendation: Replace the hardcoded sleep with proper wait functions to ensure deletion has completed. You can use `WaitForPodsDelete()` for pods and consider implementing similar wait functions for LRP deletion, or use Kubernetes' `metadata.deletionTimestamp` to poll for actual deletion:

```go
// Wait for client pods to be deleted
err = kubernetes.WaitForPodsDelete(ctx, cs, originalPod.Namespace, clientLabelSelector)
require.NoError(t, err)

// Wait for LRP to be deleted by polling
err = waitForLRPDelete(ctx, lrpClient, lrp.Name)
require.NoError(t, err)
```
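A minimal sketch of what a `waitForLRPDelete` helper along those lines could look like, assuming the Cilium clientset already used elsewhere in the test; the exact signature, intervals, and error handling are assumptions rather than the PR's code:

```go
import (
	"context"
	"time"

	ciliumClientset "github.com/cilium/cilium/pkg/k8s/client/clientset/versioned"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForLRPDelete polls until the named CiliumLocalRedirectPolicy is gone or
// the timeout expires, instead of sleeping for a fixed duration.
func waitForLRPDelete(ctx context.Context, ciliumCS *ciliumClientset.Clientset, namespace, name string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 60*time.Second, true,
		func(ctx context.Context) (bool, error) {
			_, err := ciliumCS.CiliumV2().CiliumLocalRedirectPolicies(namespace).Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil // deletion is complete
			}
			// Keep polling while the policy still exists; surface any other error.
			return false, err
		})
}
```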
+1.
```go
// Step 1: Initial DNS test to verify LRP is working
t.Log("Step 1: Initial DNS test - verifying LRP functionality")
testLRPCase(t, ctx, clientPod, []string{
```
is this the same as the basic lrp test called in TestLRP?
```go
if strings.Contains(line, "nodelocaldns") && strings.Contains(line, "kube-system") {
	// Validate that the line contains expected components
	require.Contains(t, line, "kube-system", "LRP line should contain kube-system namespace")
	require.Contains(t, line, "nodelocaldns", "LRP line should contain nodelocaldns name")
```
these two require checks are redundant right? If we are in the if statement the line should already contain kube-system and nodelocaldns.
```go
// restartClientPodsAndGetPod restarts the client daemonset and returns a new pod reference
func restartClientPodsAndGetPod(t *testing.T, ctx context.Context, cs *k8sclient.Clientset, originalPod corev1.Pod) corev1.Pod {
	// Find the daemonset name by looking up the pod's owner
```
is this comment still accurate? we seem to be finding the pod's node name with the pod detail information
```go
ciliumCS, err := ciliumClientset.NewForConfig(config)
require.NoError(t, err)

// ...

nodeName := originalPod.Spec.NodeName
```
Comparing this with `restartClientPodsAndGetPod` -- this seems like a simpler way to get the original pod's node name. Is there a difference between this and doing `podDetails, err := cs.CoreV1().Pods(originalPod.Namespace).Get(ctx, originalPod.Name, metav1.GetOptions{})` in `restartClientPodsAndGetPod`?
```go
recreatedPod := deleteAndRecreateResources(t, ctx, cs, clientPod)

// ...

// Step 7: Final verification after recreation
t.Log("Step 7: Final verification after resource recreation - skipping basic DNS test, will validate with metrics in Step 8")
```
is there supposed to be code for step 7?
Reason for Change:
Test Steps:
Purpose: This comprehensive lifecycle test ensures that Local Redirect Policy functionality is robust and survives various operational scenarios including pod restarts, resource deletion/recreation, and service restarts - validating both functional behavior through DNS metrics and dataplane configuration through Cilium CLI inspection.
Issue Fixed:
Requirements:
Notes: