diff --git a/.changelog/4026.added.txt b/.changelog/4026.added.txt
new file mode 100644
index 0000000000..8222eb7d21
--- /dev/null
+++ b/.changelog/4026.added.txt
@@ -0,0 +1 @@
+feat(HPA): Adds autoscaling config
\ No newline at end of file
diff --git a/deploy/helm/sumologic/README.md b/deploy/helm/sumologic/README.md
index 60a136401a..649632b3fb 100644
--- a/deploy/helm/sumologic/README.md
+++ b/deploy/helm/sumologic/README.md
@@ -129,6 +129,7 @@ The following table lists the configurable parameters of the Sumo Logic chart an
 | `sumologic.metrics.collector.otelcol.autoscaling.minReplicas` | Default min replicas for autoscaling. collector | `1` |
 | `sumologic.metrics.collector.otelcol.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `70` |
 | `sumologic.metrics.collector.otelcol.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `70` |
+| `sumologic.metrics.collector.otelcol.autoscaling.behavior` | The desired target scaleUp and scaleDown behavior for autoscaling. | `{}` |
 | `sumologic.metrics.collector.otelcol.serviceMonitorSelector` | Selector for ServiceMonitors used for target discovery. By default, we select ServiceMonitors created by the Chart. See: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api/targetallocators.md#targetallocatorspecprometheuscr | `{}` |
 | `sumologic.metrics.collector.otelcol.podMonitorSelector` | Selector for PodMonitors used for target discovery. By default, we select PodMonitors created by the Chart. See: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api/targetallocators.md#targetallocatorspecprometheuscr | `{}` |
 | `sumologic.metrics.collector.otelcol.nodeSelector` | Node selector for the otelcol metrics collector. [See help.sumologic.com/docs/send-data/kubernetes/best-practices for more information.](https://help.sumologic.com/docs/send-data/kubernetes/best-practices/). | `{}` |
@@ -349,6 +350,7 @@ The following table lists the configurable parameters of the Sumo Logic chart an
 | `otelcolInstrumentation.config.override` | Configuration for otelcol-instrumentation collector, replaces defaults. | {} |
 | `otelcolInstrumentation.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `100` |
 | `otelcolInstrumentation.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `50` |
+| `otelcolInstrumentation.autoscaling.behavior` | The desired target scaleUp and scaleDown behavior for autoscaling. | `{}` |
 | `otelcolInstrumentation.statefulset.replicaCount` | Set the number of otelcol-instrumentation replicasets. | `3` |
 | `otelcolInstrumentation.statefulset.nodeSelector` | Node selector for otelcol-instrumentation statefulset. [See help.sumologic.com/docs/send-data/kubernetes/best-practices for more information.](https://help.sumologic.com/docs/send-data/kubernetes/best-practices/) | `{}` |
 | `otelcolInstrumentation.statefulset.priorityClassName` | Priority class name for otelcol-instrumentation pods. | If not provided then set to `RELEASE-NAME-sumologic-priorityclass`. |
@@ -385,6 +387,7 @@ The following table lists the configurable parameters of the Sumo Logic chart an
 | `tracesGateway.autoscaling.maxReplicas` | Default max replicas for autoscaling | `10` |
 | `tracesGateway.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `100` |
 | `tracesGateway.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `50` |
+| `tracesGateway.autoscaling.behavior` | The desired target scaleUp and scaleDown behavior for autoscaling. | `{}` |
 | `tracesGateway.deployment.replicas` | Set the number of OpenTelemetry Collector replicas. | `1` |
 | `tracesGateway.deployment.nodeSelector` | Node selector for otelcol deployment. [See help.sumologic.com/docs/send-data/kubernetes/best-practices for more information.](https://help.sumologic.com/docs/send-data/kubernetes/best-practices/) | `{}` |
 | `tracesGateway.deployment.priorityClassName` | Priority class name for OpenTelemetry Collector log pods. | `Nil` |
@@ -495,6 +498,7 @@ The following table lists the configurable parameters of the Sumo Logic chart an
 | `metadata.metrics.autoscaling.maxReplicas` | Default max replicas for autoscaling | `10` |
 | `metadata.metrics.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `80` |
 | `metadata.metrics.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `Nil` |
+| `metadata.metrics.autoscaling.behavior` | The desired target scaleUp and scaleDown behavior for autoscaling. | `{}` |
 | `metadata.metrics.podDisruptionBudget` | Pod Disruption Budget for metrics metadata enrichment (otelcol) statefulset and for otelcol metrics collector. | `{"minAvailable": 2}` |
 | `metadata.logs.enabled` | Flag to control deploying the otelcol logs statefulsets. | `true` |
 | `metadata.logs.logLevel` | Flag to control logging level for OpenTelemetry Collector for logs. Can be `debug`, `info`, `warn`, `error`, `dpanic`, `panic`, `fatal`. | `info` |
@@ -529,6 +533,7 @@ The following table lists the configurable parameters of the Sumo Logic chart an
 | `metadata.logs.autoscaling.maxReplicas` | Default max replicas for autoscaling | `10` |
 | `metadata.logs.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `80` |
 | `metadata.logs.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `Nil` |
+| `metadata.logs.autoscaling.behavior` | The desired target scaleUp and scaleDown behavior for autoscaling. | `{}` |
 | `metadata.logs.podDisruptionBudget` | Pod Disruption Budget for logs metadata enrichment (otelcol) statefulset. | `{"minAvailable": 2}` |
 | `otelevents.image.repository` | Image repository for otelcol docker container. | `` |
 | `otelevents.image.tag` | Image tag for otelcol docker container. | `` |
diff --git a/deploy/helm/sumologic/templates/instrumentation/otelcol-instrumentation/hpa.yaml b/deploy/helm/sumologic/templates/instrumentation/otelcol-instrumentation/hpa.yaml
index 8f4d9d803c..2c26543a3b 100644
--- a/deploy/helm/sumologic/templates/instrumentation/otelcol-instrumentation/hpa.yaml
+++ b/deploy/helm/sumologic/templates/instrumentation/otelcol-instrumentation/hpa.yaml
@@ -16,6 +16,10 @@ spec:
     name: {{ template "sumologic.metadata.name.otelcolinstrumentation.statefulset" . }}
   minReplicas: {{ $otelcolInstrumentation.autoscaling.minReplicas }}
   maxReplicas: {{ $otelcolInstrumentation.autoscaling.maxReplicas }}
+{{- if $otelcolInstrumentation.autoscaling.behavior }}
+  behavior:
+    {{- toYaml $otelcolInstrumentation.autoscaling.behavior | nindent 4 }}
+{{- end }}
   metrics:
 {{- if $otelcolInstrumentation.autoscaling.targetMemoryUtilizationPercentage }}
   - type: Resource
diff --git a/deploy/helm/sumologic/templates/instrumentation/traces-gateway/hpa.yaml b/deploy/helm/sumologic/templates/instrumentation/traces-gateway/hpa.yaml
index 219504d7bd..4df3cc11c4 100644
--- a/deploy/helm/sumologic/templates/instrumentation/traces-gateway/hpa.yaml
+++ b/deploy/helm/sumologic/templates/instrumentation/traces-gateway/hpa.yaml
@@ -16,6 +16,10 @@ spec:
     name: {{ template "sumologic.metadata.name.tracesgateway.deployment" . }}
   minReplicas: {{ $tracesGateway.autoscaling.minReplicas }}
   maxReplicas: {{ $tracesGateway.autoscaling.maxReplicas }}
+{{- if $tracesGateway.autoscaling.behavior }}
+  behavior:
+    {{- toYaml $tracesGateway.autoscaling.behavior | nindent 4 }}
+{{- end }}
   metrics:
 {{- if $tracesGateway.autoscaling.targetMemoryUtilizationPercentage }}
   - type: Resource
diff --git a/deploy/helm/sumologic/templates/logs/common/hpa.yaml b/deploy/helm/sumologic/templates/logs/common/hpa.yaml
index 616c744a16..7ff6219124 100644
--- a/deploy/helm/sumologic/templates/logs/common/hpa.yaml
+++ b/deploy/helm/sumologic/templates/logs/common/hpa.yaml
@@ -14,6 +14,10 @@ spec:
     name: {{ template "sumologic.metadata.name.logs.statefulset" . }}
   minReplicas: {{ .Values.metadata.logs.autoscaling.minReplicas }}
   maxReplicas: {{ .Values.metadata.logs.autoscaling.maxReplicas }}
+{{- if .Values.metadata.logs.autoscaling.behavior }}
+  behavior:
+    {{- toYaml .Values.metadata.logs.autoscaling.behavior | nindent 4 }}
+{{- end }}
   metrics:
 {{- if .Values.metadata.logs.autoscaling.targetMemoryUtilizationPercentage }}
   - type: Resource
diff --git a/deploy/helm/sumologic/templates/metrics/collector/otelcol/opentelemetrycollector.yaml b/deploy/helm/sumologic/templates/metrics/collector/otelcol/opentelemetrycollector.yaml
index b89968989e..eff0d609b4 100644
--- a/deploy/helm/sumologic/templates/metrics/collector/otelcol/opentelemetrycollector.yaml
+++ b/deploy/helm/sumologic/templates/metrics/collector/otelcol/opentelemetrycollector.yaml
@@ -95,6 +95,10 @@ spec:
 {{- if .Values.sumologic.metrics.collector.otelcol.autoscaling.targetMemoryUtilizationPercentage }}
     targetMemoryUtilization: {{ .Values.sumologic.metrics.collector.otelcol.autoscaling.targetMemoryUtilizationPercentage }}
 {{- end }}
+{{- if .Values.sumologic.metrics.collector.otelcol.autoscaling.behavior }}
+    behavior:
+{{ toYaml .Values.sumologic.metrics.collector.otelcol.autoscaling.behavior | nindent 6 }}
+{{- end }}
 {{- end }}
   env:
     - name: METADATA_METRICS_SVC
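For reviewers, a sketch of what the `behavior` passthrough in the template hunks above renders to once a user supplies scaling policies. The values are illustrative, not defaults from the chart; the field names come from the Kubernetes `autoscaling/v2` `HPAScalingRules` schema, and `toYaml ... | nindent` places the user-supplied map verbatim under `behavior:`:

```yaml
# Illustrative rendered HPA spec fragment (not part of the patch).
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60   # use the highest recommendation from the last 60s
    policies:
      - type: Percent
        value: 100                   # allow at most a 100% replica increase...
        periodSeconds: 60            # ...per 60-second period
  scaleDown:
    stabilizationWindowSeconds: 60
    policies:
      - type: Percent
        value: 100
        periodSeconds: 60
```

An empty `behavior: {}` (the chart default) skips the block entirely, so the HPA falls back to the Kubernetes default scaling behavior.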
diff --git a/deploy/helm/sumologic/templates/metrics/common/hpa.yaml b/deploy/helm/sumologic/templates/metrics/common/hpa.yaml
index 0d34093394..20fdbdbb43 100644
--- a/deploy/helm/sumologic/templates/metrics/common/hpa.yaml
+++ b/deploy/helm/sumologic/templates/metrics/common/hpa.yaml
@@ -14,6 +14,10 @@ spec:
     name: {{ template "sumologic.metadata.name.metrics.statefulset" . }}
   minReplicas: {{ .Values.metadata.metrics.autoscaling.minReplicas }}
   maxReplicas: {{ .Values.metadata.metrics.autoscaling.maxReplicas }}
+{{- if .Values.metadata.metrics.autoscaling.behavior }}
+  behavior:
+    {{- toYaml .Values.metadata.metrics.autoscaling.behavior | nindent 4 }}
+{{- end }}
   metrics:
 {{- if .Values.metadata.metrics.autoscaling.targetMemoryUtilizationPercentage }}
   - type: Resource
diff --git a/deploy/helm/sumologic/values.yaml b/deploy/helm/sumologic/values.yaml
index c5cad9d1dc..e04aa0a3da 100644
--- a/deploy/helm/sumologic/values.yaml
+++ b/deploy/helm/sumologic/values.yaml
@@ -518,6 +518,7 @@ sumologic:
           maxReplicas: 10
           targetCPUUtilizationPercentage: 70
           targetMemoryUtilizationPercentage: 70
+          behavior: {}
 
         nodeSelector: {}
 
@@ -1388,6 +1389,7 @@ otelcolInstrumentation:
     maxReplicas: 10
     targetCPUUtilizationPercentage: 100
     # targetMemoryUtilizationPercentage: 50
+    behavior: {}
 
   statefulset:
     nodeSelector: {}
@@ -1706,6 +1708,7 @@ metadata:
       maxReplicas: 10
      targetCPUUtilizationPercentage: 80
       # targetMemoryUtilizationPercentage: 50
+      behavior: {}
 
     ## Option to specify PodDisrutionBudgets
     ## You can specify only one of maxUnavailable and minAvailable in a single PodDisruptionBudget
@@ -1824,6 +1827,7 @@ metadata:
       maxReplicas: 10
       targetCPUUtilizationPercentage: 80
       # targetMemoryUtilizationPercentage: 50
+      behavior: {}
 
     ## Option to specify PodDisrutionBudgets
     ## You can specify only one of maxUnavailable and minAvailable in a single PodDisruptionBudget
@@ -1845,6 +1849,7 @@ tracesGateway:
   autoscaling:
     maxReplicas: 10
     targetCPUUtilizationPercentage: 100
     # targetMemoryUtilizationPercentage: 50
+    behavior: {}
 
   deployment:
     replicas: 1
diff --git a/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.input.yaml b/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.input.yaml
index 837b727a30..ec8853c5c8 100644
--- a/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.input.yaml
+++ b/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.input.yaml
@@ -3,3 +3,24 @@ debug:
   collector:
     print: true
     stopLogsIngestion: true
+
+sumologic:
+  metrics:
+    collector:
+      otelcol:
+        enabled: true
+        autoscaling:
+          enabled: true
+          behavior:
+            scaleUp:
+              stabilizationWindowSeconds: 60
+              policies:
+                - type: Percent
+                  value: 100
+                  periodSeconds: 60
+            scaleDown:
+              stabilizationWindowSeconds: 60
+              policies:
+                - type: Percent
+                  value: 100
+                  periodSeconds: 60
diff --git a/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.output.yaml b/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.output.yaml
index df1c8d13f1..5019803fc0 100644
--- a/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.output.yaml
+++ b/tests/helm/testdata/goldenfile/metrics_collector_otc/debug.output.yaml
@@ -42,6 +42,20 @@ spec:
     minReplicas: 1
     targetCPUUtilization: 70
     targetMemoryUtilization: 70
+    behavior:
+      scaleUp:
+        stabilizationWindowSeconds: 60
+        policies:
+          - type: Percent
+            value: 100
+            periodSeconds: 60
+      scaleDown:
+        stabilizationWindowSeconds: 60
+        policies:
+          - type: Percent
+            value: 100
+            periodSeconds: 60
+
   env:
     - name: METADATA_METRICS_SVC
       valueFrom:
diff --git a/tests/integration/features.go b/tests/integration/features.go
index 5d43bdd51e..70280f6a1c 100644
--- a/tests/integration/features.go
+++ b/tests/integration/features.go
@@ -683,6 +683,31 @@ func GetMultipleMultilineLogsFeature() features.Feature {
 		Feature()
 }
 
+func GetHPAFeature(releaseName string) features.Feature {
+	expectedHPA := []string{
+		fmt.Sprintf("%s-sumologic-metrics-collector", releaseName),
+		fmt.Sprintf("%s-sumologic-otelcol-instrumentation", releaseName),
+		fmt.Sprintf("%s-sumologic-otelcol-logs", releaseName),
+		fmt.Sprintf("%s-sumologic-otelcol-metrics", releaseName),
+		fmt.Sprintf("%s-sumologic-traces-gateway", releaseName),
+	}
+	expectedMetrics := map[string]map[string]int{}
+	for _, hpa := range expectedHPA {
+		expectedMetrics[hpa] = map[string]int{
+			"cpu":    75,
+			"memory": 75,
+		}
+	}
+	return features.New("HPA").
+		Assess("HPA configured", stepfuncs.WaitUntilHPAConfigured(
+			expectedHPA,
+			expectedMetrics,
+			waitDuration,
+			tickDuration,
+		)).
+		Feature()
+}
+
 func GetEventsFeature() features.Feature {
 	return features.New("events").
 		Assess("events present", stepfuncs.WaitUntilExpectedLogsPresent(
diff --git a/tests/integration/helm_ot_hpa_test.go b/tests/integration/helm_ot_hpa_test.go
new file mode 100644
index 0000000000..61ba9889e6
--- /dev/null
+++ b/tests/integration/helm_ot_hpa_test.go
@@ -0,0 +1,29 @@
+//go:build allversions
+// +build allversions
+
+package integration
+
+import (
+	"testing"
+
+	strings_internal "github.com/SumoLogic/sumologic-kubernetes-collection/tests/integration/internal/strings"
+)
+
+func Test_Helm_OT_HPA(t *testing.T) {
+	installChecks := []featureCheck{
+		CheckSumologicSecret(15),
+		CheckOtelcolMetadataLogsInstall,
+		CheckOtelcolMetadataMetricsInstall,
+		CheckOtelcolEventsInstall,
+		CheckOtelcolMetricsCollectorInstall,
+		CheckOtelcolLogsCollectorInstall,
+		CheckTracesInstall,
+	}
+
+	featInstall := GetInstallFeature(installChecks)
+
+	releaseName := strings_internal.ReleaseNameFromT(t)
+	featHPA := GetHPAFeature(releaseName)
+
+	testenv.Test(t, featInstall, featHPA)
+}
diff --git a/tests/integration/internal/stepfuncs/assess_funcs.go b/tests/integration/internal/stepfuncs/assess_funcs.go
index 331c30307d..768b103a23 100644
--- a/tests/integration/internal/stepfuncs/assess_funcs.go
+++ b/tests/integration/internal/stepfuncs/assess_funcs.go
@@ -4,6 +4,7 @@ import (
 	"context"
 	"errors"
 	"fmt"
+	"os"
 	"regexp"
 	"sort"
 	"strings"
@@ -12,6 +13,7 @@ import (
 	appsv1 "k8s.io/api/apps/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/client-go/kubernetes"
 	log "k8s.io/klog/v2"
 	"sigs.k8s.io/e2e-framework/klient/k8s"
 	"sigs.k8s.io/e2e-framework/klient/k8s/resources"
@@ -351,6 +353,84 @@ func WaitUntilExpectedMetricLabelsPresent(
 	}
 }
 
+// WaitUntilHPAConfigured returns a features.Func that can be used in `Assess` calls.
+// It will wait until all the provided HPAs are configured and active.
+func WaitUntilHPAConfigured(
+	expectedHPAMetadata []string,
+	expectedMetrics map[string]map[string]int,
+	waitDuration time.Duration,
+	tickDuration time.Duration,
+) features.Func {
+	return WaitUntilHPAPresent(
+		expectedHPAMetadata,
+		expectedMetrics,
+		waitDuration,
+		tickDuration,
+	)
+}
+
+// WaitUntilHPAPresent returns a features.Func that can be used in `Assess` calls.
+// It will wait until all the provided HPAs are configured and active. It will not verify the
+// functionality of the HPAs.
+func WaitUntilHPAPresent(expectedHPAMetadata []string,
+	expectedMetrics map[string]map[string]int,
+	waitDuration time.Duration,
+	tickDuration time.Duration,
+) features.Func {
+	return func(ctx context.Context, t *testing.T, envConf *envconf.Config) context.Context {
+		namespace := ctxopts.Namespace(ctx)
+		if namespace == "" {
+			log.Fatalf("Namespace not found in test context")
+		}
+
+		cfg := envconf.New().WithKubeconfigFile(os.Getenv("KUBECONFIG"))
+		c, err := kubernetes.NewForConfig(cfg.Client().RESTConfig())
+		require.NoError(t, err)
+
+		assert.Eventually(t, func() bool {
+			totalCount := len(expectedHPAMetadata)
+			currCount := 0
+			for _, expectedHPA := range expectedHPAMetadata {
+				hpa, err := c.AutoscalingV2().HorizontalPodAutoscalers(namespace).
+					Get(ctx, expectedHPA, metav1.GetOptions{})
+				if err != nil {
+					log.Errorf("failed to get HPA %s: %v", expectedHPA, err)
+					return false
+				}
+
+				if hpa == nil || len(hpa.Spec.Metrics) == 0 || hpa.Spec.Behavior == nil {
+					log.Infof("HPA %s is not configured. HPA: %v", expectedHPA, hpa)
+					return false
+				}
+
+				if len(hpa.Spec.Metrics) >= 2 && hpa.Spec.Behavior.ScaleUp != nil && hpa.Spec.Behavior.ScaleDown != nil {
+					resourceName1 := hpa.Spec.Metrics[0].Resource.Name
+					val1 := hpa.Spec.Metrics[0].Resource.Target.AverageUtilization
+					expectedVal1 := expectedMetrics[expectedHPA][string(resourceName1)]
+
+					resourceName2 := hpa.Spec.Metrics[1].Resource.Name
+					val2 := hpa.Spec.Metrics[1].Resource.Target.AverageUtilization
+					expectedVal2 := expectedMetrics[expectedHPA][string(resourceName2)]
+
+					if int(*val1) != expectedVal1 || int(*val2) != expectedVal2 {
+						log.Infof("HPA %s is configured and active but expectedValue: [%d, %d] "+
+							"did not match current value [%d, %d].",
+							expectedHPA, expectedVal1, expectedVal2, int(*val1), int(*val2))
+						return false
+					}
+					log.Infof("HPA %s is configured and active.", expectedHPA)
+					currCount++
+				}
+			}
+			log.Infof("Total HPA count: %d, Current HPA count: %d, expectedHPA: %v",
+				totalCount, currCount, expectedHPAMetadata)
+			return currCount == totalCount
+		}, waitDuration, tickDuration)
+		return ctx
+	}
+}
+
 // WaitUntilExpectedMetricsPresent returns a features.Func that can be used in `Assess` calls.
 // It will wait until the provided number of logs with the provided labels are returned by sumologic-mock's HTTP API on
 // the provided Service and port, until it succeeds or waitDuration passes.
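An aside for reviewers (not part of the patch): the bookkeeping that `GetHPAFeature` and `WaitUntilHPAPresent` perform — building the per-HPA expected-metrics map and comparing observed average utilizations against it — reduces to the following self-contained sketch. The names `buildExpectedMetrics` and `metricsMatch` are illustrative, not helpers from this repository:

```go
package main

import "fmt"

// buildExpectedMetrics mirrors the map built in GetHPAFeature: every expected
// HPA name maps to the same cpu/memory average-utilization targets.
func buildExpectedMetrics(hpaNames []string, cpu, memory int) map[string]map[string]int {
	expected := make(map[string]map[string]int, len(hpaNames))
	for _, name := range hpaNames {
		expected[name] = map[string]int{"cpu": cpu, "memory": memory}
	}
	return expected
}

// metricsMatch mirrors the per-HPA comparison in WaitUntilHPAPresent: every
// observed resource utilization must equal the expected value for that resource.
func metricsMatch(expected, observed map[string]int) bool {
	for resource, got := range observed {
		if expected[resource] != got {
			return false
		}
	}
	return true
}

func main() {
	expected := buildExpectedMetrics([]string{"collection-sumologic-otelcol-logs"}, 75, 75)
	observed := map[string]int{"cpu": 75, "memory": 75}
	fmt.Println(metricsMatch(expected["collection-sumologic-otelcol-logs"], observed)) // prints "true"
}
```

The real assess func additionally requires `Spec.Behavior.ScaleUp` and `Spec.Behavior.ScaleDown` to be non-nil before counting an HPA as configured.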
diff --git a/tests/integration/values/values_helm_ot_hpa.yaml b/tests/integration/values/values_helm_ot_hpa.yaml
new file mode 100644
index 0000000000..7d37d508e9
--- /dev/null
+++ b/tests/integration/values/values_helm_ot_hpa.yaml
@@ -0,0 +1,138 @@
+metadata:
+  logs:
+    autoscaling:
+      enabled: true
+      minReplicas: 1
+      maxReplicas: 10
+      targetCPUUtilizationPercentage: 75
+      targetMemoryUtilizationPercentage: 75
+      behavior:
+        scaleUp:
+          stabilizationWindowSeconds: 60
+          policies:
+            - type: Percent
+              value: 100
+              periodSeconds: 60
+        scaleDown:
+          stabilizationWindowSeconds: 60
+          policies:
+            - type: Percent
+              value: 100
+              periodSeconds: 60
+  metrics:
+    autoscaling:
+      enabled: true
+      minReplicas: 1
+      maxReplicas: 10
+      targetCPUUtilizationPercentage: 75
+      targetMemoryUtilizationPercentage: 75
+      behavior:
+        scaleUp:
+          stabilizationWindowSeconds: 60
+          policies:
+            - type: Percent
+              value: 100
+              periodSeconds: 60
+        scaleDown:
+          stabilizationWindowSeconds: 60
+          policies:
+            - type: Percent
+              value: 100
+              periodSeconds: 60
+
+sumologic:
+  autoscaling:
+    enabled: true
+  events:
+    enabled: true
+  logs:
+    enabled: true
+  metrics:
+    enabled: true
+    collector:
+      otelcol:
+        autoscaling:
+          enabled: true
+          minReplicas: 1
+          maxReplicas: 10
+          targetCPUUtilizationPercentage: 75
+          targetMemoryUtilizationPercentage: 75
+          behavior:
+            scaleUp:
+              stabilizationWindowSeconds: 60
+              policies:
+                - type: Percent
+                  value: 100
+                  periodSeconds: 60
+            scaleDown:
+              stabilizationWindowSeconds: 60
+              policies:
+                - type: Percent
+                  value: 100
+                  periodSeconds: 60
+
+metrics-server:
+  enabled: true
+
+opentelemetry-operator:
+  enabled: true
+
+  manager:
+    resources:
+      requests:
+        cpu: 10m
+        memory: 64Mi
+
+  kubeRBACProxy:
+    resources:
+      requests:
+        cpu: 5m
+        memory: 64Mi
+
+instrumentation:
+  createDefaultInstrumentation: true
+  instrumentationNamespaces: "ot-operator-instr-1"
+
+otelcolInstrumentation:
+  enabled: true
+  autoscaling:
+    enabled: true
+    minReplicas: 1
+    maxReplicas: 10
+    targetCPUUtilizationPercentage: 75
+    targetMemoryUtilizationPercentage: 75
+    behavior:
+      scaleUp:
+        stabilizationWindowSeconds: 60
+        policies:
+          - type: Percent
+            value: 100
+            periodSeconds: 60
+      scaleDown:
+        stabilizationWindowSeconds: 60
+        policies:
+          - type: Percent
+            value: 100
+            periodSeconds: 60
+
+tracesGateway:
+  enabled: true
+  autoscaling:
+    enabled: true
+    minReplicas: 1
+    maxReplicas: 10
+    targetCPUUtilizationPercentage: 75
+    targetMemoryUtilizationPercentage: 75
+    behavior:
+      scaleUp:
+        stabilizationWindowSeconds: 60
+        policies:
+          - type: Percent
+            value: 100
+            periodSeconds: 60
+      scaleDown:
+        stabilizationWindowSeconds: 60
+        policies:
+          - type: Percent
+            value: 100
+            periodSeconds: 60