This commit introduces Prometheus metrics to the scheduler,
starting with a request counter.
It also updates several Go dependencies and adjusts the Dockerfile
to work with the vendored dependencies.
Dockerfile.epp (2 additions, 1 deletion)

@@ -14,6 +14,7 @@ WORKDIR /workspace
 # Copy the Go Modules manifests
 COPY go.mod go.mod
 COPY go.sum go.sum
+COPY vendor/ vendor/

 # Copy the go source
 COPY cmd/ cmd/
@@ -36,7 +37,7 @@ ENV GOOS=${TARGETOS:-linux}
 ENV GOARCH=${TARGETARCH}
 ARG COMMIT_SHA=unknown
 ARG BUILD_REF
-RUN go build -a -o bin/epp -ldflags="-extldflags '-L$(pwd)/lib' -X sigs.k8s.io/gateway-api-inference-extension/version.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/version.BuildRef=${BUILD_REF}" cmd/epp/main.go
+RUN go build -mod=vendor -a -o bin/epp -ldflags="-extldflags '-L$(pwd)/lib' -X sigs.k8s.io/gateway-api-inference-extension/version.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/version.BuildRef=${BUILD_REF}" cmd/epp/main.go

 # Use ubi9 as a minimal base image to package the manager binary
 # Refer to https://catalog.redhat.com/software/containers/ubi9/ubi-minimal/615bd9b4075b022acc111bf5 for more details

The `llm-d-inference-scheduler` exposes the following Prometheus metrics to monitor its behavior and performance, particularly concerning Prefill/Decode Disaggregation.

All metrics are in the `llm_d_inference_scheduler` subsystem.

## Scraping and viewing the metrics

Metrics defined in the scheduler plugin extend the Inference Gateway metrics. For details on scraping and viewing them, see the [metrics and observability guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/site-src/guides/metrics-and-observability.md).

## Metric Details

### `pd_decision_total`

* **Type:** Counter
* **Labels:**
  * `decision_type`: string ("decode-only" or "prefill-decode")
* **Release Stage:** ALPHA
* **Description:** Counts the number of requests processed, broken down by the Prefill/Decode disaggregation decision.
  * `prefill-decode`: The request was split into separate Prefill and Decode stages.
  * `decode-only`: The request used the Decode-only path.
* **Usage:** Provides a high-level view of how many requests use the disaggregated path versus the unified (decode-only) path.
* **Actionability:**
  * Monitor the ratio of "prefill-decode" to "decode-only" to understand the P/D engagement rate.
  * Sudden changes in this ratio might indicate configuration issues, changes in workload patterns, or problems with the decision logic.