The `llm-d-inference-scheduler` exposes the following Prometheus metrics to monitor its behavior and performance, particularly concerning Prefill/Decode Disaggregation.
All metrics are in the `llm_d_inference_scheduler` subsystem.
## Scraping and viewing the metrics
Metrics defined in the scheduler plugin extend the Inference Gateway metrics. For details on how to scrape and view them, see the [metrics and observability guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/site-src/guides/metrics-and-observability.md).
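
As a quick sanity check, the minimal Go sketch below fetches the scheduler's Prometheus endpoint and prints only the `llm_d_inference_scheduler_*` series. The host, port, and path are assumptions (adjust them to however you exposed the endpoint, e.g. via `kubectl port-forward`); follow the guide linked above for the documented setup.

```go
// metrics_peek.go: fetch the Prometheus endpoint and print only the
// llm_d_inference_scheduler_* series.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Assumed address: adjust to wherever the metrics endpoint is reachable.
	resp, err := http.Get("http://localhost:9090/metrics")
	if err != nil {
		log.Fatalf("scrape failed: %v", err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "llm_d_inference_scheduler_") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("reading response: %v", err)
	}
}
```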
## Metric Details
### `pd_decision_total`
* **Type:** Counter
* **Labels:**
  * `decision_type`: string ("decode-only" or "prefill-decode")
* **Release Stage:** ALPHA
* **Description:** Counts the number of requests processed, broken down by the Prefill/Decode disaggregation decision (an illustrative declaration sketch follows this list).
  * `prefill-decode`: The request was split into separate Prefill and Decode stages.
  * `decode-only`: The request used the Decode-only path.
* **Usage:** Provides a high-level view of how many requests use the disaggregated path versus the unified path.
* **Actionability:**
  * Monitor the ratio of "prefill-decode" to "decode-only" to understand the P/D engagement rate.
  * Sudden changes in this ratio might indicate configuration issues, changes in workload patterns, or problems with the decision logic.
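
For reference, the sketch below shows how a counter with this name and label set would typically be declared and incremented with the Prometheus Go client. It mirrors the documentation above but is illustrative only: the `recordDecision` helper and the registration flow are assumptions, not the scheduler's actual source code.

```go
// Illustrative sketch: a counter shaped like pd_decision_total.
// With this subsystem it is exposed as
// llm_d_inference_scheduler_pd_decision_total (assuming no extra namespace prefix).
package main

import "github.com/prometheus/client_golang/prometheus"

var pdDecisionTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Subsystem: "llm_d_inference_scheduler",
		Name:      "pd_decision_total",
		Help:      "Requests processed, by Prefill/Decode disaggregation decision.",
	},
	[]string{"decision_type"},
)

// recordDecision is a hypothetical helper: it increments the counter with
// one of the two documented label values.
func recordDecision(disaggregated bool) {
	if disaggregated {
		pdDecisionTotal.WithLabelValues("prefill-decode").Inc()
	} else {
		pdDecisionTotal.WithLabelValues("decode-only").Inc()
	}
}

func main() {
	prometheus.MustRegister(pdDecisionTotal)
	recordDecision(true)  // one prefill-decode request
	recordDecision(false) // one decode-only request
}
```

With counters labeled this way, the "prefill-decode" to "decode-only" ratio mentioned under Actionability can be computed from the two time series in your monitoring stack.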