4 changes: 4 additions & 0 deletions vllm/v1/metrics/stats.py
@@ -194,6 +194,9 @@ class RequestStateStats:
    # Track if this request is corrupted (NaNs in logits)
    is_corrupted: bool = False

    # Inter-token latencies (ITLs) recorded for this request
    inter_token_latencies: list[float] = field(default_factory=list)
Comment on lines +197 to +198
Contributor

critical

The inter_token_latencies are collected in RequestStateStats but are never copied to FinishedRequestStats when a request completes, so the collected latency data is lost once the RequestState is discarded.

To fix this, you should also add inter_token_latencies to the FinishedRequestStats dataclass and populate it in IterationStats.update_from_finished_request.

  1. Add the field to FinishedRequestStats:
@dataclass
class FinishedRequestStats:
    # ... existing fields
    is_corrupted: bool = False
    inter_token_latencies: list[float] = field(default_factory=list)
  2. Populate it in update_from_finished_request:
        finished_req = FinishedRequestStats(
            # ... existing assignments
            is_corrupted=req_stats.is_corrupted,
            inter_token_latencies=req_stats.inter_token_latencies,
        )
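
The two steps above can be sketched end to end as follows. This is a minimal, self-contained illustration rather than the actual vLLM code: `RequestStateStatsSketch`, `FinishedRequestStatsSketch`, and `finish_request` are hypothetical stand-ins that keep only the two fields shown in the suggestion, and copying the list is a defensive assumption that the suggestion itself does not make.

```python
from dataclasses import dataclass, field


@dataclass
class RequestStateStatsSketch:
    # Hypothetical, trimmed-down stand-in for RequestStateStats.
    is_corrupted: bool = False
    inter_token_latencies: list[float] = field(default_factory=list)


@dataclass
class FinishedRequestStatsSketch:
    # Hypothetical, trimmed-down stand-in for FinishedRequestStats.
    is_corrupted: bool = False
    inter_token_latencies: list[float] = field(default_factory=list)


def finish_request(
    req_stats: RequestStateStatsSketch,
) -> FinishedRequestStatsSketch:
    # Carry the per-request latencies over to the finished record.
    # The list is copied so that later changes to the per-request state
    # cannot alter the finished record (an assumption, not from the PR).
    return FinishedRequestStatsSketch(
        is_corrupted=req_stats.is_corrupted,
        inter_token_latencies=list(req_stats.inter_token_latencies),
    )


req_stats = RequestStateStatsSketch()
req_stats.inter_token_latencies.extend([0.25, 0.5])
finished = finish_request(req_stats)
```

Here `finished.inter_token_latencies` still holds the recorded values after `req_stats` goes out of scope, which is the behavior the comment asks for.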



@dataclass
class FinishedRequestStats:
@@ -283,6 +286,7 @@ def update_from_output(
    else:
        itl = engine_core_timestamp - req_stats.last_token_ts
        self.inter_token_latencies_iter.append(itl)
        req_stats.inter_token_latencies.append(itl)

    req_stats.last_token_ts = engine_core_timestamp
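
For readers following the hunk above, the bookkeeping can be illustrated in isolation. The sketch below is not the vLLM implementation: `RequestTimingSketch`, `record_token`, and the `is_first_token` flag are hypothetical names, and the condition guarding the `else` branch is an assumption because the hunk does not show it.

```python
from dataclasses import dataclass, field


@dataclass
class RequestTimingSketch:
    # Hypothetical stand-in holding only the fields used in the hunk.
    last_token_ts: float = 0.0
    inter_token_latencies: list[float] = field(default_factory=list)


def record_token(
    req_stats: RequestTimingSketch,
    iteration_itls: list[float],
    timestamp: float,
    is_first_token: bool,
) -> None:
    if not is_first_token:
        # Latency between this token and the previous one.
        itl = timestamp - req_stats.last_token_ts
        # Per-iteration list, as in inter_token_latencies_iter.
        iteration_itls.append(itl)
        # Per-request list, the line this change adds.
        req_stats.inter_token_latencies.append(itl)
    # Always remember when the latest token arrived.
    req_stats.last_token_ts = timestamp


req_stats = RequestTimingSketch()
iteration_itls: list[float] = []
for i, ts in enumerate([10.0, 10.5, 10.75]):
    record_token(req_stats, iteration_itls, ts, is_first_token=(i == 0))
```

With three tokens, two latencies are recorded, and the per-request list grows alongside the per-iteration one.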
