
Conversation

@AryanBagade
Contributor

@AryanBagade AryanBagade commented Dec 2, 2025

Implement logprobs functionality for vLLM backend to pass log probability information from vLLM to the frontend.

Overview:

This PR adds logprobs support to the vLLM backend, enabling it to pass log probability information from vLLM's RequestOutput back to Dynamo's frontend. The frontend already supports logprobs in its outputs, but the vLLM backend was not propagating this information in its responses.

Details:

handlers.py changes:

  • Added support for logprobs and prompt_logprobs in build_sampling_params(); these get passed through to vLLM's SamplingParams (see the sketch after this list)
  • New _extract_logprobs() method pulls the logprob data from vLLM's output and formats it for Dynamo
  • Updated generate_tokens() to call the extraction method and add logprobs to the response
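
For orientation, here is a minimal sketch of the output_options mapping, assuming the request carries an output_options dict with logprobs / prompt_logprobs keys; the helper name apply_output_options is illustrative, not the PR's code:

```python
from vllm import SamplingParams


def apply_output_options(request: dict, sampling_params: SamplingParams) -> SamplingParams:
    """Hypothetical helper; the request field names are assumptions."""
    output_options = request.get("output_options") or {}

    # SamplingParams.logprobs is the number of top logprobs vLLM returns per generated token.
    if output_options.get("logprobs") is not None:
        sampling_params.logprobs = int(output_options["logprobs"])

    # prompt_logprobs does the same for the prompt tokens.
    if output_options.get("prompt_logprobs") is not None:
        sampling_params.prompt_logprobs = int(output_options["prompt_logprobs"])

    return sampling_params
```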

Testing:

  • Created validation classes that check logprobs are actually present and correctly structured in responses
  • Added aggregated_logprobs test config that tests both chat and completion endpoints with logprobs enabled
  • Tests validate the full round-trip: request with logprobs → vLLM processing → response with logprobs (example request shapes are sketched below)
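
For reference, request bodies along these lines would exercise that round-trip; the model name and values are illustrative, not the exact test fixtures:

```python
# Illustrative request bodies; the model name and values are assumptions.
chat_request = {
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,      # chat convention: boolean flag...
    "top_logprobs": 2,     # ...plus how many alternatives per token
    "stream": False,
}

completion_request = {
    "model": "Qwen/Qwen3-0.6B",
    "prompt": "Hello",
    "logprobs": 2,         # completions convention: integer count
    "stream": False,
}
```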

Where should the reviewer start?

  1. components/src/dynamo/vllm/handlers.py - Core implementation:
  • Lines 99-110: output_options handling in build_sampling_params()
  • Lines 574-636: _extract_logprobs() method implementation
  • Lines 684-691: Integration in generate_tokens() method
  2. tests/utils/payloads.py - Validation logic:
  • Lines 159-188: ChatPayloadWithLogprobs validation
  • Lines 214-239: CompletionPayloadWithLogprobs validation
  3. tests/serve/test_vllm.py - E2E test:
  • Lines 56-78: aggregated_logprobs test configuration

Related Issues: Closes #4683

Summary by CodeRabbit

  • New Features

    • Added support for returning token-level log probabilities (logprobs and top_logprobs) in API responses
    • Chat and completion requests can now enable logprobs via request options
  • Tests

    • Added test payloads and validators to verify logprobs are present and correctly structured
    • New test configurations exercise logprob behavior for both chat and completion flows


@AryanBagade AryanBagade requested review from a team as code owners December 2, 2025 11:21
@copy-pr-bot

copy-pr-bot bot commented Dec 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions bot commented Dec 2, 2025

👋 Hi AryanBagade! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: the NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving them.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) and feat labels on Dec 2, 2025
Implement logprobs functionality for vLLM backend to pass log probability information from vLLM to the frontend.

Signed-off-by: Aryan Bagade <aryan@aryanbagade.com>
@AryanBagade AryanBagade force-pushed the feat/add-logprobs-support-vllm-backend branch from b2760a7 to 3566e82 on December 2, 2025 11:23
@coderabbitai
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds end-to-end logprobs support for the vLLM backend: request output_options are consumed into sampling params, per-token logprobs/top_logprobs are extracted from vLLM CompletionOutput, emitted with token streams, and tests/payload validators added to cover the feature.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Core vLLM handlers<br>components/src/dynamo/vllm/handlers.py | Added a static _extract_logprobs() to the handler classes; updated build_sampling_params() to read output_options (mapping logprobs and prompt_logprobs); integrated logprob extraction into token generation/streaming payloads, with minor control-flow/logging adjustments. |
| Tests: vLLM config<br>tests/serve/test_vllm.py | Added an aggregated_logprobs VLLMConfig entry; imported and wired chat_payload_with_logprobs and completion_payload_with_logprobs into request_payloads. |
| Tests: payload builders<br>tests/utils/payload_builder.py | Extended chat_payload() and completion_payload() signatures to accept logprobs/top_logprobs; added chat_payload_with_logprobs() and completion_payload_with_logprobs() builders that produce requests with logprobs enabled. |
| Tests: payload validators<br>tests/utils/payloads.py | Added ChatPayloadWithLogprobs and CompletionPayloadWithLogprobs classes that extend the existing validators to assert presence and structure of logprobs, token_logprobs, tokens, and top_logprobs in responses. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review _extract_logprobs() logic for correct indexing relative to num_output_tokens_so_far and handling of missing fields.
  • Verify build_sampling_params() correctly converts request output_options into integer sampling params and doesn't alter other sampling behavior.
  • Confirm token-stream emission in generate_tokens() includes logprob fields without disrupting existing streaming payload format.
  • Check new test payload validators for robust JSON parsing and assertions that match runtime response shapes.

Poem

🐇 I nibble through outputs, counting each hop,
Tokens and logprobs — a delicious crop.
I fetch top choices and tidy their sums,
Streaming probability carrots for curious crumbs.
Hooray — the backend now sings in full stop!

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: adding logprobs support to the vLLM backend. |
| Description check | ✅ Passed | The PR description covers all required template sections, with detailed explanations of the changes, the testing approach, and specific line numbers for reviewer guidance. |
| Linked Issues check | ✅ Passed | The PR implements all requirements from issue #4683: logprobs extraction in handlers, LLMEngineOutput field population, and E2E test coverage for both chat and completion endpoints. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing logprobs support: handler logic, test utilities for payload building/validation, and E2E test configuration; no unrelated modifications detected. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (6)
tests/utils/payloads.py (2)

158-188: Strengthen chat logprobs validation to catch null/empty responses

Right now this passes if the backend returns "logprobs": null or "content": [] even when you explicitly requested logprobs. If the intent of this payload is to ensure logprobs are actually produced, consider asserting that logprobs_data is not None and that content_logprobs is non-empty, so regressions where logprobs disappear don’t silently pass these tests.
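
A sketch of what the stricter chat-side check could look like, assuming the OpenAI-style response shape (choices[0].logprobs.content); the function name is illustrative:

```python
def validate_chat_logprobs(response_json: dict) -> None:
    """Stricter check: fail if logprobs were requested but came back null/empty."""
    choice = response_json["choices"][0]
    logprobs_data = choice.get("logprobs")
    assert logprobs_data is not None, "expected logprobs in chat response, got null"

    content_logprobs = logprobs_data.get("content") or []
    assert content_logprobs, "expected non-empty logprobs.content"

    for entry in content_logprobs:
        # Each per-token entry carries the token, its logprob, and the alternatives.
        assert "token" in entry and "logprob" in entry
        assert isinstance(entry.get("top_logprobs", []), list)
```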


213-244: Make completion logprobs checks fail when logprobs are omitted

Similar to the chat variant, this validation will still succeed when choice["logprobs"] is present but None (or when token_logprobs is empty). If this test is meant to guarantee that logprobs are actually returned, you may want to assert that logprobs_data is not None and that token_logprobs is non-empty, in addition to the existing length equality check.
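
The completion-side equivalent, under the same assumption that completions return parallel tokens / token_logprobs / top_logprobs lists:

```python
def validate_completion_logprobs(response_json: dict) -> None:
    """Completion-side equivalent of the stricter check above."""
    choice = response_json["choices"][0]
    logprobs_data = choice.get("logprobs")
    assert logprobs_data is not None, "expected logprobs in completion response, got null"

    tokens = logprobs_data.get("tokens") or []
    token_logprobs = logprobs_data.get("token_logprobs") or []
    assert token_logprobs, "expected non-empty token_logprobs"
    assert len(tokens) == len(token_logprobs), "tokens and token_logprobs must stay aligned"
```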

components/src/dynamo/vllm/handlers.py (2)

99-110: output_options mapping is fine but only covers logprobs/prompt_logprobs

The mapping of output_options["logprobs"] and "prompt_logprobs" into SamplingParams looks reasonable and defensive (casting to int). Just be aware that any other keys added under output_options won’t be honored here; if future output options are introduced, this is the place they’ll need to be plumbed through.


573-637: Keep per-token alignment between token_ids, log_probs, and top_logprobs

The overall extraction logic looks correct, but two details are worth tightening:

  • You continue when token_logprobs_dict is None and only append to log_probs when you find at least one logprob, while top_logprobs always gets an entry (even if empty). This can easily lead to len(token_ids_slice) != len(log_probs) and/or len(log_probs) != len(top_logprobs).
  • Downstream consumers typically assume a 1:1 alignment between emitted tokens and their logprob/top_logprobs entries.

To avoid subtle bugs later, consider always appending a placeholder (e.g., None and []) for positions where logprobs are missing, and ensure log_probs and top_logprobs always have the same length as the new token_ids slice.
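
One way to keep the lists aligned, sketched against vLLM's CompletionOutput.logprobs structure (a list of dicts mapping token id to a Logprob with .logprob and .rank); the start_idx parameter stands in for the num_output_tokens_so_far bookkeeping and is an assumption:

```python
from typing import Any, List, Optional, Tuple


def extract_aligned_logprobs(
    output: Any, start_idx: int
) -> Tuple[List[Optional[float]], List[list]]:
    """Sketch: per-token logprobs and top_logprobs aligned 1:1 with the newly
    emitted token_ids slice. `output` stands in for vLLM's CompletionOutput."""
    token_ids = list(output.token_ids[start_idx:])
    logprobs_list = list(output.logprobs or [])[start_idx:]

    log_probs: List[Optional[float]] = []
    top_logprobs: List[list] = []
    for pos, token_id in enumerate(token_ids):
        entry = logprobs_list[pos] if pos < len(logprobs_list) else None
        if not entry or token_id not in entry:
            # Placeholders keep all three lists the same length even when
            # logprob info is missing for a position.
            log_probs.append(None)
            top_logprobs.append([])
            continue
        log_probs.append(entry[token_id].logprob)
        top_logprobs.append(
            [{"token_id": tid, "logprob": lp.logprob, "rank": lp.rank}
             for tid, lp in entry.items()]
        )
    return log_probs, top_logprobs
```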

tests/serve/test_vllm.py (1)

18-25: New aggregated_logprobs scenario is consistent with existing VLLMConfigs

The additional imports and the aggregated_logprobs entry plug neatly into the existing vllm_configs pattern and should exercise both chat and completion logprobs end-to-end without affecting other scenarios. If you ever want logprobs runs to also validate metrics, you could add a metric_payload_default here similar to "aggregated", but that’s not strictly necessary.

Also applies to: 56-78

tests/utils/payload_builder.py (1)

294-371: New *_with_logprobs builders nicely encapsulate logprobs-specific tests

These helpers make the logprobs test cases much clearer by coupling the request shape with the stronger validation payload types. One minor nit: chat_payload treats logprobs as an Optional[int], while chat_payload_with_logprobs hardcodes "logprobs": True. For long-term maintainability it might be worth aligning on a single convention (either always an int count or always the OpenAI-style bool + top_logprobs) to avoid confusion about what “logprobs” means at the HTTP layer.
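
For context, the two conventions this nit refers to look roughly like this at the HTTP layer (values are illustrative):

```python
# OpenAI-style chat completions: a boolean flag plus a separate count of alternatives.
chat_logprobs_options = {"logprobs": True, "top_logprobs": 2}

# Legacy completions: a single integer meaning "return this many top logprobs per token".
completion_logprobs_options = {"logprobs": 2}
```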

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71f94ed and 3566e82.

📒 Files selected for processing (4)
  • components/src/dynamo/vllm/handlers.py (4 hunks)
  • tests/serve/test_vllm.py (2 hunks)
  • tests/utils/payload_builder.py (5 hunks)
  • tests/utils/payloads.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/utils/payload_builder.py (1)
tests/utils/payloads.py (3)
  • ChatPayloadWithLogprobs (159-188)
  • CompletionPayload (192-210)
  • CompletionPayloadWithLogprobs (214-243)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/src/dynamo/vllm/handlers.py (1)

684-691: Streaming integration of logprobs looks good

Hooking _extract_logprobs into generate_tokens and attaching "log_probs" / "top_logprobs" to the streamed payloads is clean and isolated; existing fields (token_ids, finish_reason, completion_usage, etc.) are unchanged, so this should be backward-compatible for callers that ignore the new keys.
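
Conceptually, the integration described here amounts to something like the sketch below; the _extract_logprobs signature and the surrounding payload shape are assumptions, not the PR's exact code:

```python
from typing import Any, AsyncGenerator, Dict


class HandlerSketch:
    """Illustration only; not the real handler class."""

    @staticmethod
    def _extract_logprobs(output: Any, start_idx: int):
        # Stand-in for the PR's extraction method; assumed to return two lists
        # aligned with the newly emitted tokens.
        return [], []

    async def generate_tokens(
        self, output: Any, num_output_tokens_so_far: int
    ) -> AsyncGenerator[Dict[str, Any], None]:
        payload: Dict[str, Any] = {
            "token_ids": list(output.token_ids[num_output_tokens_so_far:]),
            "finish_reason": output.finish_reason,
        }
        log_probs, top_logprobs = self._extract_logprobs(output, num_output_tokens_so_far)
        if log_probs:
            # Additive keys: callers that ignore them keep working unchanged.
            payload["log_probs"] = log_probs
            payload["top_logprobs"] = top_logprobs
        yield payload
```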

tests/utils/payload_builder.py (2)

7-14: chat_payload enhancements for logprobs are straightforward and non-breaking

Allowing logprobs and top_logprobs to flow through the generic chat_payload body is a clean extension and doesn’t disturb existing callers. The defaults and behavior for other arguments remain unchanged.

Also applies to: 131-165


167-191: completion_payload logprobs plumbed cleanly

The added logprobs parameter and the explicit body construction keep the function flexible while remaining compatible with prior usage. This also mirrors the pattern used for chat, which helps keep the payload API coherent.

@rmccorm4
Contributor

rmccorm4 commented Dec 3, 2025

/ok to test 2deb598

@AryanBagade
Contributor Author

will resolve the merge conflict!

AryanBagade and others added 3 commits December 9, 2025 01:32
Signed-off-by: Aryan Bagade <73382554+AryanBagade@users.noreply.github.com>
Signed-off-by: Aryan Bagade <aryan@aryanbagade.com>
Signed-off-by: Aryan Bagade <aryan@aryanbagade.com>
@AryanBagade
Contributor Author

Added input validation for logprobs/prompt_logprobs: previously the code would crash if someone passed an invalid value like "abc" or a negative number. Now it validates the value and logs a warning instead of throwing a ValueError.
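
A minimal sketch of that validation, assuming a small helper is used per option (the helper name and messages are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)


def coerce_logprobs_option(raw, name: str):
    """Return a non-negative int, or None (with a warning) for invalid input."""
    if raw is None:
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        logger.warning("Ignoring invalid %s=%r: expected a non-negative integer", name, raw)
        return None
    if value < 0:
        logger.warning("Ignoring invalid %s=%d: must be non-negative", name, value)
        return None
    return value
```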

@AryanBagade AryanBagade requested a review from karen-sy December 9, 2025 10:19
@rmccorm4
Contributor

rmccorm4 commented Dec 9, 2025

/ok to test c3d2f28


Labels

external-contribution (Pull request is from an external contributor), feat, size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add logprobs support to vLLM Backend

3 participants