feat: Add logprobs support to vLLM backend (#4683) #4697
Conversation
👋 Hi AryanBagade! Thank you for contributing to ai-dynamo/dynamo.
Implement logprobs functionality for vLLM backend to pass log probability information from vLLM to the frontend. Signed-off-by: Aryan Bagade <aryan@aryanbagade.com>
Force-pushed from b2760a7 to 3566e82.
Walkthrough
Adds end-to-end logprobs support for the vLLM backend: request output_options are consumed into sampling params, per-token logprobs/top_logprobs are extracted from vLLM CompletionOutput, emitted with token streams, and tests/payload validators are added to cover the feature.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/utils/payloads.py (2)
158-188: Strengthen chat logprobs validation to catch null/empty responses
Right now this passes if the backend returns "logprobs": null or "content": [] even when you explicitly requested logprobs. If the intent of this payload is to ensure logprobs are actually produced, consider asserting that logprobs_data is not None and that content_logprobs is non-empty, so regressions where logprobs disappear don't silently pass these tests.
213-244: Make completion logprobs checks fail when logprobs are omitted
Similar to the chat variant, this validation will still succeed when choice["logprobs"] is present but None (or when token_logprobs is empty). If this test is meant to guarantee that logprobs are actually returned, you may want to assert that logprobs_data is not None and that token_logprobs is non-empty, in addition to the existing length equality check.
components/src/dynamo/vllm/handlers.py (2)
99-110: output_options mapping is fine but only covers logprobs/prompt_logprobs
The mapping of output_options["logprobs"] and "prompt_logprobs" into SamplingParams looks reasonable and defensive (casting to int). Just be aware that any other keys added under output_options won't be honored here; if future output options are introduced, this is the place they'll need to be plumbed through.
573-637: Keep per-token alignment between token_ids, log_probs, and top_logprobs
The overall extraction logic looks correct, but two details are worth tightening:
- You continue when token_logprobs_dict is None and only append to log_probs when you find at least one logprob, while top_logprobs always gets an entry (even if empty). This can easily lead to len(token_ids_slice) != len(log_probs) and/or len(log_probs) != len(top_logprobs).
- Downstream consumers typically assume a 1:1 alignment between emitted tokens and their logprob/top_logprobs entries.
To avoid subtle bugs later, consider always appending a placeholder (e.g., None and []) for positions where logprobs are missing, and ensure log_probs and top_logprobs always have the same length as the new token_ids slice.
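One way to keep that alignment, assuming vLLM's CompletionOutput.logprobs layout (one {token_id: Logprob} dict per generated token); the helper name mirrors the PR but the body is an illustrative sketch, not the PR's implementation:

```python
# Placeholder-based extraction that keeps token_ids, log_probs, and
# top_logprobs the same length.
def _extract_logprobs(output, start: int):
    token_ids = list(output.token_ids[start:])
    log_probs, top_logprobs = [], []
    logprobs_list = output.logprobs[start:] if output.logprobs else []
    for i, token_id in enumerate(token_ids):
        token_logprobs_dict = logprobs_list[i] if i < len(logprobs_list) else None
        if not token_logprobs_dict:
            # Keep 1:1 alignment: emit placeholders instead of skipping.
            log_probs.append(None)
            top_logprobs.append([])
            continue
        chosen = token_logprobs_dict.get(token_id)
        log_probs.append(chosen.logprob if chosen else None)
        top_logprobs.append(
            [{"token_id": tid, "logprob": lp.logprob}
             for tid, lp in token_logprobs_dict.items()]
        )
    assert len(token_ids) == len(log_probs) == len(top_logprobs)
    return log_probs, top_logprobs
```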
tests/serve/test_vllm.py (1)
18-25: New aggregated_logprobs scenario is consistent with existing VLLMConfigs
The additional imports and the aggregated_logprobs entry plug neatly into the existing vllm_configs pattern and should exercise both chat and completion logprobs end-to-end without affecting other scenarios. If you ever want logprobs runs to also validate metrics, you could add a metric_payload_default here similar to "aggregated", but that's not strictly necessary.
Also applies to: 56-78
tests/utils/payload_builder.py (1)
294-371: New _with_logprobs builders nicely encapsulate logprobs-specific tests
These helpers make the logprobs test cases much clearer by coupling the request shape with the stronger validation payload types. One minor nit: chat_payload treats logprobs as an Optional[int], while chat_payload_with_logprobs hardcodes "logprobs": True. For long-term maintainability it might be worth aligning on a single convention (either always an int count or always the OpenAI-style bool + top_logprobs) to avoid confusion about what "logprobs" means at the HTTP layer.
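For reference, the two OpenAI-style conventions at the HTTP layer look like this; these are illustrative request bodies only and the model name is a placeholder:

```python
# Chat completions take a boolean plus top_logprobs; legacy completions take
# an integer count.
chat_body = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,      # bool: return logprobs at all
    "top_logprobs": 3,     # int: alternatives per generated token
}
completion_body = {
    "model": "example-model",
    "prompt": "Hello",
    "logprobs": 3,         # int: number of logprobs per token
}
```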
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
components/src/dynamo/vllm/handlers.py (4 hunks)
tests/serve/test_vllm.py (2 hunks)
tests/utils/payload_builder.py (5 hunks)
tests/utils/payloads.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
Repo: ai-dynamo/dynamo PR: 1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
Applied to files:
components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/utils/payload_builder.py (1)
tests/utils/payloads.py (3)
ChatPayloadWithLogprobs (159-188), CompletionPayload (192-210), CompletionPayloadWithLogprobs (214-243)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/src/dynamo/vllm/handlers.py (1)
684-691: Streaming integration of logprobs looks good
Hooking _extract_logprobs into generate_tokens and attaching "log_probs"/"top_logprobs" to the streamed payloads is clean and isolated; existing fields (token_ids, finish_reason, completion_usage, etc.) are unchanged, so this should be backward-compatible for callers that ignore the new keys.
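A rough sketch of that streaming hookup; generate_tokens and the payload keys mirror the PR description, but the body, the _extract_logprobs helper (see the sketch earlier), and the stream shape are assumptions:

```python
# Attach logprobs to each streamed chunk only when they were requested.
async def generate_tokens(engine_stream, sampling_params):
    num_output_tokens_so_far = 0
    async for request_output in engine_stream:
        output = request_output.outputs[0]
        new_token_ids = output.token_ids[num_output_tokens_so_far:]
        payload = {"token_ids": list(new_token_ids)}
        if sampling_params.logprobs is not None:
            log_probs, top_logprobs = _extract_logprobs(
                output, num_output_tokens_so_far
            )
            # New keys are additive; callers that ignore them are unaffected.
            payload["log_probs"] = log_probs
            payload["top_logprobs"] = top_logprobs
        if output.finish_reason is not None:
            payload["finish_reason"] = output.finish_reason
        num_output_tokens_so_far = len(output.token_ids)
        yield payload
```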
tests/utils/payload_builder.py (2)
7-14: chat_payload enhancements for logprobs are straightforward and non-breaking
Allowing logprobs and top_logprobs to flow through the generic chat_payload body is a clean extension and doesn't disturb existing callers. The defaults and behavior for other arguments remain unchanged.
Also applies to: 131-165
167-191: completion_payload logprobs plumbed cleanly
The added logprobs parameter and the explicit body construction keep the function flexible while remaining compatible with prior usage. This also mirrors the pattern used for chat, which helps keep the payload API coherent.
/ok to test 2deb598
will resolve the merge conflict!
Signed-off-by: Aryan Bagade <73382554+AryanBagade@users.noreply.github.com>
Signed-off-by: Aryan Bagade <aryan@aryanbagade.com>
Added input validation for logprobs/prompt_logprobs: previously the code would crash if someone passed an invalid value like "abc" or a negative number. Now it validates the value and logs a warning instead of raising a ValueError.
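The defensive parsing described above could look roughly like this; the helper name and exact behavior are illustrative, not the PR's exact code:

```python
# Parse a logprobs/prompt_logprobs option, warning instead of raising.
import logging

logger = logging.getLogger(__name__)

def _parse_logprobs_option(value, name: str):
    """Return a non-negative int, or None (with a warning) for invalid input."""
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        logger.warning("Ignoring invalid %s value %r: not an integer", name, value)
        return None
    if parsed < 0:
        logger.warning("Ignoring invalid %s value %d: must be >= 0", name, parsed)
        return None
    return parsed
```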
/ok to test c3d2f28
Implement logprobs functionality for vLLM backend to pass log probability information from vLLM to the frontend.
Overview:
This PR adds logprobs support to the vLLM backend, enabling it to pass log probability information from vLLM's RequestOutput back to the Dynamo frontend. The frontend already supports logprobs output, but the vLLM backend was not propagating this information in its responses.
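For a quick end-to-end check of the new behavior, a request like the following could be used, assuming a locally running frontend with an OpenAI-compatible endpoint; the URL, port, and model name are placeholders:

```python
# Request logprobs through the frontend and inspect the chat response.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "example-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "logprobs": True,
        "top_logprobs": 2,
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
# With this PR, choices[0].logprobs should now be populated.
print(resp.json()["choices"][0]["logprobs"])
```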
Details:
handlers.py changes:
Testing:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Tests