Handle InternalServerError with choices=None to enable temperature bumping #1121
Conversation
Force-pushed from f5a032c to 9cb3e93.
Handle InternalServerError with choices=None to enable temperature bumping

Some LLM providers occasionally return malformed responses with choices=None. When this happens, litellm raises InternalServerError during internal validation in convert_to_model_response_object before returning the response to our code. Previously, these InternalServerErrors were retried, but without temperature bumping. Since temperature=0.0 is deterministic, all retries would hit the same provider bug and fail.

This change detects InternalServerErrors from convert_to_model_response_object and converts them to LLMNoResponseError. This enables the RetryMixin's adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The temperature change makes the provider's response generation non-deterministic, which can help avoid the same bug on subsequent attempts.

The detection checks for 'convert_to_model_response_object' in the error message, which is the specific function where litellm validates the response_object['choices'] field.

This complements PR #1107, which handles the case where litellm successfully returns a response with empty choices. Together, these changes ensure temperature bumping happens regardless of where the choices validation fails.

Co-authored-by: openhands <openhands@all-hands.dev>
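For illustration only, a minimal sketch of that detection. The wrapper name and the LLMNoResponseError import path are assumptions, not code from this PR; only the substring check on the error message comes from the change described above.

```python
import litellm
from litellm.exceptions import InternalServerError

# Assumed import path for the SDK's "no response" error; adjust to wherever
# LLMNoResponseError actually lives in the agent-sdk.
from openhands.sdk.llm.exceptions import LLMNoResponseError


def completion_with_no_response_mapping(**kwargs):
    """Hypothetical wrapper: call litellm and re-raise conversion-time
    InternalServerErrors as LLMNoResponseError so retries bump temperature."""
    try:
        return litellm.completion(**kwargs)
    except InternalServerError as err:
        # litellm raises this from convert_to_model_response_object when the
        # provider hands back choices=None; treat it like an empty response.
        if "convert_to_model_response_object" in str(err):
            raise LLMNoResponseError(str(err)) from err
        # Any other 500 is a genuine provider error: keep the original type.
        raise
```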
Force-pushed from 9cb3e93 to 1140756.
…yMixin

- Convert specific InternalServerError from LiteLLM response conversion to LLMNoResponseError in retry wrapper
- Bump temperature for this case in before_sleep just like LLMNoResponseError
- Remove ad-hoc try/except from LLM.completion path so logic is centralized

This ensures the retry path handles these errors consistently and enables temperature bumping.

Co-authored-by: openhands <openhands@all-hands.dev>
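As a rough sketch of the "bump temperature in before_sleep" idea from this commit, the following uses tenacity directly; the wiring and names are illustrative assumptions rather than the SDK's actual RetryMixin code.

```python
from tenacity import (
    RetryCallState,
    Retrying,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)


class LLMNoResponseError(Exception):
    """Stand-in for the SDK's LLMNoResponseError."""


def call_with_adaptive_temperature(llm_call, kwargs: dict, attempts: int = 4):
    """Retry llm_call(**kwargs); after an LLMNoResponseError, bump a
    deterministic temperature=0.0 to 1.0 before the next attempt."""

    def bump_temperature(retry_state: RetryCallState) -> None:
        # before_sleep hook: inspect the failure and adjust the next call.
        exc = retry_state.outcome.exception() if retry_state.outcome else None
        if isinstance(exc, LLMNoResponseError) and kwargs.get("temperature", 0.0) == 0.0:
            kwargs["temperature"] = 1.0  # break determinism on the retry

    for attempt in Retrying(
        retry=retry_if_exception_type(LLMNoResponseError),
        stop=stop_after_attempt(attempts),
        wait=wait_exponential(multiplier=1, max=30),
        before_sleep=bump_temperature,
        reraise=True,
    ):
        with attempt:
            return llm_call(**kwargs)
```

Bumping only when the temperature is still 0.0 leaves a user-chosen non-zero temperature untouched, which matches the 0.0-to-1.0 behavior described in the PR.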
Hi, I’m OpenHands-GPT-5. I investigated what actually happens today in our agent-sdk environment with LiteLLM + Gemini regarding “no choices” responses and InternalServerError.

Summary

[…]

Evidence from LiteLLM (current installed code)

The Gemini adapter in LiteLLM transforms Google responses into an OpenAI-like […]

Context from OpenHands (V0)

In V0, we added two things in OpenHands PR #7713: […]

That V0 history likely led to the impression that “no choices” correlates with 500s, but in practice (with our current LiteLLM), “no choices” arrives as a normal […]

Recommendation

Given the above, mapping […] If we still want to handle providers/LiteLLM builds that raise a 500 on “no choices”, we should first confirm we can reproduce that in our environment now. Otherwise, we should keep the code focused on today’s observable behavior (i.e., […]).

Happy to rerun this analysis if our LiteLLM version changes.
[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead; if not, please request review, close it, or request that someone else follow up.
Summary
This PR fixes a gap in retry handling when LLM providers return malformed responses with choices=None. In some cases, litellm raises InternalServerError during its internal validation before returning the response to our code. Previously, these errors were retried without temperature bumping, causing all retries to fail with the same error.

Problem

There are two code paths for handling choices=None from providers:

Path A (covered by PR #1107):
- litellm successfully returns ModelResponse(choices=[])
- we raise LLMNoResponseError, so temperature bumping applies ✅

Path B (this PR):
- litellm raises InternalServerError before returning
- InternalServerError is retried WITHOUT temperature bump ❌

Solution

This PR detects InternalServerError exceptions that mention "choices" along with validation-related keywords ("none", "assert", "invalid"). These errors are converted to LLMNoResponseError to enable the RetryMixin's adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry.
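For illustration, a minimal sketch of that message check; the helper name is assumed, and the keywords are the ones listed above.

```python
from litellm.exceptions import InternalServerError

VALIDATION_KEYWORDS = ("none", "assert", "invalid")


def is_choices_validation_error(err: Exception) -> bool:
    """Return True only for InternalServerErrors whose message mentions
    'choices' together with a validation-related keyword; everything else
    keeps its original type and retry behavior."""
    if not isinstance(err, InternalServerError):
        return False
    message = str(err).lower()
    return "choices" in message and any(k in message for k in VALIDATION_KEYWORDS)
```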
Changes

- _transport_call() now detects and converts relevant InternalServerError exceptions
- Other InternalServerError exceptions are NOT converted (selective handling)

Testing
All tests pass:
Existing tests continue to pass:
Relationship to Other PRs
- #1107: That PR handles the case where litellm successfully returns a response with empty choices. This PR handles the case where litellm fails internally before returning.

Why Temperature Bumping Matters
When temperature=0.0 (deterministic), the provider generates the same response for the same input. If there's a bug in the provider's code path, all retries will hit the same bug.

By bumping to temperature=1.0, we introduce randomness that may cause the provider to take a different code path that avoids the bug. This is especially effective for transient provider issues.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
Base images: eclipse-temurin:17-jdk, nikolaik/python-nodejs:python3.12-nodejs22, golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2d06ff4-python

Run
All tags pushed for this build
About Multi-Architecture Support
- The versioned tag (2d06ff4-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 2d06ff4-python-amd64) are also available if needed