
Conversation

@neubig (Contributor) commented Nov 10, 2025

Summary

This PR fixes a gap in retry handling when LLM providers return malformed responses with choices=None. In some cases, litellm raises InternalServerError during its internal validation before returning the response to our code. Previously, these errors were retried without temperature bumping, causing all retries to fail with the same error.

Problem

There are two code paths for handling choices=None from providers:

Path A (covered by PR #1107):

  • litellm successfully returns ModelResponse(choices=[])
  • Our validation at line 532 raises LLMNoResponseError (sketched below)
  • Temperature is bumped 0.0 → 1.0 on retry ✅
  • Retry succeeds

Path B (this PR):

  • litellm's internal validation fails with assertion error
  • litellm raises InternalServerError before returning
  • Our validation never runs
  • InternalServerError is retried WITHOUT temperature bump ❌
  • All retries fail (deterministic behavior hits same bug)
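
For reference, the Path A guard is essentially a one-line check on the returned response. A minimal sketch in Python (the function name and the exception's import path are illustrative, not the exact SDK code):

# Path A guard: litellm returned a ModelResponse, but it has no choices.
# Path B never reaches this check, because litellm raises before returning.
from litellm.types.utils import ModelResponse

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed


def validate_choices(resp: ModelResponse) -> ModelResponse:
    if not resp.choices:  # covers both choices=None and choices=[]
        raise LLMNoResponseError("LLM response has no choices")
    return resp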

Solution

This PR detects InternalServerError exceptions whose message mentions "choices" together with validation-related keywords ("none", "assert", "invalid"). These errors are converted to LLMNoResponseError to enable the RetryMixin's adaptive behavior (a sketch follows the list below):

  1. Temperature is bumped from 0.0 → 1.0 on retry
  2. Non-deterministic behavior helps avoid the same provider bug
  3. Retries have a better chance of succeeding
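
A minimal sketch of the conversion logic (the keyword list is from this PR; the helper name and import path are illustrative):

from litellm.exceptions import InternalServerError

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed

VALIDATION_KEYWORDS = ("none", "assert", "invalid")


def convert_choices_validation_error(err: InternalServerError) -> None:
    # Convert only 500s that look like litellm's choices validation failing;
    # unrelated InternalServerErrors keep their normal retry behavior.
    msg = str(err).lower()
    if "choices" in msg and any(kw in msg for kw in VALIDATION_KEYWORDS):
        raise LLMNoResponseError(str(err)) from err
    raise err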

Changes

  • openhands-sdk/openhands/sdk/llm/llm.py: Added a try/except around _transport_call() to detect and convert the relevant InternalServerError exceptions
  • tests/sdk/llm/test_internal_server_error_choices_none.py: Added comprehensive tests (a condensed example follows this list) covering:
    • Retry with successful recovery after temperature bump
    • Exhaust retries when all attempts fail
    • Temperature bump verification (0.0 → 1.0)
    • Unrelated InternalServerError exceptions are NOT converted (selective handling)
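
A condensed sketch of the temperature-bump test (fixture setup, mocking, and the assertion style are simplified; it assumes the bump is observable on the llm object, which the real tests may verify differently):

from unittest.mock import patch

import pytest
from litellm.exceptions import InternalServerError

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed


def test_choices_none_error_bumps_temperature(llm):  # llm: fixture with temperature=0.0
    err = InternalServerError(
        message="Invalid response object: choices is None (assert failed)",
        llm_provider="test-provider",
        model="test-model",
    )
    # Every attempt fails, so retries are exhausted and the converted
    # LLMNoResponseError surfaces to the caller.
    with patch.object(llm, "_transport_call", side_effect=err):
        with pytest.raises(LLMNoResponseError):
            llm.completion(messages=[{"role": "user", "content": "hi"}])
    assert llm.temperature == 1.0  # bumped from 0.0 by the retry logic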

Testing

All tests pass:

$ uv run pytest tests/sdk/llm/test_internal_server_error_choices_none.py -v
PASSED test_internal_server_error_choices_none_retries_then_succeeds
PASSED test_internal_server_error_choices_none_exhausts_retries
PASSED test_internal_server_error_choices_none_bumps_temperature
PASSED test_internal_server_error_unrelated_not_converted

Existing tests continue to pass:

$ uv run pytest tests/sdk/llm/test_api_connection_error_retry.py -v
6 passed

Relationship to Other PRs

  • Complements PR #1107 ("tests(llm): add explicit retry behavior tests for empty choices"): that PR handles the case where litellm successfully returns a response with empty choices; this PR handles the case where litellm fails internally before returning.
  • Together, these changes ensure temperature bumping happens regardless of where the choices validation fails.

Why Temperature Bumping Matters

When temperature=0.0 (deterministic), the provider generates the same response for the same input. If there's a bug in the provider's code path, all retries will hit the same bug.

By bumping to temperature=1.0, we introduce randomness that may cause the provider to take a different code path that avoids the bug. This is especially effective for transient provider issues.
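
A sketch of how the bump can hook into the retry loop (a tenacity-style before_sleep callback; the attribute name and the idea of mutating it in place are assumptions about the RetryMixin, not its exact code):

from tenacity import RetryCallState

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed


def before_sleep(self, retry_state: RetryCallState) -> None:
    # On a no-response failure at temperature=0.0, switch to 1.0 so the next
    # attempt is non-deterministic and may dodge the provider-side bug.
    exc = retry_state.outcome.exception() if retry_state.outcome else None
    if isinstance(exc, LLMNoResponseError) and self.temperature == 0.0:
        self.temperature = 1.0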

Co-authored-by: openhands <openhands@all-hands.dev>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2d06ff4-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-2d06ff4-python \
  ghcr.io/openhands/agent-server:2d06ff4-python

All tags pushed for this build

ghcr.io/openhands/agent-server:2d06ff4-golang-amd64
ghcr.io/openhands/agent-server:2d06ff4-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:2d06ff4-golang-arm64
ghcr.io/openhands/agent-server:2d06ff4-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:2d06ff4-java-amd64
ghcr.io/openhands/agent-server:2d06ff4-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:2d06ff4-java-arm64
ghcr.io/openhands/agent-server:2d06ff4-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:2d06ff4-python-amd64
ghcr.io/openhands/agent-server:2d06ff4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:2d06ff4-python-arm64
ghcr.io/openhands/agent-server:2d06ff4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:2d06ff4-golang
ghcr.io/openhands/agent-server:2d06ff4-java
ghcr.io/openhands/agent-server:2d06ff4-python

About Multi-Architecture Support

  • Each variant tag (e.g., 2d06ff4-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 2d06ff4-python-amd64) are also available if needed

@github-actions bot (Contributor) commented Nov 10, 2025

Coverage

Coverage Report

File                                                    Stmts  Miss  Cover
openhands-sdk/openhands/sdk/llm/llm.py                    438   176    59%
  Missing: 309, 313, 318, 322, 326, 330–332, 336–337, 348–349, 351–352, 356, 373, 398, 403, 407, 412, 432, 461, 482, 486, 501, 507–508, 528–529, 539, 564–569, 590–591, 594, 598, 610, 615–618, 625, 628, 636–641, 645–648, 650, 663, 667–669, 671–672, 677–678, 680, 687, 690–695, 763–766, 771–775, 777–782, 784–786, 795–797, 800–801, 842–843, 846–849, 890, 904, 954, 957–959, 962–970, 974–976, 979, 982–984, 991–992, 1001, 1008–1010, 1014, 1016–1021, 1023–1040, 1043–1047, 1049–1050, 1056–1065, 1078, 1092, 1097
openhands-sdk/openhands/sdk/llm/utils/retry_mixin.py       67    47    29%
  Missing: 24–26, 28, 50, 52–53, 56–60, 63, 67–73, 78, 99–101, 104, 113–114, 118, 120–123, 126–129, 132–133, 135–139, 142–144, 146
TOTAL                                                   11922  5527    53%

@neubig force-pushed the fix/handle-internal-server-error-choices-none branch from f5a032c to 9cb3e93 on November 10, 2025 at 14:39
…mping

Some LLM providers occasionally return malformed responses with choices=None.
When this happens, litellm raises InternalServerError during internal
validation in convert_to_model_response_object before returning the
response to our code.

Previously, these InternalServerErrors were retried, but without temperature
bumping. Since temperature=0.0 is deterministic, all retries would hit the
same provider bug and fail.

This change detects InternalServerErrors from convert_to_model_response_object
and converts them to LLMNoResponseError. This enables the RetryMixin's
adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The
temperature change makes the provider's response generation non-deterministic,
which can help avoid the same bug on subsequent attempts.

The detection checks for 'convert_to_model_response_object' in the error
message, which is the specific function where litellm validates the
response_object['choices'] field.

This complements PR #1107, which handles the case where litellm successfully
returns a response with empty choices. Together, these changes ensure
temperature bumping happens regardless of where the choices validation fails.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig force-pushed the fix/handle-internal-server-error-choices-none branch from 9cb3e93 to 1140756 on November 10, 2025 at 14:45
@neubig marked this pull request as ready for review on November 10, 2025 at 15:49
@neubig requested a review from enyst on November 10, 2025 at 15:50
…yMixin

- Convert specific InternalServerError from LiteLLM response conversion to LLMNoResponseError in retry wrapper
- Bump temperature for this case in before_sleep just like LLMNoResponseError
- Remove ad-hoc try/except from LLM.completion path so logic is centralized

This ensures the retry path handles these errors consistently and enables temperature bumping.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst (Collaborator) commented Nov 10, 2025

Hi, I’m OpenHands-GPT-5. I investigated what actually happens today in our agent-sdk environment with LiteLLM + Gemini regarding “no choices” responses and InternalServerError.

Summary

  • In our current agent-sdk .venv, LiteLLM does NOT raise InternalServerError when Google/Vertex returns empty or missing candidates for Gemini.
  • Instead, LiteLLM returns a normal ModelResponse with choices=[]. Our SDK then raises LLMNoResponseError based on that empty-choices response (and retries with temp bump), which matches our existing behavior.
  • When the upstream returns a non-200 HTTP status, LiteLLM raises VertexAIError (with the provider HTTP status code), not InternalServerError.
  • Therefore, this PR’s mapping of 500/InternalServerError → LLMNoResponseError does not trigger for the Gemini-empty-choices scenario in our environment today. It may address a hypothetical or alternate LiteLLM/provider implementation, but it’s not currently observed with the installed LiteLLM version.

Evidence from LiteLLM (current installed code)

The Gemini adapter in LiteLLM transforms Google responses into an OpenAI-like ModelResponse; in the installed version, empty or missing candidates still yield a normal ModelResponse with choices=[] rather than an exception.
Context from OpenHands (V0)

In V0, we added two things in OpenHands PR #7713:

  • Retry on InternalServerError and Timeout (to be robust to 500s/timeouts)
  • Raise LLMNoResponseError when choices is empty (observed case with Gemini), which triggers our retry-with-temperature-bump logic

That V0 history likely led to the impression that “no choices” correlates with 500s, but in practice (with our current LiteLLM), “no choices” arrives as a normal ModelResponse(choices=[]), not a 500. The 500 path shows up as VertexAIError from LiteLLM.

Recommendation

Given the above, mapping InternalServerError → LLMNoResponseError won’t activate for the Gemini-empty-choices scenario we actually see today. Our current SDK behavior (raise LLMNoResponseError when choices=[] and bump temperature on retry) already handles the real-world case.

If we still want to handle providers/LiteLLM builds that raise a 500 on “no choices”, we should first confirm we can reproduce that in our environment now. Otherwise, we should keep the code focused on today’s observable behavior (i.e., choices=[] → LLMNoResponseError; provider HTTP errors → VertexAIError and standard retry without temperature bump).

Happy to rerun this analysis if our LiteLLM version changes.

@blacksmith-sh bot (Contributor) commented Nov 17, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

