
Conversation

@neubig (Contributor) commented Nov 10, 2025

Summary

This PR fixes a gap in retry handling when LLM providers return malformed responses with choices=None. In some cases, litellm raises InternalServerError during its internal validation before returning the response to our code. Previously, these errors were retried without temperature bumping, causing all retries to fail with the same error.

Problem

There are two code paths for handling choices=None from providers:

Path A (covered by PR #1107):

  • litellm successfully returns ModelResponse(choices=[])
  • Our validation at line 532 raises LLMNoResponseError (sketched below)
  • Temperature is bumped 0.0 → 1.0 on retry ✅
  • Retry succeeds

Path B (this PR):

  • litellm's internal validation fails with assertion error
  • litellm raises InternalServerError before returning
  • Our validation never runs
  • InternalServerError is retried WITHOUT temperature bump ❌
  • All retries fail (deterministic behavior hits same bug)
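
For reference, the Path A guard is essentially a one-line check on the returned response. A minimal sketch in Python (the function name and the exception's import path are illustrative, not the exact SDK code):

# Path A guard: litellm returned a ModelResponse, but it has no choices.
# Path B never reaches this check, because litellm raises before returning.
from litellm.types.utils import ModelResponse

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed


def validate_choices(resp: ModelResponse) -> ModelResponse:
    if not resp.choices:  # covers both choices=None and choices=[]
        raise LLMNoResponseError("LLM response has no choices")
    return resp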

Solution

This PR detects InternalServerError exceptions whose message mentions "choices" together with validation-related keywords ("none", "assert", "invalid"). These errors are converted to LLMNoResponseError to enable the RetryMixin's adaptive behavior (a sketch follows the list below):

  1. Temperature is bumped from 0.0 → 1.0 on retry
  2. Non-deterministic behavior helps avoid the same provider bug
  3. Retries have a better chance of succeeding
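
A minimal sketch of the conversion logic (the keyword list is from this PR; the helper name and import path are illustrative):

from litellm.exceptions import InternalServerError

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed

VALIDATION_KEYWORDS = ("none", "assert", "invalid")


def convert_choices_validation_error(err: InternalServerError) -> None:
    # Convert only 500s that look like litellm's choices validation failing;
    # unrelated InternalServerErrors keep their normal retry behavior.
    msg = str(err).lower()
    if "choices" in msg and any(kw in msg for kw in VALIDATION_KEYWORDS):
        raise LLMNoResponseError(str(err)) from err
    raise err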

Changes

  • openhands-sdk/openhands/sdk/llm/llm.py: Added a try/except around _transport_call() to detect and convert the relevant InternalServerError exceptions
  • tests/sdk/llm/test_internal_server_error_choices_none.py: Added comprehensive tests (a condensed example follows this list) covering:
    • Retry with successful recovery after temperature bump
    • Exhaust retries when all attempts fail
    • Temperature bump verification (0.0 → 1.0)
    • Unrelated InternalServerError exceptions are NOT converted (selective handling)
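
A condensed sketch of the temperature-bump test (fixture setup, mocking, and the assertion style are simplified; it assumes the bump is observable on the llm object, which the real tests may verify differently):

from unittest.mock import patch

import pytest
from litellm.exceptions import InternalServerError

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed


def test_choices_none_error_bumps_temperature(llm):  # llm: fixture with temperature=0.0
    err = InternalServerError(
        message="Invalid response object: choices is None (assert failed)",
        llm_provider="test-provider",
        model="test-model",
    )
    # Every attempt fails, so retries are exhausted and the converted
    # LLMNoResponseError surfaces to the caller.
    with patch.object(llm, "_transport_call", side_effect=err):
        with pytest.raises(LLMNoResponseError):
            llm.completion(messages=[{"role": "user", "content": "hi"}])
    assert llm.temperature == 1.0  # bumped from 0.0 by the retry logic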

Testing

All tests pass:

$ uv run pytest tests/sdk/llm/test_internal_server_error_choices_none.py -v
PASSED test_internal_server_error_choices_none_retries_then_succeeds
PASSED test_internal_server_error_choices_none_exhausts_retries
PASSED test_internal_server_error_choices_none_bumps_temperature
PASSED test_internal_server_error_unrelated_not_converted

Existing tests continue to pass:

$ uv run pytest tests/sdk/llm/test_api_connection_error_retry.py -v
6 passed

Relationship to Other PRs

  • Complements PR #1107 ("tests(llm): add explicit retry behavior tests for empty choices"): that PR handles the case where litellm successfully returns a response with empty choices; this PR handles the case where litellm fails internally before returning.
  • Together, these changes ensure temperature bumping happens regardless of where the choices validation fails.

Why Temperature Bumping Matters

When temperature=0.0 (deterministic), the provider generates the same response for the same input. If there's a bug in the provider's code path, all retries will hit the same bug.

By bumping to temperature=1.0, we introduce randomness that may cause the provider to take a different code path that avoids the bug. This is especially effective for transient provider issues.
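
A sketch of how the bump can hook into the retry loop (a tenacity-style before_sleep callback; the attribute name and the idea of mutating it in place are assumptions about the RetryMixin, not its exact code):

from tenacity import RetryCallState

from openhands.sdk.llm.exceptions import LLMNoResponseError  # import path assumed


def before_sleep(self, retry_state: RetryCallState) -> None:
    # On a no-response failure at temperature=0.0, switch to 1.0 so the next
    # attempt is non-deterministic and may dodge the provider-side bug.
    exc = retry_state.outcome.exception() if retry_state.outcome else None
    if isinstance(exc, LLMNoResponseError) and self.temperature == 0.0:
        self.temperature = 1.0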

Co-authored-by: openhands <openhands@all-hands.dev>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:2d06ff4-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-2d06ff4-python \
  ghcr.io/openhands/agent-server:2d06ff4-python

All tags pushed for this build

ghcr.io/openhands/agent-server:2d06ff4-golang-amd64
ghcr.io/openhands/agent-server:2d06ff4-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:2d06ff4-golang-arm64
ghcr.io/openhands/agent-server:2d06ff4-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:2d06ff4-java-amd64
ghcr.io/openhands/agent-server:2d06ff4-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:2d06ff4-java-arm64
ghcr.io/openhands/agent-server:2d06ff4-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:2d06ff4-python-amd64
ghcr.io/openhands/agent-server:2d06ff4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:2d06ff4-python-arm64
ghcr.io/openhands/agent-server:2d06ff4-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:2d06ff4-golang
ghcr.io/openhands/agent-server:2d06ff4-java
ghcr.io/openhands/agent-server:2d06ff4-python

About Multi-Architecture Support

  • Each variant tag (e.g., 2d06ff4-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 2d06ff4-python-amd64) are also available if needed

@github-actions bot (Contributor) commented Nov 10, 2025

Coverage

Coverage Report

File                                                    Stmts  Miss  Cover
openhands-sdk/openhands/sdk/llm/llm.py                    438   176    59%
  Missing: 309, 313, 318, 322, 326, 330–332, 336–337, 348–349, 351–352, 356, 373, 398, 403, 407, 412, 432, 461, 482, 486, 501, 507–508, 528–529, 539, 564–569, 590–591, 594, 598, 610, 615–618, 625, 628, 636–641, 645–648, 650, 663, 667–669, 671–672, 677–678, 680, 687, 690–695, 763–766, 771–775, 777–782, 784–786, 795–797, 800–801, 842–843, 846–849, 890, 904, 954, 957–959, 962–970, 974–976, 979, 982–984, 991–992, 1001, 1008–1010, 1014, 1016–1021, 1023–1040, 1043–1047, 1049–1050, 1056–1065, 1078, 1092, 1097
openhands-sdk/openhands/sdk/llm/utils/retry_mixin.py       67    47    29%
  Missing: 24–26, 28, 50, 52–53, 56–60, 63, 67–73, 78, 99–101, 104, 113–114, 118, 120–123, 126–129, 132–133, 135–139, 142–144, 146
TOTAL                                                   11922  5527    53%

@neubig force-pushed the fix/handle-internal-server-error-choices-none branch from f5a032c to 9cb3e93 on November 10, 2025 at 14:39
…mping

Some LLM providers occasionally return malformed responses with choices=None.
When this happens, litellm raises InternalServerError during internal
validation in convert_to_model_response_object before returning the
response to our code.

Previously, these InternalServerErrors were retried, but without temperature
bumping. Since temperature=0.0 is deterministic, all retries would hit the
same provider bug and fail.

This change detects InternalServerErrors from convert_to_model_response_object
and converts them to LLMNoResponseError. This enables the RetryMixin's
adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The
temperature change makes the provider's response generation non-deterministic,
which can help avoid the same bug on subsequent attempts.

The detection checks for 'convert_to_model_response_object' in the error
message, which is the specific function where litellm validates the
response_object['choices'] field.

This complements PR #1107, which handles the case where litellm successfully
returns a response with empty choices. Together, these changes ensure
temperature bumping happens regardless of where the choices validation fails.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig force-pushed the fix/handle-internal-server-error-choices-none branch from 9cb3e93 to 1140756 on November 10, 2025 at 14:45
@neubig marked this pull request as ready for review on November 10, 2025 at 15:49
@neubig requested a review from enyst on November 10, 2025 at 15:50
…yMixin

- Convert specific InternalServerError from LiteLLM response conversion to LLMNoResponseError in retry wrapper
- Bump temperature for this case in before_sleep just like LLMNoResponseError
- Remove ad-hoc try/except from LLM.completion path so logic is centralized

This ensures the retry path handles these errors consistently and enables temperature bumping.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst (Collaborator) commented Nov 10, 2025

Hi, I’m OpenHands-GPT-5. I investigated what actually happens today in our agent-sdk environment with LiteLLM + Gemini regarding “no choices” responses and InternalServerError.

Summary

  • In our current agent-sdk .venv, LiteLLM does NOT raise InternalServerError when Google/Vertex returns empty or missing candidates for Gemini.
  • Instead, LiteLLM returns a normal ModelResponse with choices=[]. Our SDK then raises LLMNoResponseError based on that empty-choices response (and retries with temp bump), which matches our existing behavior.
  • When the upstream returns a non-200 HTTP status, LiteLLM raises VertexAIError (with the provider HTTP status code), not InternalServerError.
  • Therefore, this PR’s mapping of 500/InternalServerError → LLMNoResponseError does not trigger for the Gemini-empty-choices scenario in our environment today. It may address a hypothetical or alternate LiteLLM/provider implementation, but it’s not currently observed with the installed LiteLLM version.

Evidence from LiteLLM (current installed code)

The Gemini adapter in LiteLLM transforms Google responses into an OpenAI-like ModelResponse; in the installed version, empty or missing candidates still yield a normal ModelResponse with choices=[] rather than an exception.
Context from OpenHands (V0)

In V0, we added two things in OpenHands PR #7713:

  • Retry on InternalServerError and Timeout (to be robust to 500s/timeouts)
  • Raise LLMNoResponseError when choices is empty (observed case with Gemini), which triggers our retry-with-temperature-bump logic

That V0 history likely led to the impression that “no choices” correlates with 500s, but in practice (with our current LiteLLM), “no choices” arrives as a normal ModelResponse(choices=[]), not a 500. The 500 path shows up as VertexAIError from LiteLLM.

Recommendation

Given the above, mapping InternalServerError → LLMNoResponseError won’t activate for the Gemini-empty-choices scenario we actually see today. Our current SDK behavior (raise LLMNoResponseError when choices=[] and bump temperature on retry) already handles the real-world case.

If we still want to handle providers/LiteLLM builds that raise a 500 on “no choices”, we should first confirm we can reproduce that in our environment now. Otherwise, we should keep the code focused on today’s observable behavior (i.e., choices=[] → LLMNoResponseError; provider HTTP errors → VertexAIError and standard retry without temperature bump).

Happy to rerun this analysis if our LiteLLM version changes.

@blacksmith-sh bot (Contributor) commented Nov 17, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @neubig, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

