Skip to content

Conversation

@enyst
Copy link
Collaborator

@enyst enyst commented Nov 8, 2025

Summary
This PR adds targeted tests to validate how the SDK handles LiteLLM responses with empty choices — a scenario observed with some providers (e.g., Gemini). These tests ensure that:

  • We raise LLMNoResponseError inside the retry boundary when choices is None/empty
  • We retry appropriately on LLMNoResponseError
  • We bump temperature from 0.0 → 1.0 on retry (RetryMixin behavior)
  • We bubble up LLMNoResponseError after exhausting all retries

Scope

  • Tests only; no runtime code changes. This is intended to lock in current behavior and protect against regressions.

Details
Added tests/sdk/llm/test_llm_no_response_retry.py with three cases:

  1. Retries then succeeds: first response has empty choices, second is valid → returns LLMResponse
  2. Exhausts retries: all attempts have empty choices → raises LLMNoResponseError
  3. Temperature bump: verifies temperature is set to 1.0 on the retry when the initial value is 0.0

Why this matters
Empty choices can occur due to transient provider issues. We want to:

  • Ensure the SDK treats this as an in-boundary, retryable condition (LLMNoResponseError)
  • Confirm RetryMixin’s adaptive behavior (temperature bump) is applied
  • Make the final raised exception explicit when all attempts fail (LLMNoResponseError)

Notes

  • Complements reliability work around provider errors and mapped SDK exceptions, but is independent of those changes.
  • No documentation updates necessary (tests only).

Co-authored-by: openhands openhands@all-hands.dev

@enyst can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a2f54f1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a2f54f1-python \
  ghcr.io/openhands/agent-server:a2f54f1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a2f54f1-golang-amd64
ghcr.io/openhands/agent-server:a2f54f1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a2f54f1-golang-arm64
ghcr.io/openhands/agent-server:a2f54f1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a2f54f1-java-amd64
ghcr.io/openhands/agent-server:a2f54f1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a2f54f1-java-arm64
ghcr.io/openhands/agent-server:a2f54f1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a2f54f1-python-amd64
ghcr.io/openhands/agent-server:a2f54f1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a2f54f1-python-arm64
ghcr.io/openhands/agent-server:a2f54f1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a2f54f1-golang
ghcr.io/openhands/agent-server:a2f54f1-java
ghcr.io/openhands/agent-server:a2f54f1-python

About Multi-Architecture Support

  • Each variant tag (e.g., a2f54f1-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a2f54f1-python-amd64) are also available if needed

…onseError)

- Retries then succeeds
- Exhausts retries and bubbles LLMNoResponseError
- Verifies temperature bump from 0.0 -> 1.0 on retry

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Contributor

github-actions bot commented Nov 8, 2025

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/llm
   llm.py43817659%306, 310, 315, 319, 323, 327–329, 333–334, 345–346, 348–349, 353, 370, 395, 400, 404, 409, 429, 458, 479, 483, 498, 504–505, 524–525, 535, 560–565, 586–587, 590, 594, 606, 611–614, 621, 624, 632–637, 641–644, 646, 659, 663–665, 667–668, 673–674, 676, 683, 686–691, 759–762, 767–771, 773–778, 780–782, 791–793, 796–797, 838–839, 842–845, 886, 900, 950, 953–955, 958–966, 970–972, 975, 978–980, 987–988, 997, 1004–1006, 1010, 1012–1017, 1019–1036, 1039–1043, 1045–1046, 1052–1061, 1074, 1088, 1093
TOTAL11791544653% 

…pty choices

Add comment near choices validation to make retry semantics explicit.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst requested a review from neubig November 8, 2025 01:54
…LMNoResponseError

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst changed the title tests(llm): add explicit retry behavior tests for empty choices (LLMNoResponseError) tests(llm): add explicit retry behavior tests for empty choices Nov 8, 2025
Copy link
Contributor

@neubig neubig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking this! I'm going to have to dig in deeper to the error, I'll try to share a version of what happened in a new issue description.

neubig pushed a commit that referenced this pull request Nov 10, 2025
…mping

Some LLM providers (e.g., Gemini) occasionally return malformed responses
with choices=None. When this happens, litellm may raise InternalServerError
during internal validation before returning the response to our code.

Previously, these InternalServerErrors were retried, but without temperature
bumping. Since temperature=0.0 is deterministic, all retries would hit the
same provider bug and fail.

This change detects InternalServerErrors related to malformed choices and
converts them to LLMNoResponseError. This enables the RetryMixin's adaptive
behavior, which bumps temperature from 0.0 to 1.0 on retry. The temperature
change makes the provider's response generation non-deterministic, which can
help avoid the same bug on subsequent attempts.

This complements PR #1107, which handles the case where litellm successfully
returns a response with empty choices. Together, these changes ensure
temperature bumping happens regardless of where the choices validation fails.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst merged commit 53c0162 into main Nov 10, 2025
20 checks passed
@enyst enyst deleted the empty-choices-tests branch November 10, 2025 14:38
neubig pushed a commit that referenced this pull request Nov 10, 2025
…mping

Some LLM providers occasionally return malformed responses with choices=None.
When this happens, litellm may raise InternalServerError during internal
validation (in convert_to_model_response_object) before returning the
response to our code.

Previously, these InternalServerErrors were retried, but without temperature
bumping. Since temperature=0.0 is deterministic, all retries would hit the
same provider bug and fail.

This change detects InternalServerErrors related to litellm's response_object
validation and converts them to LLMNoResponseError. This enables the
RetryMixin's adaptive behavior, which bumps temperature from 0.0 to 1.0 on
retry. The temperature change makes the provider's response generation
non-deterministic, which can help avoid the same bug on subsequent attempts.

The detection is specific to litellm's internal validation patterns:
- response_object AND choices
- choices AND is not None
- convert_to_model_response_object

This complements PR #1107, which handles the case where litellm successfully
returns a response with empty choices. Together, these changes ensure
temperature bumping happens regardless of where the choices validation fails.

Co-authored-by: openhands <openhands@all-hands.dev>
neubig pushed a commit that referenced this pull request Nov 10, 2025
…mping

Some LLM providers occasionally return malformed responses with choices=None.
When this happens, litellm raises InternalServerError during internal
validation in convert_to_model_response_object before returning the
response to our code.

Previously, these InternalServerErrors were retried, but without temperature
bumping. Since temperature=0.0 is deterministic, all retries would hit the
same provider bug and fail.

This change detects InternalServerErrors from convert_to_model_response_object
and converts them to LLMNoResponseError. This enables the RetryMixin's
adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The
temperature change makes the provider's response generation non-deterministic,
which can help avoid the same bug on subsequent attempts.

The detection checks for 'convert_to_model_response_object' in the error
message, which is the specific function where litellm validates the
response_object['choices'] field.

This complements PR #1107, which handles the case where litellm successfully
returns a response with empty choices. Together, these changes ensure
temperature bumping happens regardless of where the choices validation fails.

Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants