tests(llm): add explicit retry behavior tests for empty `choices` #1107

enyst · 2025-11-08T00:40:49Z

Summary
This PR adds targeted tests to validate how the SDK handles LiteLLM responses with empty choices — a scenario observed with some providers (e.g., Gemini). These tests ensure that:

We raise LLMNoResponseError inside the retry boundary when choices is None/empty
We retry appropriately on LLMNoResponseError
We bump temperature from 0.0 → 1.0 on retry (RetryMixin behavior)
We bubble up LLMNoResponseError after exhausting all retries

Scope

Tests only; no runtime code changes. This is intended to lock in current behavior and protect against regressions.

Details
Added tests/sdk/llm/test_llm_no_response_retry.py with three cases:

Retries then succeeds: first response has empty choices, second is valid → returns LLMResponse
Exhausts retries: all attempts have empty choices → raises LLMNoResponseError
Temperature bump: verifies temperature is set to 1.0 on the retry when the initial value is 0.0

Why this matters
Empty choices can occur due to transient provider issues. We want to:

Ensure the SDK treats this as an in-boundary, retryable condition (LLMNoResponseError)
Confirm RetryMixin’s adaptive behavior (temperature bump) is applied
Make the final raised exception explicit when all attempts fail (LLMNoResponseError)

Notes

Complements reliability work around provider errors and mapped SDK exceptions, but is independent of those changes.
No documentation updates necessary (tests only).

Co-authored-by: openhands openhands@all-hands.dev

@enyst can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a2f54f1-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a2f54f1-python \
  ghcr.io/openhands/agent-server:a2f54f1-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a2f54f1-golang-amd64
ghcr.io/openhands/agent-server:a2f54f1-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a2f54f1-golang-arm64
ghcr.io/openhands/agent-server:a2f54f1-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a2f54f1-java-amd64
ghcr.io/openhands/agent-server:a2f54f1-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a2f54f1-java-arm64
ghcr.io/openhands/agent-server:a2f54f1-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a2f54f1-python-amd64
ghcr.io/openhands/agent-server:a2f54f1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:a2f54f1-python-arm64
ghcr.io/openhands/agent-server:a2f54f1-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:a2f54f1-golang
ghcr.io/openhands/agent-server:a2f54f1-java
ghcr.io/openhands/agent-server:a2f54f1-python

About Multi-Architecture Support

Each variant tag (e.g., a2f54f1-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., a2f54f1-python-amd64) are also available if needed

…onseError) - Retries then succeeds - Exhausts retries and bubbles LLMNoResponseError - Verifies temperature bump from 0.0 -> 1.0 on retry Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2025-11-08T00:44:17Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/llm
llm.py	438	176	59%	306, 310, 315, 319, 323, 327–329, 333–334, 345–346, 348–349, 353, 370, 395, 400, 404, 409, 429, 458, 479, 483, 498, 504–505, 524–525, 535, 560–565, 586–587, 590, 594, 606, 611–614, 621, 624, 632–637, 641–644, 646, 659, 663–665, 667–668, 673–674, 676, 683, 686–691, 759–762, 767–771, 773–778, 780–782, 791–793, 796–797, 838–839, 842–845, 886, 900, 950, 953–955, 958–966, 970–972, 975, 978–980, 987–988, 997, 1004–1006, 1010, 1012–1017, 1019–1036, 1039–1043, 1045–1046, 1052–1061, 1074, 1088, 1093
TOTAL	11791	5446	53%

…pty choices Add comment near choices validation to make retry semantics explicit. Co-authored-by: openhands <openhands@all-hands.dev>

…LMNoResponseError Co-authored-by: openhands <openhands@all-hands.dev>

neubig

Thanks for checking this! I'm going to have to dig in deeper to the error, I'll try to share a version of what happened in a new issue description.

…mping Some LLM providers (e.g., Gemini) occasionally return malformed responses with choices=None. When this happens, litellm may raise InternalServerError during internal validation before returning the response to our code. Previously, these InternalServerErrors were retried, but without temperature bumping. Since temperature=0.0 is deterministic, all retries would hit the same provider bug and fail. This change detects InternalServerErrors related to malformed choices and converts them to LLMNoResponseError. This enables the RetryMixin's adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The temperature change makes the provider's response generation non-deterministic, which can help avoid the same bug on subsequent attempts. This complements PR #1107, which handles the case where litellm successfully returns a response with empty choices. Together, these changes ensure temperature bumping happens regardless of where the choices validation fails. Co-authored-by: openhands <openhands@all-hands.dev>

…mping Some LLM providers occasionally return malformed responses with choices=None. When this happens, litellm may raise InternalServerError during internal validation (in convert_to_model_response_object) before returning the response to our code. Previously, these InternalServerErrors were retried, but without temperature bumping. Since temperature=0.0 is deterministic, all retries would hit the same provider bug and fail. This change detects InternalServerErrors related to litellm's response_object validation and converts them to LLMNoResponseError. This enables the RetryMixin's adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The temperature change makes the provider's response generation non-deterministic, which can help avoid the same bug on subsequent attempts. The detection is specific to litellm's internal validation patterns: - response_object AND choices - choices AND is not None - convert_to_model_response_object This complements PR #1107, which handles the case where litellm successfully returns a response with empty choices. Together, these changes ensure temperature bumping happens regardless of where the choices validation fails. Co-authored-by: openhands <openhands@all-hands.dev>

…mping Some LLM providers occasionally return malformed responses with choices=None. When this happens, litellm raises InternalServerError during internal validation in convert_to_model_response_object before returning the response to our code. Previously, these InternalServerErrors were retried, but without temperature bumping. Since temperature=0.0 is deterministic, all retries would hit the same provider bug and fail. This change detects InternalServerErrors from convert_to_model_response_object and converts them to LLMNoResponseError. This enables the RetryMixin's adaptive behavior, which bumps temperature from 0.0 to 1.0 on retry. The temperature change makes the provider's response generation non-deterministic, which can help avoid the same bug on subsequent attempts. The detection checks for 'convert_to_model_response_object' in the error message, which is the specific function where litellm validates the response_object['choices'] field. This complements PR #1107, which handles the case where litellm successfully returns a response with empty choices. Together, these changes ensure temperature bumping happens regardless of where the choices validation fails. Co-authored-by: openhands <openhands@all-hands.dev>

tests: add explicit retry behavior tests for empty choices (LLMNoResp…

66ed80b

…onseError) - Retries then succeeds - Exhausts retries and bubbles LLMNoResponseError - Verifies temperature bump from 0.0 -> 1.0 on retry Co-authored-by: openhands <openhands@all-hands.dev>

docs(llm): clarify LLMNoResponseError raised in retry boundary for em…

14b60a7

…pty choices Add comment near choices validation to make retry semantics explicit. Co-authored-by: openhands <openhands@all-hands.dev>

enyst requested a review from neubig November 8, 2025 01:54

docs(llm): note Gemini may return empty choices; clarify retries on L…

cc6ad40

…LMNoResponseError Co-authored-by: openhands <openhands@all-hands.dev>

enyst changed the title ~~tests(llm): add explicit retry behavior tests for empty choices (LLMNoResponseError)~~ tests(llm): add explicit retry behavior tests for empty choices Nov 8, 2025

neubig approved these changes Nov 10, 2025

View reviewed changes

neubig mentioned this pull request Nov 10, 2025

Handle InternalServerError with choices=None to enable temperature bumping #1121

Open

enyst merged commit 53c0162 into main Nov 10, 2025
20 checks passed

enyst deleted the empty-choices-tests branch November 10, 2025 14:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

tests(llm): add explicit retry behavior tests for empty `choices` #1107

tests(llm): add explicit retry behavior tests for empty `choices` #1107

Uh oh!

enyst commented Nov 8, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Nov 8, 2025 •

edited

Loading

Uh oh!

neubig left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tests(llm): add explicit retry behavior tests for empty choices #1107

tests(llm): add explicit retry behavior tests for empty choices #1107

Uh oh!

Conversation

enyst commented Nov 8, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

neubig left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tests(llm): add explicit retry behavior tests for empty `choices` #1107

tests(llm): add explicit retry behavior tests for empty `choices` #1107

enyst commented Nov 8, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Nov 8, 2025 •

edited

Loading