@surfiniaburger
Contributor

Hardens the server so that it no longer crashes when it receives malformed data.

google-labs-jules bot and others added 7 commits November 9, 2025 11:37
This commit fixes a critical bug where the Gunicorn server for the DIPG environment would crash when receiving a malformed string from the LLM. The crash was caused by unhandled exceptions in the reward functions during string parsing.

This commit addresses the issue by:
1.  **Hardening Reward Functions:** Wrapping the parsing logic within each reward function in `dipg_environment.py` with a `try...except` block. This ensures that any malformed string will be caught, penalized with a `missing_answer_penalty`, and will no longer crash the server process.
2.  **Adding a Regression Test:** A new test case, `test_malformed_step`, has been added to `test_dipg_environment.py`. This test sends a known problematic string to the server to verify that it handles the error gracefully and does not crash, preventing future regressions.
3.  **Client-Side Resilience:** The Jupyter notebook `dipg-rl.ipynb` was also updated to make the training loop more resilient. It now catches `ReadTimeout` and `ConnectionError` exceptions, which can occur if the server crashes for any reason, and continues the training process.
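
The hardening pattern from item 1 can be sketched as follows. This is a minimal illustration, not the code from `dipg_environment.py`: the function names `parse_answer` and `format_reward`, the `ANSWER:` tag format, and the numeric values are assumptions made for the example. Only the idea of catching parsing errors and returning a `missing_answer_penalty` comes from the commit message.

```python
MISSING_ANSWER_PENALTY = -1.0  # hypothetical value; the real penalty is configured elsewhere


def parse_answer(response: str) -> str:
    """Hypothetical parser: extract the text after an 'ANSWER:' tag."""
    _, sep, answer = response.partition("ANSWER:")
    if not sep or not answer.strip():
        raise ValueError("no answer found in response")
    return answer.strip()


def format_reward(response, expected: str,
                  missing_answer_penalty: float = MISSING_ANSWER_PENALTY) -> float:
    """Return 1.0 for a correct answer, 0.0 for a wrong one, or the penalty if parsing fails."""
    try:
        answer = parse_answer(response)
    except Exception:
        # Malformed input is penalized instead of propagating and killing the worker.
        return missing_answer_penalty
    return 1.0 if answer == expected else 0.0
```

A regression test in the spirit of `test_malformed_step` would pass a string with no recognizable answer (or a non-string value) and assert that the penalty is returned rather than an exception being raised.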
…med-input

Fix server crash on malformed LLM responses
This commit provides a comprehensive fix for the training script crashes caused by `ReadTimeout` and `ConnectionError` exceptions. The root cause was the environment server crashing on malformed LLM-generated strings.

This commit addresses the issue on multiple levels:

1.  **Server-Side Robustness:** The core logic in `dipg_environment.py` has been hardened. The `step` function, which calculates rewards, now contains a `try...except` block that catches any exception during reward calculation. This prevents a single malformed response from crashing the entire server process. Instead, an error is logged, and a penalty is assigned.

2.  **Client-Side Resilience:** A new file, `reward_function.py`, provides a corrected `create_reward_fn`. The function now handles both `ConnectionError` and `ReadTimeout` exceptions, so the client-side training script no longer crashes and can continue training.

3.  **Regression Testing:** The existing regression test, `test_malformed_step`, was used to verify that the server no longer crashes when receiving malformed input, ensuring the server-side fix is effective.
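
The two layers described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `dipg_environment.py` or `reward_function.py`: `safe_step`, `compute_reward`, `send_step`, the logger name, the penalty values, and the signature given to `create_reward_fn` are all hypothetical. The commit message does not say which library raises `ConnectionError` and `ReadTimeout`, so the exception classes are passed in as a parameter, with Python's built-in `ConnectionError` and `TimeoutError` as stand-ins.

```python
import logging

logger = logging.getLogger("dipg_sketch")  # hypothetical logger name

ERROR_PENALTY = -1.0  # hypothetical value


def safe_step(compute_reward, action: str, error_penalty: float = ERROR_PENALTY) -> float:
    """Server side: compute a reward, converting any failure into a logged penalty."""
    try:
        return compute_reward(action)
    except Exception:
        # Log the traceback and keep the server process alive.
        logger.exception("Reward calculation failed for action %r", action)
        return error_penalty


def create_reward_fn(send_step, fallback_reward: float = 0.0,
                     transient_errors=(ConnectionError, TimeoutError)):
    """Client side: wrap a network call so transient failures do not stop training."""
    def reward_fn(completions):
        rewards = []
        for completion in completions:
            try:
                rewards.append(send_step(completion))
            except transient_errors as exc:
                # Record the failure and move on to the next completion.
                logger.warning("Environment call failed (%s); using fallback reward", exc)
                rewards.append(fallback_reward)
        return rewards
    return reward_fn
```

Catching only the listed transient errors on the client keeps genuine programming mistakes visible, while the broad `except Exception` on the server reflects the commit's stated goal that no single malformed response should take down the process.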
…med-input

Fix Server Crash and Provide Robust Client-Side Function
@meta-cla meta-cla bot added the CLA Signed label Nov 10, 2025
@jspisak jspisak added the enhancement label Nov 11, 2025