-
Notifications
You must be signed in to change notification settings - Fork 2.4k
fix(dspy): Example.toDict() fails to serialize dspy.History objects #9047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: TomeHirata <33407409+TomeHirata@users.noreply.github.com>
|
The LLM beat me to it. |
dspy/evaluate/evaluate.py
Outdated
| return [ | ||
| ( | ||
| merge_dicts(example, prediction) | {metric_name: score} | ||
| merge_dicts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot do we still need this change after adding changes to Example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need both changes. While Example.toDict() now serializes History correctly, dict(example) and example.items() still return raw History objects. The original code used merge_dicts(example, prediction) which calls .items() internally, so it would still fail JSON serialization without this change.
Without the evaluate.py change: merge_dicts(example, prediction) → fails with "Object of type History is not JSON serializable"
With both changes: merge_dicts(example.toDict(), prediction) → succeeds because History is already serialized to dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot I see, can we change merge_dict to call toDict if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in ebb502c. Moved the toDict() call into merge_dicts(), which now automatically serializes objects with toDict() method. This keeps the serialization logic centralized and makes _prepare_results_output() cleaner - it now passes Example objects directly to merge_dicts().
| from dspy.utils.dummies import DummyLM | ||
|
|
||
|
|
||
| def test_evaluate_save_as_json_with_history(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Can we add tests to existing test_evaluate.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c67ba33. Moved both History tests into test_evaluate.py and removed the separate test_evaluate_with_history.py file. All 34 tests pass.
Co-authored-by: TomeHirata <33407409+TomeHirata@users.noreply.github.com>
Co-authored-by: TomeHirata <33407409+TomeHirata@users.noreply.github.com>
dspy/primitives/example.py
Outdated
| def convert_to_serializable(value): | ||
| if hasattr(value, "toDict"): | ||
| return value.toDict() | ||
| elif hasattr(value, "model_dump"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot can we use isinstance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2a82f8d. Changed from hasattr(value, "model_dump") to isinstance(value, BaseModel) for more explicit type checking. Also added safe import handling with BaseModel = None fallback if pydantic is not available.
dspy/evaluate/evaluate.py
Outdated
| merge_dicts(example, prediction) | {metric_name: score} | ||
| if prediction_is_dictlike(prediction) | ||
| else dict(example) | {"prediction": prediction, metric_name: score} | ||
| else (example.toDict() if hasattr(example, "toDict") else dict(example)) | {"prediction": prediction, metric_name: score} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot is it possible to call toDict when dict() is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1d56c29. Simplified the code to always use example.toDict() instead of the conditional (example.toDict() if hasattr(example, "toDict") else dict(example)). Since Example always has the toDict() method, the conditional was unnecessary. This ensures consistent serialization behavior and makes the code cleaner.
Co-authored-by: TomeHirata <33407409+TomeHirata@users.noreply.github.com>
Co-authored-by: TomeHirata <33407409+TomeHirata@users.noreply.github.com>
Plan: Fix Example.toDict() serialization for dspy.History objects
Problem
Example.toDict()fails to serialize examples containingdspy.Historyobjects because theconvert_to_serializablefunction doesn't handle Pydantic models properly. This causes JSON serialization failures inEvaluate.save_as_json().Minimal Changes Plan
Changes Made
1. Updated
Example.toDict()indspy/primitives/example.pyChanged from
hasattr(value, "model_dump")toisinstance(value, BaseModel)for more explicit and Pythonic type checking:This is safer and more explicit than using
hasattr(), following Python best practices for type checking.2. Updated
merge_dicts()indspy/evaluate/evaluate.pyAdded automatic serialization when objects have
toDict()method:This keeps the serialization logic centralized in one place and makes
_prepare_results_output()cleaner.3. Updated
_prepare_results_output()indspy/evaluate/evaluate.pySimplified to always use
toDict()without conditional checks:Previously had:
(example.toDict() if hasattr(example, "toDict") else dict(example))Since Example always has
toDict()method, the conditional check was unnecessary. This simplification makes the code cleaner and ensures consistent serialization behavior.4. Added comprehensive tests
test_example_to_dict_with_historyintests/primitives/test_example.py- Tests basic History serializationtest_evaluate_save_as_json_with_historyintests/evaluate/test_evaluate.py- Tests end-to-end JSON exporttest_evaluate_save_as_csv_with_historyintests/evaluate/test_evaluate.py- Tests CSV export with HistoryTesting Results
✅ All 18 tests in
tests/primitives/test_example.pypass✅ All 34 tests in
tests/evaluate/test_evaluate.pypass✅ All linting checks pass (ruff)
✅ Comprehensive manual verification passed
✅ No security vulnerabilities found (CodeQL)
Security Summary
CodeQL analysis completed with 0 alerts for Python code. No security vulnerabilities were found in the changes.
Reviewer Feedback Addressed
test_evaluate.pyfile for better code organizationmerge_dicts()function for cleaner, more maintainable codehasattr(value, "model_dump")toisinstance(value, BaseModel)for more explicit and Pythonic type checking_prepare_results_output()to always usetoDict()without conditional fallback, making the code cleaner and ensuringtoDict()serialization is always usedOriginal prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.