Some evaluation tasks can be time-consuming or expensive to run. It would be nice to be able to explore what metrics can be derived from evaluation results (`EvaluatorContext`) without re-running the task.
This would also enable a data-science-like workflow with evaluation results, since a DataFrame can be constructed from them (see the sketch below).
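For illustration, a rough sketch of the workflow this would enable, assuming contexts have already been serialized to JSON; the `eval_contexts` directory and the `name`, `duration`, and `metrics` fields are assumptions:

```python
import json
from pathlib import Path

import pandas as pd

# Load previously serialized EvaluatorContext records and explore them offline.
rows = []
for path in Path('eval_contexts').glob('*.json'):  # hypothetical dump directory
    ctx = json.loads(path.read_text())
    rows.append({'case': ctx.get('name'), 'duration': ctx.get('duration'), **ctx.get('metrics', {})})

df = pd.DataFrame(rows)
print(df.describe())
```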
Proposed implementation:
1. Add a serde schema for `SpanTreeRecordingError` so that a pydantic `TypeAdapter` can be used on `EvaluatorContext` (first sketch below).
2. Split the `pydantic_evals/pydantic_evals/dataset.py::_run_task_and_evaluators()` function into two functions, one for running the task and one for running the evaluators, and make them public (second sketch below).
3. (Optional?) A `ContextCaptureEvaluator` that saves the `EvaluatorContext` to disk (third sketch below).
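A minimal sketch of what item 1 would enable, assuming `EvaluatorContext` stays a generic dataclass; the round-trip helpers below are illustrative, not existing API:

```python
from typing import Any

from pydantic import TypeAdapter
from pydantic_evals.evaluators import EvaluatorContext

# With a serde schema for SpanTreeRecordingError, a TypeAdapter over the
# generic EvaluatorContext dataclass could round-trip evaluation results.
ctx_adapter = TypeAdapter(EvaluatorContext[Any, Any, Any])


def dump_context(ctx: EvaluatorContext[Any, Any, Any]) -> bytes:
    """Serialize a context produced by an evaluation run."""
    return ctx_adapter.dump_json(ctx)


def load_context(payload: bytes) -> EvaluatorContext[Any, Any, Any]:
    """Reload a saved context so metrics can be derived without re-running the task."""
    return ctx_adapter.validate_json(payload)
```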
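For item 2, one possible shape for the split; the function names, parameters, and return types below are hypothetical placeholders, not the current `dataset.py` API:

```python
from __future__ import annotations

from typing import Any

from pydantic_evals.evaluators import EvaluatorContext


async def run_task(task: Any, case: Any) -> EvaluatorContext[Any, Any, Any]:
    """Run the task for a single case and capture its output, duration, metrics,
    attributes, and span tree into an EvaluatorContext."""
    ...


async def run_evaluators(
    evaluators: list[Any], ctx: EvaluatorContext[Any, Any, Any]
) -> dict[str, Any]:
    """Apply evaluators to an already-captured context, so new metrics can be
    explored without re-running the task."""
    ...
```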
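And a minimal sketch of the optional `ContextCaptureEvaluator`, assuming item 1 is in place and following the documented custom-evaluator pattern (a dataclass subclass of `Evaluator`); the output directory, file naming, and boolean result are assumptions:

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any
from uuid import uuid4

from pydantic import TypeAdapter
from pydantic_evals.evaluators import Evaluator, EvaluatorContext

_ctx_adapter = TypeAdapter(EvaluatorContext[Any, Any, Any])


@dataclass
class ContextCaptureEvaluator(Evaluator):
    """Snapshots the EvaluatorContext to disk so metrics can be explored later
    without re-running the task."""

    output_dir: Path = Path('eval_contexts')

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        self.output_dir.mkdir(parents=True, exist_ok=True)
        # One JSON file per case; the file name is arbitrary here.
        path = self.output_dir / f'{uuid4().hex}.json'
        path.write_bytes(_ctx_adapter.dump_json(ctx))
        return True  # always "passes"; this evaluator only captures data
```

The `eval_contexts` directory here is the same hypothetical location that the DataFrame sketch above reads from.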