docs/building-blocks/5-metrics.md
+7 -6 (7 additions & 6 deletions)
@@ -9,9 +9,7 @@ DSPy is a machine learning framework, so you must think about your **automatic m
## What is a metric and how do I define a metric for my task?
-What makes outputs from your system good or bad? Invest in defining metrics and improving them over time incrementally. It's really hard to consistently improve what you aren't able to define.
-
-A metric is just a function that will take examples from your data and take the output of your system, and return a score that quantifies how good the output is.
+A metric is just a function that will take examples from your data and take the output of your system, and return a score that quantifies how good the output is. What makes outputs from your system good or bad?
For simple tasks, this could be just "accuracy" or "exact match" or "F1 score". This may be the case for simple classification or short-form QA tasks.
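As a rough illustration of the simple metrics named above, the sketch below implements exact match and a token-level F1 in plain Python. The helper names (`normalize`, `exact_match`, `token_f1`) and the whitespace-based normalization are assumptions made for this example; they are not functions from DSPy.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (a deliberately simple normalization)."""
    return text.lower().split()


def exact_match(gold, predicted):
    """True when both strings contain the same tokens in the same order."""
    return normalize(gold) == normalize(predicted)


def token_f1(gold, predicted):
    """Harmonic mean of token-level precision and recall."""
    gold_tokens = normalize(gold)
    pred_tokens = normalize(predicted)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact match suits classification-style outputs with a single correct label, while token-level F1 gives partial credit when a short-form answer only partly overlaps the gold answer.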
@@ -26,7 +24,7 @@ A DSPy metric is just a function in Python that takes `example` (e.g., from your
Your metric should also accept an optional third argument called `trace`. You can ignore this for a moment, but it will enable some powerful tricks if you want to use your metric for optimization.
-Here's a simple example of a metric that's comparing `example.answer` and `pred.answer`.
+Here's a simple example of a metric that's comparing `example.answer` and `pred.answer`. This particular metric will return a `bool`.
```python
def validate_answer(example, pred, trace=None):
@@ -38,7 +36,7 @@ Some people find these utilities (built-in) convenient:
- `dspy.evaluate.metrics.answer_exact_match`
- `dspy.evaluate.metrics.answer_passage_match`
-Your metrics could be more complex, e.g. check for multiple properties.
+Your metrics could be more complex, e.g. check for multiple properties. The metric below will return a `float` if `trace is None` (i.e., if it's used for evaluation or optimization), and will return a `bool` otherwise (i.e., if it's used to bootstrap demonstrations).
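The metric that the added sentence refers to lies outside this hunk, so the following is only a minimal sketch of the described behavior, not the code from the docs. It assumes two hypothetical properties (answer correctness and a character budget, `MAX_CHARS`) and uses `SimpleNamespace` objects as stand-ins for a DSPy example and prediction; the function name `answer_and_length_metric` is likewise made up for illustration.

```python
from types import SimpleNamespace

MAX_CHARS = 50  # hypothetical length budget, chosen only for this sketch


def answer_and_length_metric(example, pred, trace=None):
    # Property 1: the predicted answer matches the gold answer (case-insensitive).
    answer_ok = example.answer.strip().lower() == pred.answer.strip().lower()
    # Property 2: the predicted answer stays within the length budget.
    concise = len(pred.answer) <= MAX_CHARS

    if trace is None:
        # No trace supplied: return a graded score in [0, 1].
        return (answer_ok + concise) / 2.0
    # Trace supplied: be strict and only accept outputs that pass every check.
    return answer_ok and concise


# Stand-in objects (assumption: only an `.answer` attribute is needed here).
example = SimpleNamespace(answer="Paris")
good = SimpleNamespace(answer="paris")
bad = SimpleNamespace(answer="London")

score_good = answer_and_length_metric(example, good)          # float score
score_bad = answer_and_length_metric(example, bad)            # float score
keep_good = answer_and_length_metric(example, good, trace=[]) # strict bool
keep_bad = answer_and_length_metric(example, bad, trace=[])   # strict bool
```

Returning a graded `float` without a trace gives partial credit to near-misses, while the strict `bool` with a trace means only outputs that satisfy every property are accepted.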