Commit bcf47c8

Merge pull request #951 from rpgoldman/teleprompter-details
Added more details to the discussion of optimizers/teleprompters.
2 parents 822f932 + 48c7e90 commit bcf47c8

6 files changed, 74 additions and 22 deletions
docs/docs/building-blocks/6-optimizers.md

Lines changed: 25 additions & 11 deletions
@@ -27,40 +27,54 @@ DSPy programs consist of multiple calls to LMs, stacked together as [DSPy module
 
 Given a metric, DSPy can optimize all of these three with multi-stage optimization algorithms. These can combine gradient descent (for LM weights) and discrete LM-driven optimization, i.e. for crafting/updating instructions and for creating/validating demonstrations. DSPy Demonstrations are like few-shot examples, but they're far more powerful. They can be created from scratch, given your program, and their creation and selection can be optimized in many effective ways.
 
-In many cases, we found that compiling leads to better prompts than humans write. Not because DSPy optimizers are more creative than humans, but simply because they can try more things, much more systematically, and tune the metrics directly.
+In many cases, we found that compiling leads to better prompts than human writing. Not because DSPy optimizers are more creative than humans, but simply because they can try more things, much more systematically, and tune the metrics directly.
 
 
 ## What DSPy Optimizers are currently available?
 
+<!-- The following diagram was generated by: -->
+<!-- 1. Running symilar on the teleprompter module to extract the python hierarchy as a Graphviz dot file -->
+<!-- 2. Hand-editing the resulting dot file to remove classes that are not teleprompters/optimizers (e.g., classes for data structures manipulated by optimizers). -->
+<!-- 3. Using dot to compile the `.dot` file into a PNG -->
+<!-- Robert Goldman [2024/05/11:rpg] -->
+
+[Subclasses of Teleprompter](figures/teleprompter-classes.png)
+
 All of these can be accessed via `from dspy.teleprompt import *`.
 
 #### Automatic Few-Shot Learning
 
-1. **`LabeledFewShot`**: Simply constructs few-shot examples from provided labeled Q/A pairs.
+These optimizers extend the signature by automatically generating and including **optimized** examples within the prompt sent to the model, implementing few-shot learning.
+
+1. **`LabeledFewShot`**: Simply constructs few-shot examples (demos) from provided labeled input and output data points. Requires `k` (number of examples for the prompt) and `trainset` to randomly select `k` examples from.
+
+2. **`BootstrapFewShot`**: Uses a `teacher` module (which defaults to your program) to generate complete demonstrations for every stage of your program, along with labeled examples in `trainset`. Parameters include `max_labeled_demos` (the number of demonstrations randomly selected from the `trainset`) and `max_bootstrapped_demos` (the number of additional examples generated by the `teacher`). The bootstrapping process employs the metric to validate demonstrations, including only those that pass the metric in the "compiled" prompt. Advanced: Supports using a `teacher` program that is a *different* DSPy program that has compatible structure, for harder tasks.
 
-2. **`BootstrapFewShot`**: Uses your program to self-generate complete demonstrations for every stage of your program. Will simply use the generated demonstrations (if they pass the metric) without any further optimization. Advanced: Supports using a teacher program (a different DSPy program that has compatible structure) and a teacher LM, for harder tasks.
+3. **`BootstrapFewShotWithRandomSearch`**: Applies `BootstrapFewShot` several times with random search over generated demonstrations, and selects the best program over the optimization. Parameters mirror those of `BootstrapFewShot`, with the addition of `num_candidate_programs`, which specifies the number of random programs evaluated over the optimization, including candidates of the uncompiled program, `LabeledFewShot` optimized program, `BootstrapFewShot` compiled program with unshuffled examples and `num_candidate_programs` of `BootstrapFewShot` compiled programs with randomized example sets.
 
-3. **`BootstrapFewShotWithRandomSearch`**: Applies `BootstrapFewShot` several times with random search over generated demonstrations, and selects the best program.
+4. **`BootstrapFewShotWithOptuna`**: Applies `BootstrapFewShot` with Optuna optimization across demonstration sets, running trials to maximize evaluation metrics and selecting the best demonstrations.
 
-4. **`BootstrapFewShotWithOptuna`**: Applies `BootstrapFewShot` through Optuna hyperparameter optimization across demonstration sets, running trials to maximize evaluation metrics.
+5. **`KNNFewShot`**. Selects demonstrations through k-Nearest Neighbors algorithm to pick a diverse set of examples from different clusters. Vectorizes the examples, and then clusters them, using cluster centers with `BootstrapFewShot` for bootstrapping/selection process. This will be useful when there's a lot of data over random spaces: using KNN helps optimize the `trainset` for `BootstrapFewShot`. See [this notebook](https://github.com/stanfordnlp/dspy/blob/main/examples/knn.ipynb) for an example.
 
 
 #### Automatic Instruction Optimization
 
-4. **`COPRO`**: Generates and refines new instructions for each step, and optimizes them with coordinate ascent.
+These optimizers produce optimal instructions for the prompt and, in the case of MIPRO also optimize the set of few-shot demonstrations.
 
-5. **`MIPRO`**: Generates instructions and few-shot examples in each step. The instruction generation is data-aware and demonstration-aware. Uses Bayesian Optimization to effectively search over the space of generation instructions/demonstrations across your modules.
+6. **`COPRO`**: Generates and refines new instructions for each step, and optimizes them with coordinate ascent (hill-climbing using the metric function and the `trainset`). Parameters include `depth` which is the number of iterations of prompt improvement the optimizer runs over.
+
+7. **`MIPRO`**: Generates instructions *and* few-shot examples in each step. The instruction generation is data-aware and demonstration-aware. Uses Bayesian Optimization to effectively search over the space of generation instructions/demonstrations across your modules.
 
 
 #### Automatic Finetuning
 
+This optimizer is used to fine-tune the underlying LLM(s).
+
 6. **`BootstrapFinetune`**: Distills a prompt-based DSPy program into weight updates (for smaller LMs). The output is a DSPy program that has the same steps, but where each step is conducted by a finetuned model instead of a prompted LM.
 
 
 #### Program Transformations
 
-7. **`KNNFewShot`**. Selects demonstrations through k-Nearest Neighbors algorithm integrating `BootstrapFewShot` for bootstrapping/selection process.
-
 8. **`Ensemble`**: Ensembles a set of DSPy programs and either uses the full set or randomly samples a subset into a single program.
 
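Reviewer note: the `BootstrapFewShot` entry in this hunk describes metric-filtered bootstrapping. The sketch below illustrates that idea in plain Python and does not use DSPy. `bootstrap_demos`, the dict-based examples, `toy_teacher`, and `exact_match` are hypothetical names. Sampling labeled examples separately, up to `max_labeled_demos`, is an assumption based on the wording of the docs, not on the library code.

```python
import random


def bootstrap_demos(teacher, trainset, metric, max_bootstrapped_demos=4,
                    max_labeled_demos=16, metric_threshold=None, seed=0):
    """Hypothetical sketch: keep teacher outputs that pass the metric, then add labeled examples."""
    bootstrapped, used = [], set()
    for idx, example in enumerate(trainset):
        if len(bootstrapped) >= max_bootstrapped_demos:
            break
        prediction = teacher(example["question"])
        result = metric(example, prediction)
        # With a threshold, compare numerically; otherwise treat the result as pass/fail.
        passed = result >= metric_threshold if metric_threshold is not None else bool(result)
        if passed:
            bootstrapped.append({"question": example["question"], "answer": prediction})
            used.add(idx)

    # Randomly pick labeled examples from those not already used for bootstrapping.
    remaining = [ex for i, ex in enumerate(trainset) if i not in used]
    k = min(max_labeled_demos, len(remaining))
    labeled = random.Random(seed).sample(remaining, k)
    return bootstrapped, labeled


def toy_teacher(question):
    """Toy teacher: adds two integers, but is deliberately wrong on one input."""
    if question == "3+3":
        return "7"
    left, right = question.split("+")
    return str(int(left) + int(right))


def exact_match(example, prediction):
    return example["answer"] == prediction


trainset = [
    {"question": "2+2", "answer": "4"},
    {"question": "3+3", "answer": "6"},
    {"question": "5+5", "answer": "10"},
    {"question": "1+1", "answer": "2"},
]
bootstrapped, labeled = bootstrap_demos(
    toy_teacher, trainset, exact_match, max_bootstrapped_demos=2, max_labeled_demos=1
)
```

In this toy run the teacher's wrong answer for `3+3` fails the metric, so it never becomes a bootstrapped demo. It can still be drawn as a labeled example, because labeled examples carry their gold answer.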

@@ -90,7 +104,7 @@ from dspy.teleprompt import BootstrapFewShotWithRandomSearch
 
 # Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 8-shot examples of your program's steps.
 # The optimizer will repeat this 10 times (plus some initial attempts) before selecting its best attempt on the devset.
-config = dict(max_bootstrapped_demos=3, max_labeled_demos=3, num_candidate_programs=10, num_threads=4)
+config = dict(max_bootstrapped_demos=4, max_labeled_demos=4, num_candidate_programs=10, num_threads=4)
 
 teleprompter = BootstrapFewShotWithRandomSearch(metric=YOUR_METRIC_HERE, **config)
 optimized_program = teleprompter.compile(YOUR_PROGRAM_HERE, trainset=YOUR_TRAINSET_HERE)
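Reviewer note: the comment in this hunk says the optimizer repeats bootstrapping several times and keeps its best attempt. The sketch below shows that select-the-best loop in plain Python, under stated assumptions. `random_search`, `build_candidate`, `evaluate`, and the `DEMO_VALUE` table are hypothetical stand-ins for illustration. They are not part of DSPy, and the library's candidate construction may differ.

```python
import random


def random_search(build_candidate, evaluate, num_candidate_programs=10, seed=0):
    """Hypothetical sketch: build candidates from different seeds and keep the best-scoring one."""
    rng = random.Random(seed)
    best_program, best_score, all_scores = None, float("-inf"), []
    for _ in range(num_candidate_programs):
        candidate = build_candidate(rng.randrange(1_000_000))
        score = evaluate(candidate)
        all_scores.append(score)
        if score > best_score:
            best_program, best_score = candidate, score
    return best_program, best_score, all_scores


# Toy setup: a "program" is just a tuple of demo ids, and each demo has a made-up usefulness value.
DEMO_VALUE = {"a": 0.1, "b": 0.5, "c": 0.9, "d": 0.3}


def build_candidate(seed):
    # Each seed yields a different random pair of demos.
    return tuple(random.Random(seed).sample(sorted(DEMO_VALUE), 2))


def evaluate(program):
    # Stand-in for scoring a compiled program with the metric.
    return sum(DEMO_VALUE[demo] for demo in program)


best_program, best_score, all_scores = random_search(build_candidate, evaluate)
```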
@@ -115,4 +129,4 @@ To load a program from a file, you can instantiate an object from that class and
 ```python
 loaded_program = YOUR_PROGRAM_CLASS()
 loaded_program.load(path=YOUR_SAVE_PATH)
-```
+```
Binary file, 204 KB (not shown)

dspy/teleprompt/bootstrap.py

Lines changed: 32 additions & 6 deletions
@@ -1,5 +1,6 @@
 import random
 import threading
+from typing import Dict, Optional
 
 import tqdm
 
@@ -15,7 +16,8 @@
 
 # TODO: Switch here from dsp.Example to dspy.Example. Right now, it's okay because it's internal only (predictors).
 # NOTE: Notice the places where we don't shuffle examples. I do like that this one doesn't shuffle.
-# Other ones that consider options may want to use both unshuffled and then shuffle a few times, when considering candidates.
+# Other ones that consider options may want to use both unshuffled and then shuffle a few times, when
+# considering candidates.
 
 # TODO: the max_rounds via branch_idx to get past the cache, not just temperature.
 # In principle, we can also sample multiple outputs from the final generation step
@@ -25,25 +27,47 @@
 # won't hurt our "best effort" guarantees.)
 
 # TODO: When this bootstraps for another teleprompter like finetune, we want all demos we gather.
-# But when it's for direct use we may want to sample ONE demo per predictor--example pair. This is important for "multi-use" modules.
+# But when it's for direct use we may want to sample ONE demo per predictor--example pair.
+# This is important for "multi-use" modules.
 
 # TODO: Add baselines=[...]
 
-
 class BootstrapFewShot(Teleprompter):
     def __init__(
         self,
         metric=None,
         metric_threshold=None,
-        teacher_settings={},
+        teacher_settings: Optional[Dict]=None,
         max_bootstrapped_demos=4,
         max_labeled_demos=16,
         max_rounds=1,
         max_errors=5,
     ):
+        """
+        A Teleprompter class that composes a set of demos/examples to go into a predictor's prompt.
+        These demos come from a combination of labeled examples in the training set, and bootstrapped demos.
+
+        Parameters
+        ----------
+        metric: Callable
+            A function that compares an expected value and predicted value, outputting the result of that comparison.
+        metric_threshold: optional float, default `None`
+            If the metric yields a numerical value, then check it against this threshold when
+            deciding whether or not to accept a bootstrap example.
+        teacher_settings: dict, optional
+            Settings for the `teacher` model.
+        max_bootstrapped_demos: int, default 4
+            Maximum number of bootstrapped demonstrations to include
+        max_labeled_demos: int, default 16
+            Maximum number of labeled demonstrations to include.
+        max_rounds: int, default 1
+            Number of iterations to attempt generating the required bootstrap examples. If unsuccessful after `max_rounds`, the program ends.
+        max_errors: int, default 5
+            Maximum number of errors until program ends.
+        """
         self.metric = metric
         self.metric_threshold = metric_threshold
-        self.teacher_settings = teacher_settings
+        self.teacher_settings = {} if teacher_settings is None else teacher_settings
 
         self.max_bootstrapped_demos = max_bootstrapped_demos
         self.max_labeled_demos = max_labeled_demos
@@ -90,7 +114,9 @@ def _prepare_predictor_mappings(self):
             assert name1 == name2, "Student and teacher must have the same program structure."
             assert predictor1.signature.equals(
                 predictor2.signature,
-            ), f"Student and teacher must have the same signatures. {type(predictor1.signature)} != {type(predictor2.signature)}"
+            ), (f"Student and teacher must have the same signatures. "
+                f"{type(predictor1.signature)} != {type(predictor2.signature)}"
+                )
             assert id(predictor1) != id(predictor2), "Student and teacher must be different objects."
 
             name2predictor[name1] = None  # dict(student=predictor1, teacher=predictor2)
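
Reviewer note: the `teacher_settings` change in this file replaces a mutable default argument (`{}`) with a `None` sentinel. The sketch below shows why that matters in Python. The two classes are hypothetical, reduced stand-ins for illustration and are not the actual `BootstrapFewShot` class.

```python
from typing import Dict, Optional


class SharedDefault:
    """Old pattern: the default dict is created once, when the function is defined."""

    def __init__(self, teacher_settings={}):
        self.teacher_settings = teacher_settings


class FreshDefault:
    """New pattern from this commit: a None sentinel, so each instance gets its own dict."""

    def __init__(self, teacher_settings: Optional[Dict] = None):
        self.teacher_settings = {} if teacher_settings is None else teacher_settings


a, b = SharedDefault(), SharedDefault()
a.teacher_settings["temperature"] = 0.7
shared_leak = "temperature" in b.teacher_settings  # both instances share one dict

c, d = FreshDefault(), FreshDefault()
c.teacher_settings["temperature"] = 0.7
fresh_leak = "temperature" in d.teacher_settings  # each instance has its own dict
```

With the old signature, a setting written on one optimizer instance silently appears on every other instance created without an explicit `teacher_settings`. The sentinel form avoids that.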

dspy/teleprompt/copro_optimizer.py

Lines changed: 11 additions & 1 deletion
@@ -126,7 +126,17 @@ def _set_signature(self, predictor, updated_signature):
             predictor.signature = updated_signature
 
     def compile(self, student, *, trainset, eval_kwargs):
-        """student is a program that needs to be optimized, note that it may be zero-shot or already pre-optimized for demos != []"""
+        """
+        optimizes `signature` of `student` program - note that it may be zero-shot or already pre-optimized (demos already chosen - `demos != []`)
+
+        parameters:
+        student: program to optimize and left modified.
+        trainset: iterable of `Example`s
+        eval_kwargs: optional, dict
+            Additional keywords to go into `Evaluate` for the metric.
+
+        Returns optimized version of `student`.
+        """
         module = student.deepcopy()
         evaluate = Evaluate(devset=trainset, metric=self.metric, **eval_kwargs)
         total_calls = 0
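
Reviewer note: the docs in this commit describe COPRO as hill-climbing on the metric over the `trainset`, run for `depth` iterations. The sketch below is a toy greedy hill-climbing loop, not COPRO's implementation. `hill_climb`, `propose`, `score`, and `PHRASES` are hypothetical. In COPRO the candidates would come from an LM and the score from `Evaluate`, neither of which is modeled here.

```python
def hill_climb(initial, propose, score, depth=3):
    """Hypothetical sketch: for `depth` rounds, propose variants of the best instruction so far
    and keep a variant only if it scores strictly higher."""
    best, best_score = initial, score(initial)
    for _ in range(depth):
        for candidate in propose(best):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score


PHRASES = ["Think step by step.", "Answer concisely.", "Cite the context."]


def propose(instruction):
    # Stand-in for LM-generated rewrites: append one phrase not already present.
    return [f"{instruction} {phrase}" for phrase in PHRASES if phrase not in instruction]


def score(instruction):
    # Stand-in for evaluating the metric on a trainset: count helpful keywords.
    return sum(keyword in instruction for keyword in ("step", "concisely"))


best_instruction, best_score = hill_climb("Answer the question.", propose, score, depth=2)
```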

dspy/teleprompt/mipro_optimizer.py

Lines changed: 5 additions & 3 deletions
@@ -18,19 +18,21 @@
 """
 USAGE SUGGESTIONS:
 
-The following code can be used to compile a optimized signature teleprompter using the MIPRO, and evaluate it on an end task:
+The following code can be used to compile a optimized signature teleprompter using MIPRO, and evaluate it on an end task:
 
+``` python
 from dspy.teleprompt import MIPRO
 
 teleprompter = MIPRO(prompt_model=prompt_model, task_model=task_model, metric=metric, num_candidates=10, init_temperature=1.0)
 kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)
 compiled_prompt_opt = teleprompter.compile(program, trainset=trainset[:TRAIN_NUM], num_trials=100, max_bootstrapped_demos=3, max_labeled_demos=5, eval_kwargs=kwargs)
 eval_score = evaluate(compiled_prompt_opt, devset=evalset[:EVAL_NUM], **kwargs)
+```
 
 Note that this teleprompter takes in the following parameters:
 
-* prompt_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (ie. dspy.settings.configure(lm=task_model)).
-* task_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (ie. dspy.settings.configure(lm=task_model)).
+* prompt_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
+* task_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
 * metric: The task metric used for optimization.
 * num_candidates: The number of new prompts and sets of fewshot examples to generate and evaluate. Default=10.
 * init_temperature: The temperature used to generate new prompts. Higher roughly equals more creative. Default=1.0.

dspy/teleprompt/teleprompt_optuna.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ def objective(self, trial):
             display_table=False,
             display_progress=True,
         )
-        score, _ = evaluate(program2, return_all_scores=True)
+        score = evaluate(program2, return_all_scores=False)
         trial.set_user_attr("program", program2)
         return score
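
Reviewer note: the one-line change in this file swaps tuple unpacking for a plain scalar return. The sketch below shows that the two call styles give the same score. `evaluate` here is a hypothetical stand-in, not DSPy's `Evaluate`. That `return_all_scores=True` yields a `(score, per_example_scores)` pair is an assumption inferred from the unpacking on the removed line.

```python
def evaluate(program, devset, metric, return_all_scores=False):
    """Hypothetical stand-in evaluator (not DSPy's `Evaluate`): average metric over a devset."""
    scores = [metric(example, program(example)) for example in devset]
    average = sum(scores) / len(scores)
    if return_all_scores:
        return average, scores  # tuple: callers must unpack
    return average  # scalar: usable directly as an objective value


def double(x):
    # Toy "program" under evaluation.
    return x * 2


def toy_metric(example, prediction):
    # Toy metric: correct doubling scores 1.0, except example 3 is forced to fail.
    return float(prediction == example * 2 and example != 3)


devset = [1, 2, 3, 4]
score_old_style, _ = evaluate(double, devset, toy_metric, return_all_scores=True)
score_new_style = evaluate(double, devset, toy_metric, return_all_scores=False)
```

Returning the scalar directly keeps the objective function's return value unambiguous for the optimizer, and avoids computing a per-example list that the objective then discards.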
