sidebar_position: 999
---

!!! warning "This page is outdated and may not be fully accurate in DSPy 2.5 and 2.6"

# DSPy Cheatsheet

This page contains snippets for frequent usage patterns.

## DSPy DataLoaders

Import and initialize a `DataLoader` object:

```python
import dspy
from dspy.datasets import DataLoader

dl = DataLoader()
```

### Loading from HuggingFace Datasets

```python
code_alpaca = dl.from_huggingface("HuggingFaceH4/CodeAlpaca_20K")
```

You can access each split of the dataset by its key:

```python
train_dataset = code_alpaca['train']
test_dataset = code_alpaca['test']
```

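Each split is a list of `dspy.Example` objects; a quick sketch of inspecting one (assuming the `train` split is non-empty):

```python
example = train_dataset[0]

# A dspy.Example behaves like a flexible record; keys() lists the loaded fields.
print(example.keys())
```
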
### Loading specific splits from HuggingFace

You can also manually specify the splits you want to include as a parameter, and it will return a dictionary whose keys are the splits you specified:

```python
code_alpaca = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split=["train", "test"],
)

print(f"Splits in dataset: {code_alpaca.keys()}")
```

If you specify a single split, the dataloader will return a list of `dspy.Example` instead of a dictionary:

```python
code_alpaca = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train",
)

print(f"Number of examples in split: {len(code_alpaca)}")
```

You can slice the split just like you would with a HuggingFace dataset:

```python
code_alpaca_80 = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train[:80%]",
)

print(f"Number of examples in split: {len(code_alpaca_80)}")

code_alpaca_20_80 = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train[20%:80%]",
)

print(f"Number of examples in split: {len(code_alpaca_20_80)}")
```

### Loading a specific subset from HuggingFace

If a dataset has a subset, you can pass it as an argument, just as you would with `load_dataset` in HuggingFace:

```python
gsm8k = dl.from_huggingface(
    "gsm8k",
    "main",
    input_keys=("question",),
)

print(f"Keys present in the returned dict: {list(gsm8k.keys())}")

print(f"Number of examples in train set: {len(gsm8k['train'])}")
print(f"Number of examples in test set: {len(gsm8k['test'])}")
```

### Loading from CSV

```python
dolly_100_dataset = dl.from_csv("dolly_subset_100_rows.csv")
```

You can select only specific columns from the CSV by specifying them in the arguments:

```python
dolly_100_dataset = dl.from_csv(
    "dolly_subset_100_rows.csv",
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context"),
)
```

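The `input_keys` argument marks which fields of each returned `dspy.Example` are treated as inputs, with the remaining fields treated as labels; a quick sketch of checking this, assuming the CSV loaded above:

```python
example = dolly_100_dataset[0]

# Fields marked as inputs via `input_keys`.
print(example.inputs())

# The remaining fields, treated as labels.
print(example.labels())
```
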
### Splitting a List of `dspy.Example`

```python
splits = dl.train_test_split(dataset, train_size=0.8)  # `dataset` is a List of dspy.Example
train_dataset = splits['train']
test_dataset = splits['test']
```

### Sampling from a List of `dspy.Example`

```python
sampled_example = dl.sample(dataset, n=5)  # `dataset` is a List of dspy.Example
```

## DSPy Programs

### dspy.Signature

```python
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="often between 1 and 5 words")
```

### dspy.ChainOfThought
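
A minimal usage sketch, reusing the `BasicQA` signature defined above (the example question is illustrative):

```python
# Wrap the signature in a ChainOfThought module, which adds an intermediate reasoning step.
generate_answer = dspy.ChainOfThought(BasicQA)

pred = generate_answer(question="What is the capital of France?")
print(pred.answer)  # the prediction also exposes the intermediate reasoning
```
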
## DSPy Metrics

### LLM as Judge

```python
class FactJudge(dspy.Signature):
    """Judge if the answer is factually correct based on the context."""

    context = dspy.InputField(desc="Context for the prediction")
    question = dspy.InputField(desc="Question to be answered")
    answer = dspy.InputField(desc="Answer for the question")
    factually_correct: bool = dspy.OutputField(desc="Is the answer factually correct based on the context?")

judge = dspy.ChainOfThought(FactJudge)

def factuality_metric(example, pred):
    factual = judge(context=example.context, question=example.question, answer=pred.answer)
    return factual.factually_correct
```

## DSPy Evaluation
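
A minimal sketch of the `Evaluate` utility, assuming `devset`, `your_dspy_program`, `your_defined_metric`, and `NUM_THREADS` are defined as in the optimizer examples below (the `display_table=5` setting is illustrative):

```python
from dspy.evaluate import Evaluate

# Set up the evaluator; it can be reused across programs.
evaluate = Evaluate(devset=devset, metric=your_defined_metric, num_threads=NUM_THREADS, display_progress=True, display_table=5)

# Launch evaluation of a DSPy program.
evaluate(your_dspy_program)
```
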
## DSPy Optimizers

### COPRO

```python
from dspy.teleprompt import COPRO

eval_kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)

copro_teleprompter = COPRO(prompt_model=model_to_generate_prompts, metric=your_defined_metric)

compiled_program_optimized_signature = copro_teleprompter.compile(your_dspy_program, trainset=trainset, eval_kwargs=eval_kwargs)
```

### MIPRO

```python
from dspy.teleprompt import MIPRO

teleprompter = MIPRO(prompt_model=model_to_generate_prompts, task_model=model_that_solves_task, metric=your_defined_metric, num_candidates=num_new_prompts_generated, init_temperature=prompt_generation_temperature)

kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)

compiled_program_optimized_bayesian_signature = teleprompter.compile(your_dspy_program, trainset=trainset, num_trials=100, max_bootstrapped_demos=3, max_labeled_demos=5, eval_kwargs=kwargs)
```

### MIPROv2

Note: detailed documentation can be found [here](api/optimizers/MIPROv2.md). `MIPROv2` is the latest extension of `MIPRO`; it includes updates such as (1) improvements to instruction proposal and (2) more efficient search with minibatching.
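
A minimal usage sketch, assuming `your_defined_metric`, `your_dspy_program`, and `trainset` are defined as in the other optimizer examples; the `auto="light"` budget setting is an assumption, not part of the original snippet:

```python
from dspy.teleprompt import MIPROv2

# Initialize the optimizer; auto="light" selects a lighter-weight search budget.
teleprompter = MIPROv2(metric=your_defined_metric, auto="light")

# Jointly optimize instructions and few-shot demos for the program.
optimized_program = teleprompter.compile(your_dspy_program, trainset=trainset)
```

The optimized program can then be evaluated, as in the snippet below, using the `evaluate` helper sketched under DSPy Evaluation:
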
```python
print(f"Evaluate optimized program...")
evaluate(optimized_program, devset=devset[:])
```

### Signature Optimizer with Types

```python
from dspy.teleprompt.signature_opt_typed import optimize_signature
from dspy.evaluate import Evaluate
from dspy.evaluate.metrics import answer_exact_match
from dspy.functional import TypedChainOfThought

compiled_program = optimize_signature(
    student=TypedChainOfThought("question -> answer"),
    evaluator=Evaluate(devset=devset, metric=answer_exact_match, num_threads=10, display_progress=True),
    n_iterations=50,
).program
```

### KNNFewShot

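The original snippet is abbreviated here; a minimal sketch, assuming `trainset` and `your_dspy_program` are defined as in the other optimizer examples, and using an illustrative sentence-transformers model as the vectorizer:

```python
import dspy
from sentence_transformers import SentenceTransformer
from dspy.teleprompt import KNNFewShot

# Embed trainset examples so the optimizer can retrieve the nearest demos per query.
knn_optimizer = KNNFewShot(
    k=3,
    trainset=trainset,
    vectorizer=dspy.Embedder(SentenceTransformer("all-MiniLM-L6-v2").encode),
)

your_dspy_program_compiled = knn_optimizer.compile(student=your_dspy_program)
```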

Other custom configurations are similar to customizing the `dspy.BootstrapFewShot` optimizer.
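
For reference, a minimal sketch of such a customization on `dspy.BootstrapFewShot` itself (the demo and round counts shown are illustrative):

```python
from dspy.teleprompt import BootstrapFewShot

# Cap the number of bootstrapped and labeled demonstrations per predictor.
fewshot_optimizer = BootstrapFewShot(metric=your_defined_metric, max_bootstrapped_demos=4, max_labeled_demos=16, max_rounds=1)

your_dspy_program_compiled = fewshot_optimizer.compile(student=your_dspy_program, trainset=trainset)
```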

### SIMBA

SIMBA, which stands for Stochastic Introspective Mini-Batch Ascent, is a prompt optimizer that accepts arbitrary DSPy programs and proceeds in a sequence of mini-batches, seeking to make incremental improvements to the prompt instructions or few-shot examples.

```python
from dspy.teleprompt import SIMBA

simba = SIMBA(metric=your_defined_metric, max_steps=12, max_demos=10)

optimized_program = simba.compile(student=your_dspy_program, trainset=trainset)
```

## DSPy `Refine` and `BestOfN`

> `dspy.Suggest` and `dspy.Assert` are replaced by `dspy.Refine` and `dspy.BestOfN` in DSPy 2.6.
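
A minimal sketch of the replacement APIs; the one-word-answer reward function and the example question are illustrative:

```python
import dspy

qa = dspy.ChainOfThought("question -> answer")

# Reward 1.0 when the answer is a single word, 0.0 otherwise.
def one_word_answer(args, pred):
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

# BestOfN samples up to N completions and returns the best-scoring one.
best_of_3 = dspy.BestOfN(module=qa, N=3, reward_fn=one_word_answer, threshold=1.0)

# Refine additionally feeds feedback from earlier attempts into later ones.
refine = dspy.Refine(module=qa, N=3, reward_fn=one_word_answer, threshold=1.0)

print(best_of_3(question="What is the capital of Belgium?").answer)
```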