
Commit 55c91f5

Merge branch 'main' of https://github.com/stanfordnlp/dspy
2 parents 5d3d7b3 + f354043

5 files changed (+20 / -16 lines)

README.md

Lines changed: 6 additions & 3 deletions
@@ -136,15 +136,17 @@ You can find other examples tweeted by [@lateinteraction](https://twitter.com/la
 
 **Some other examples (not exhaustive, feel free to add more via PR):**
 
+
+- [DSPy Optimizers Benchmark on a bunch of different tasks, by Michael Ryan](https://github.com/stanfordnlp/dspy/tree/main/testing/tasks)
+- [Sophisticated Extreme Multi-Class Classification, IReRa, by Karel D’Oosterlinck](https://github.com/KarelDO/xmc.dspy)
+- [Haize Lab's Red Teaming with DSPy](https://blog.haizelabs.com/posts/dspy/) and see [their DSPy code](https://github.com/haizelabs/dspy-redteam)
 - Applying DSPy Assertions
 - [Long-form Answer Generation with Citations, by Arnav Singhvi](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/examples/longformqa/longformqa_assertions.ipynb)
 - [Generating Answer Choices for Quiz Questions, by Arnav Singhvi](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/examples/quiz/quiz_assertions.ipynb)
 - [Generating Tweets for QA, by Arnav Singhvi](https://colab.research.google.com/github/stanfordnlp/dspy/blob/main/examples/tweets/tweets_assertions.ipynb)
 - [Compiling LCEL runnables from LangChain in DSPy](https://github.com/stanfordnlp/dspy/blob/main/examples/tweets/compiling_langchain.ipynb)
 - [AI feedback, or writing LM-based metrics in DSPy](https://github.com/stanfordnlp/dspy/blob/main/examples/tweets/tweet_metric.py)
-- [DSPy Optimizers Benchmark on a bunch of different tasks, by Michael Ryan](https://github.com/stanfordnlp/dspy/tree/main/testing/tasks)
 - [Indian Languages NLI with gains due to compiling by Saiful Haq](https://github.com/saifulhaq95/DSPy-Indic/blob/main/indicxlni.ipynb)
-- [Sophisticated Extreme Multi-Class Classification, IReRa, by Karel D’Oosterlinck](https://github.com/KarelDO/xmc.dspy)
 - [DSPy on BIG-Bench Hard Example, by Chris Levy](https://drchrislevy.github.io/posts/dspy/dspy.html)
 - [Using Ollama with DSPy for Mistral (quantized) by @jrknox1977](https://gist.github.com/jrknox1977/78c17e492b5a75ee5bbaf9673aee4641)
 - [Using DSPy, "The Unreasonable Effectiveness of Eccentric Automatic Prompts" (paper) by VMware's Rick Battle & Teja Gollapudi, and interview at TheRegister](https://www.theregister.com/2024/02/22/prompt_engineering_ai_models/)
@@ -153,7 +155,8 @@ You can find other examples tweeted by [@lateinteraction](https://twitter.com/la
 - [Using DSPy to train Gpt 3.5 on HumanEval by Thomas Ahle](https://github.com/stanfordnlp/dspy/blob/main/examples/functional/functional.ipynb)
 - [Building a chess playing agent using DSPy by Franck SN](https://medium.com/thoughts-on-machine-learning/building-a-chess-playing-agent-using-dspy-9b87c868f71e)
 
-TODO: Add links to the state-of-the-art results on Theory of Mind (ToM) by Plastic Labs, the results by Haize Labs for Red Teaming with DSPy, and the DSPy pipeline from Replit.
+
+TODO: Add links to the state-of-the-art results by the University of Toronto on Clinical NLP, on Theory of Mind (ToM) by Plastic Labs, and the DSPy pipeline from Replit.
 
 There are also recent cool examples at [Weaviate's DSPy cookbook](https://github.com/weaviate/recipes/tree/main/integrations/dspy) by Connor Shorten. [See tutorial on YouTube](https://www.youtube.com/watch?v=CEuUG4Umfxs).
 
docs/docs/quick-start/minimal-example.mdx

Lines changed: 6 additions & 4 deletions
@@ -12,7 +12,7 @@ We make use of the [GSM8K dataset](https://huggingface.co/datasets/gsm8k) and th
 
 ## Setup
 
-Before we delve into the example, let's ensure our environment is properly configured. We'll start by importing the necessary modules and configuring our language model:
+Before we jump into the example, let's ensure our environment is properly configured. We'll start by importing the necessary modules and configuring our language model:
 
 ```python
 import dspy
@@ -33,7 +33,7 @@ Let's take a look at what `gsm8k_trainset` and `gsm8k_devset` are:
 print(gsm8k_trainset)
 ```
 
-The `gsm8k_trainset` and `gsm8k_devset` datasets contain a list of Examples with each example having `question` and `answer` field. We'll use these datasets to train and evaluate our model.
+The `gsm8k_trainset` and `gsm8k_devset` datasets contain a list of Examples, with each example having `question` and `answer` fields.
 
 ## Define the Module
 
@@ -51,7 +51,7 @@ class CoT(dspy.Module):
 
 ## Compile and Evaluate the Model
 
-With our simple program in place, let's move on to optimizing it using the [`BootstrapFewShot`](/api/optimizers/BootstrapFewShot) teleprompter:
+With our simple program in place, let's move on to compiling it with the [`BootstrapFewShot`](/api/optimizers/BootstrapFewShot) teleprompter:
 
 ```python
 from dspy.teleprompt import BootstrapFewShot
@@ -61,9 +61,11 @@ config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)
 
 # Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
 teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
-optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset, valset=gsm8k_devset)
+optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)
 ```
 
+Note that `BootstrapFewShot` is not an optimizing teleprompter, i.e. it simply creates and validates examples for steps of the pipeline (in this case, the chain-of-thought reasoning) but does not optimize the metric. Other teleprompters like `BootstrapFewShotWithRandomSearch` and `MIPRO` will apply direct optimization.
+
 ## Evaluate
 
 Now that we have a compiled (optimized) DSPy program, let's move to evaluating its performance on the dev dataset.
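
Taken together, a minimal end-to-end sketch of this page's quick-start flow under the new `compile` signature looks like the following. It assumes the imports, dataset helpers, and `CoT` module shown in the page's earlier cells; the LM configuration is illustrative, not prescribed by this commit:

```python
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
from dspy.teleprompt import BootstrapFewShot

# Configure the language model (model choice here is illustrative).
turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load the GSM8K train/dev splits used throughout this page.
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]

class CoT(dspy.Module):
    """Chain-of-thought module from the 'Define the Module' section."""

    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)

# Compile with the new signature: trainset only, no valset.
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset)
```

Evaluation on `gsm8k_devset` then proceeds exactly as the page's Evaluate section describes.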

dspy/teleprompt/bootstrap.py

Lines changed: 2 additions & 3 deletions
@@ -52,9 +52,8 @@ def __init__(
         self.error_count = 0
         self.error_lock = threading.Lock()
 
-    def compile(self, student, *, teacher=None, trainset, valset=None):
+    def compile(self, student, *, teacher=None, trainset):
         self.trainset = trainset
-        self.valset = valset
 
         self._prepare_student_and_teacher(student, teacher)
         self._prepare_predictor_mappings()
@@ -133,7 +132,7 @@ def _bootstrap(self, *, max_bootstraps=None):
         self.validation = [x for idx, x in enumerate(self.trainset) if idx not in bootstrapped]
         random.Random(0).shuffle(self.validation)
 
-        self.validation = self.valset or self.validation
+        self.validation = self.validation
 
         # NOTE: Can't yet use evaluate because we need to trace *per example*
         # evaluate = Evaluate(program=self.teacher, metric=self.metric, num_threads=12)
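
The practical effect of this change: callers pass only `trainset`, and the validation pool is always the unbootstrapped remainder of it. A minimal usage sketch, assuming an LM is already configured; the `simple_metric` and `QA` names are illustrative, not part of the library:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Illustrative metric: exact match on the answer field.
def simple_metric(example, pred, trace=None):
    return example.answer == pred.answer

# Toy training examples; real usage would load a dataset.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="What is 3 * 3?", answer="9").with_inputs("question"),
]

class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.Predict("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

# Assumes an LM was configured earlier via dspy.settings.configure(lm=...).
bootstrap = BootstrapFewShot(metric=simple_metric, max_bootstrapped_demos=1, max_labeled_demos=1)

# Before this commit: bootstrap.compile(QA(), trainset=trainset, valset=devset)
# After: valset is gone; validation examples come from trainset itself.
compiled = bootstrap.compile(QA(), trainset=trainset)
```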

skycamp2023.ipynb

Lines changed: 2 additions & 2 deletions
@@ -46,8 +46,8 @@
 "import pkg_resources # Install the package if it's not installed\n",
 "if not \"dspy-ai\" in {pkg.key for pkg in pkg_resources.working_set}:\n",
 " !pip install -U pip\n",
-" # !pip install dspy-ai\n",
-" !pip install -e $repo_path\n",
+" !pip install dspy-ai==2.1\n",
+" # !pip install -e $repo_path\n",
 "\n",
 "!pip install transformers"
 ]

tests/teleprompt/test_bootstrap.py

Lines changed: 4 additions & 4 deletions
@@ -54,7 +54,7 @@ def test_compile_with_predict_instances():
         metric=simple_metric, max_bootstrapped_demos=1, max_labeled_demos=1
     )
     compiled_student = bootstrap.compile(
-        student, teacher=teacher, trainset=trainset, valset=valset
+        student, teacher=teacher, trainset=trainset
     )
 
     assert compiled_student is not None, "Failed to compile student"
@@ -74,7 +74,7 @@ def test_bootstrap_effectiveness():
         metric=simple_metric, max_bootstrapped_demos=1, max_labeled_demos=1
     )
     compiled_student = bootstrap.compile(
-        student, teacher=teacher, trainset=trainset, valset=valset
+        student, teacher=teacher, trainset=trainset
     )
 
     # Check that the compiled student has the correct demos
@@ -149,7 +149,7 @@ def forward(self, **kwargs):
     )
 
     with pytest.raises(RuntimeError, match="Simulated error"):
-        bootstrap.compile(student, teacher=teacher, trainset=trainset, valset=valset)
+        bootstrap.compile(student, teacher=teacher, trainset=trainset)
 
 
 def test_validation_set_usage():
@@ -171,7 +171,7 @@ def test_validation_set_usage():
         metric=simple_metric, max_bootstrapped_demos=1, max_labeled_demos=1
     )
     compiled_student = bootstrap.compile(
-        student, teacher=teacher, trainset=trainset, valset=valset
+        student, teacher=teacher, trainset=trainset
     )
 
     # Check that validation examples are part of student's demos after compilation
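
For reference, a test exercising the new call shape might look like the sketch below. The fixtures (`SimpleModule`, `simple_metric`, `DummyLM`) mirror what this suite appears to use but are reconstructed here as assumptions rather than quoted verbatim:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.utils.dummies import DummyLM  # dummy LM assumed from this suite's fixtures


class SimpleModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.Predict("input -> output")

    def forward(self, **kwargs):
        return self.predictor(**kwargs)


def simple_metric(example, pred, trace=None):
    return example.output == pred.output


def test_compile_without_valset():
    # The dummy LM replays canned completions instead of calling a real model.
    dspy.settings.configure(lm=DummyLM(["blue", "blue"]))

    trainset = [
        dspy.Example(input="What color is the sky?", output="blue").with_inputs("input"),
    ]

    bootstrap = BootstrapFewShot(
        metric=simple_metric, max_bootstrapped_demos=1, max_labeled_demos=1
    )

    # valset is no longer a parameter of compile().
    compiled = bootstrap.compile(SimpleModule(), teacher=SimpleModule(), trainset=trainset)
    assert compiled is not None
```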
