
Commit 53c29f5

Copilot and TomeHirata authored
Add documentation for provider-side prompt caching with Anthropic and OpenAI (#8970)
* Initial plan
* Add documentation for provider-side prompt caching
* Remove unnecessary paragraph from prompt caching documentation
* Simplify prompt caching documentation by consolidating provider sections
* Remove additional configuration options section
* Remove duplicated example and redundant explanation
* Add reference to LiteLLM prompt caching documentation

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: TomeHirata <33407409+TomeHirata@users.noreply.github.com>
1 parent 8a7bcfd commit 53c29f5

File tree

1 file changed: +33 −0 lines changed

docs/docs/tutorials/cache/index.md

Lines changed: 33 additions & 0 deletions
@@ -48,6 +48,39 @@ Time elapse: 0.000529
 Total usage: {}
 ```
 
+## Using Provider-Side Prompt Caching
+
+In addition to DSPy's built-in caching mechanism, you can leverage provider-side prompt caching offered by LLM providers like Anthropic and OpenAI. This feature is particularly useful when working with modules like `dspy.ReAct()` that send similar prompts repeatedly, as it reduces both latency and costs by caching prompt prefixes on the provider's servers.
+
+You can enable prompt caching by passing the `cache_control_injection_points` parameter to `dspy.LM()`. This works with supported providers like Anthropic and OpenAI. For more details on this feature, see the [LiteLLM prompt caching documentation](https://docs.litellm.ai/docs/tutorials/prompt_caching#configuration).
+
+```python
+import dspy
+import os
+
+os.environ["ANTHROPIC_API_KEY"] = "{your_anthropic_key}"
+lm = dspy.LM(
+    "anthropic/claude-3-5-sonnet-20240620",
+    cache_control_injection_points=[
+        {
+            "location": "message",
+            "role": "system",
+        }
+    ],
+)
+dspy.configure(lm=lm)
+
+# Use with any DSPy module
+predict = dspy.Predict("question->answer")
+result = predict(question="What is the capital of France?")
+```
+
+This is especially beneficial when:
+
+- Using `dspy.ReAct()` with the same instructions
+- Working with long system prompts that remain constant
+- Making multiple requests with similar context
+
 ## Disabling/Enabling DSPy Cache
 
 There are scenarios where you might need to disable caching, either entirely or selectively for in-memory or on-disk caches. For instance:
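
The bullet list in the added section calls out `dspy.ReAct()` as a prime beneficiary of provider-side caching. The sketch below is not part of the commit; it is a minimal illustration of that pairing, assuming the same Anthropic model and a hypothetical `lookup_population` tool. ReAct re-sends its long, constant instruction prefix on every call and tool step, which is exactly what the provider caches.

```python
import os

import dspy

os.environ["ANTHROPIC_API_KEY"] = "{your_anthropic_key}"

# Same cache-enabled LM as in the added documentation.
lm = dspy.LM(
    "anthropic/claude-3-5-sonnet-20240620",
    cache_control_injection_points=[{"location": "message", "role": "system"}],
)
dspy.configure(lm=lm)


def lookup_population(city: str) -> str:
    """Hypothetical tool: return a canned population figure for a city."""
    return f"{city} has roughly 2.1 million inhabitants."


# The agent's repeated system/instruction prefix is reused from the provider-side
# cache across these calls, reducing latency and cost.
react = dspy.ReAct("question -> answer", tools=[lookup_population])

print(react(question="What is the population of Paris?").answer)
print(react(question="Is Paris larger than Lyon by population?").answer)
```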

0 commit comments
