Add Anthropic prompt caching support #3363
Conversation
Force-pushed from 791999d to 5b5cb9f
| """Add cache control to the last content block param.""" | ||
| if not params: | ||
| raise UserError( | ||
| 'CachePoint cannot be the first content in a user message - there must be previous content to attach the CachePoint to.' |
Copying in context from https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached:

- Tools: Tool definitions in the `tools` array
- System messages: Content blocks in the `system` array
- Text messages: Content blocks in the `messages.content` array, for both user and assistant turns
- Images & Documents: Content blocks in the `messages.content` array, in user turns
- Tool use and tool results: Content blocks in the `messages.content` array, in both user and assistant turns
I think we should support inserting a cache point after tool defs and system messages as well.
In the original PR I suggested doing this by supporting CachePoint as the first content in a user message (by adding it to whatever came before it: the system message, tool definition, or the last message of the assistant output), but that doesn't really feel natural from a code perspective.
What do you think about adding `anthropic_cache_tools` and `anthropic_cache_instructions` fields to `AnthropicModelSettings`, and setting `cache_control` on the relevant parts when set?
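A rough sketch of how those two settings could be applied when building the request (the field names come from the suggestion above; the helper, its signature, and the dict shapes are illustrative assumptions, not the actual implementation):

```python
from typing import Any


def apply_cache_settings(
    tools: list[dict[str, Any]],
    system: list[dict[str, Any]],
    model_settings: dict[str, Any],
) -> None:
    """Sketch: attach ephemeral cache_control according to the proposed settings."""
    cache_control = {'type': 'ephemeral'}  # i.e. a BetaCacheControlEphemeralParam
    if tools and model_settings.get('anthropic_cache_tools'):
        # Caches everything up to and including the last tool definition.
        tools[-1]['cache_control'] = cache_control
    if system and model_settings.get('anthropic_cache_instructions'):
        # Caches the system prompt content blocks.
        system[-1]['cache_control'] = cache_control
```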
Seems reasonable, I'll look into it!
Let's update the message here to make it clear that they are likely looking for one of the 2 settings instead.
@ronakrm If you're up for it, I'd welcome Bedrock support in this PR as well. It'll have that one bug (#2560 (comment)) but most users won't hit it, and it's clearly on their side to fix, not ours. Initially I thought we should hold off until they'd fixed it, but I'd rather just get this out for most people who won't hit the issue anyway.
I can take a stab at this, but I was a bit concerned about scope creep causing me to get less excited and delay work on this, and my current inability to test a live Bedrock example. I may first get a full pass on the pure-Anthropic side if that's alright with you. (Also not sure what your timelines are for this, but I should be able to make another pass at this in the next few days.)
Sounds good! I can then do Bedrock in a follow up PR.
That's great, thanks.
Force-pushed from b1f6d6c to 7ef071f
This implementation adds prompt caching support for Anthropic models, allowing users to cache parts of prompts (system prompts, long context, tools) to reduce costs by ~90% for cached tokens.

Key changes:
- Add CachePoint class to mark cache boundaries in prompts
- Implement cache control in AnthropicModel using BetaCacheControlEphemeralParam
- Add cache metrics mapping (cache_creation_input_tokens → cache_write_tokens)
- Add comprehensive tests for CachePoint functionality
- Add working example demonstrating prompt caching usage
- Add CachePoint filtering in OpenAI models for compatibility

The implementation is Anthropic-only (removed Bedrock complexity from original PR pydantic#2560) for a cleaner, more maintainable solution.

Related to pydantic#2560

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
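For orientation, the marker type itself can be very small; a minimal sketch of what such a `CachePoint` marker looks like (the field name and default are assumptions based on this description, not necessarily the exact class in the PR):

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class CachePoint:
    """Marker placed in a user prompt to indicate a cache boundary.

    Supported by Anthropic; other providers simply filter it out.
    """

    kind: Literal['cache-point'] = 'cache-point'
```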
- Fix TypedDict mutation in anthropic.py using cast()
- Handle CachePoint in otel message conversion (skip for telemetry)
- Add CachePoint handling in all model providers for compatibility
- Models without caching support (Bedrock, Gemini, Google, HuggingFace, OpenAI) now filter out CachePoint markers

All pyright type checks now pass.
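Filtering out the marker in providers without caching support amounts to skipping it while mapping user content; a sketch assuming `CachePoint` is importable from `pydantic_ai.messages` (the mapping function and dict shapes are illustrative, not the providers' real code):

```python
from pydantic_ai.messages import CachePoint


def map_user_content(items: list[object]) -> list[dict[str, object]]:
    """Illustrative only: providers without prompt caching just drop the marker."""
    mapped: list[dict[str, object]] = []
    for item in items:
        if isinstance(item, CachePoint):
            continue  # no cache_control equivalent in this provider's API
        if isinstance(item, str):
            mapped.append({'type': 'text', 'text': item})
        # ...other content types handled as before
    return mapped
```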
Adding CachePoint handling pushed method complexity over the limit (16 > 15). Added noqa: C901 to suppress the complexity warning.
- Add test_cache_point_in_otel_message_parts to cover CachePoint in otel conversion
- Add test_cache_control_unsupported_param_type to cover unsupported param error
- Use .get() for TypedDict access to avoid type checking errors
- Add type: ignore for testing protected method
- Restore pragma: lax no cover on google.py file_data handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add test_cache_point_filtering for OpenAI, Bedrock, Google, and Hugging Face
- Tests verify CachePoint is filtered out without errors
- Achieves 100% coverage for CachePoint code paths

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit addresses maintainer feedback on the Anthropic prompt caching PR:

- Add anthropic_cache_tools field to cache last tool definition
- Add anthropic_cache_instructions field to cache system prompts
- Rewrite existing CachePoint tests to use snapshot() assertions
- Add comprehensive tests for new caching settings
- Remove standalone example file, add docs section instead
- Move imports to top of test files
- Remove ineffective Google CachePoint test
- Add "Supported by: Anthropic" to CachePoint docstring
- Add Anthropic docs link in cache_control method

Tests are written but snapshots not yet generated (will be done in next commit).
Force-pushed from 7ef071f to 57d051a
So I think the current Python 3.11 + lowest-direct test failure is unrelated to this PR: it isn't caused by the CachePoint feature, but looks like a pre-existing Python 3.11 issue. Via CC debugging, it seems like:

I don't think the pure Python changes here can cause "illegal instruction" errors; this is probably always a C extension issue. The crash seems timing-dependent: main branch imports work, but this branch's slightly different timing exposes the C extension bug (affecting the parallel test execution via pytest-xdist worker processes). Unfortunately I can't reproduce this locally because I don't have CUDA, and --resolution lowest-direct tries to build vllm 0.1.3, which requires CUDA. Claude recommends:

@DouweM Any thoughts or recommendations?
- Add test_cache_point_with_streaming to verify CachePoint works with run_stream()
- Add test_cache_point_with_unsupported_type to verify error handling for non-cacheable content types
- Add test_cache_point_in_user_prompt to verify CachePoint is filtered in OpenTelemetry conversion
- Fix test_cache_point_filtering in test_google.py to properly test _map_user_prompt method
- Enhance test_cache_point_filtering in test_openai.py to directly test both Chat and Responses models
- Add test_cache_point_filtering_responses_model for OpenAI Responses API

These tests increase diff coverage from 68% to 98% (100% for all production code).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Move CachePoint imports to top of test files (test_bedrock.py, test_huggingface.py)
- Add documentation link for cacheable_types in anthropic.py

Addresses feedback from @DouweM in PR pydantic#3363

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add explicit list[ModelMessage] type annotations in test_instrumented.py
- Fix pyright ignore comment placement in test_openai.py
- Remove unnecessary type ignore comments

Fixes CI pyright errors reported on Python 3.10

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
```python
# Add cache_control to the last tool if enabled
if tools and model_settings.get('anthropic_cache_tools'):
    last_tool = cast(dict[str, Any], tools[-1])
```
Why do we have to cast it? I'd rather change the type of BetaToolUnionParam to not be a union, so that we can be sure it's a (typed)dict here.
BetaToolUnionParam is an upstream anthropic package type, I think this is the best we can do for now?
(This cast was unneeded, but the one at ~L700 is, comment added)
| """Add cache control to the last content block param.""" | ||
| if not params: | ||
| raise UserError( | ||
| 'CachePoint cannot be the first content in a user message - there must be previous content to attach the CachePoint to.' |
Let's update the message here to make it clear that they are likely looking for one of the 2 settings instead.
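Something along these lines, pointing users at the two settings (the wording is a guess, using the setting names as they ended up after the later rename):

```python
from pydantic_ai.exceptions import UserError

raise UserError(
    'CachePoint cannot be the first content in a user message. To cache the system prompt '
    'or tool definitions, use the `anthropic_cache_instructions` or '
    '`anthropic_cache_tool_definitions` model settings instead.'
)
```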
```python
# Only certain types support cache_control
# See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#what-can-be-cached
cacheable_types = {'text', 'tool_use', 'server_tool_use', 'image', 'tool_result'}
last_param = cast(dict[str, Any], params[-1])  # Cast to dict for mutation
```
This didn't work without the cast?
Co-authored-by: Douwe Maan <me@douwe.me>
Co-authored-by: Douwe Maan <me@douwe.me>
- Rename anthropic_cache_tools to anthropic_cache_tool_definitions for clarity
- Add backticks to docstrings for code identifiers (cache_control, tools)
- Improve error message to mention alternative cache settings
- Remove unnecessary cast for BetaToolUnionParam (line 441)
- Add explanatory comment for necessary cast of BetaContentBlockParam (line 703)
- Update Bedrock comment to link to issue pydantic#3418

All tests pass with these changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
These mock tool functions are never actually called during tests, so their return statements don't need coverage. Achieves 100% coverage for test_anthropic.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add proper mkdocs cross-reference links to anthropic_cache_instructions
- Add proper mkdocs cross-reference links to anthropic_cache_tool_definitions
- Link to model settings documentation section

Per maintainer feedback on PR pydantic#3363.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace three separate sections (each with individual examples) with:

- A concise bulleted list of the three caching methods
- One comprehensive example showing all three methods combined

This reduces repetition and makes the documentation more scannable.

Per maintainer feedback on PR pydantic#3363.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
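Roughly what the settings-based half of that combined example looks like (a sketch based on the names in this PR; the model name and prompt text are placeholders), with `CachePoint` covering the third method inside message content:

```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',  # placeholder model name
    system_prompt='Long, stable instructions that are worth caching...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,      # adds cache_control to the system prompt
        anthropic_cache_tool_definitions=True,  # adds cache_control to the last tool definition
    ),
)
```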
@ronakrm Thanks Ronak!
Summary
This PR adds prompt caching support for Anthropic models, allowing users to cache parts of prompts (system prompts, long context, tools) to reduce costs by ~90% for cached tokens.
This is a simplified, Anthropic-only implementation based on the work in #2560, following the maintainer's suggestion to "launch this for just Anthropic first."
Core Implementation
- `CachePoint` class: simple marker that can be inserted into user prompts to indicate cache boundaries
- `AnthropicModel`: uses `BetaCacheControlEphemeralParam` to add `cache_control` to content blocks
- Usage reporting: `cache_write_tokens` and `cache_read_tokens` via `genai-prices`
- `CachePoint` is passed through for all other models (ignored)

Example Usage
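The collapsed example isn't preserved above; a minimal sketch of the idea (identifiers from this PR, model name and prompt text are placeholders):

```python
from pydantic_ai import Agent, CachePoint

agent = Agent('anthropic:claude-sonnet-4-5')  # placeholder model name

result = agent.run_sync(
    [
        'Large, reusable context goes here...',
        CachePoint(),  # everything before this marker becomes a cacheable prefix
        'The actual question',
    ]
)
print(result.output)
```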
Testing
Compatibility
Real-World Test Results
Tested with live Anthropic API:
Request 1 (cache write): cache_write_tokens=3264
Request 2 (cache read): cache_read_tokens=3264
Request 3 (cache read): cache_read_tokens=3264
Total savings: ~5875 token-equivalents (assuming cache reads are billed at roughly 10% of the base input-token rate, two cached reads save about 2 × 3264 × 0.9 ≈ 5875).
I likely can create a stacking PR to push system prompt caching for Anthropic as well (this needs to update `_map_message` and related code to just always have a list of blocks, and user-based string system prompts should probably just be detected and mapped into the JSON format).