diff --git a/docs/user-guide/concepts/model-providers/amazon-bedrock.md b/docs/user-guide/concepts/model-providers/amazon-bedrock.md
index e7c3f028..9cd05c03 100644
--- a/docs/user-guide/concepts/model-providers/amazon-bedrock.md
+++ b/docs/user-guide/concepts/model-providers/amazon-bedrock.md
@@ -332,16 +332,52 @@ For detailed information about supported models, minimum token requirements, and
 
 #### System Prompt Caching
 
-System prompt caching allows you to reuse a cached system prompt across multiple requests:
+System prompt caching allows you to reuse a cached system prompt across multiple requests. Strands supports two approaches for system prompt caching:
+
+**Provider-Agnostic Approach (Recommended)**
+
+Use SystemContentBlock arrays to define cache points that work across all model providers:
+
+```python
+from strands import Agent
+from strands.types.content import SystemContentBlock
+
+# Define system content with cache points
+system_content = [
+    SystemContentBlock(
+        text="You are a helpful assistant that provides concise answers. "
+        "This is a long system prompt with detailed instructions..."
+        "..." * 1600  # needs to be at least 1,024 tokens
+    ),
+    SystemContentBlock(cachePoint={"type": "default"})
+]
+
+# Create an agent with SystemContentBlock array
+agent = Agent(system_prompt=system_content)
+
+# First request will cache the system prompt
+response1 = agent("Tell me about Python")
+print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")
+
+# Second request will reuse the cached system prompt
+response2 = agent("Tell me about JavaScript")
+print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
+```
+
+**Legacy Bedrock-Specific Approach**
+
+For backwards compatibility, you can still use the Bedrock-specific `cache_prompt` configuration:
 
 ```python
 from strands import Agent
 from strands.models import BedrockModel
 
-# Using system prompt caching with BedrockModel
+# Using legacy system prompt caching with BedrockModel
 bedrock_model = BedrockModel(
     model_id="anthropic.claude-sonnet-4-20250514-v1:0",
-    cache_prompt="default"
+    cache_prompt="default"  # This approach is deprecated
 )
 
 # Create an agent with the model
@@ -349,20 +385,13 @@ agent = Agent(
     model=bedrock_model,
     system_prompt="You are a helpful assistant that provides concise answers. " +
                   "This is a long system prompt with detailed instructions... "
-                  # Add enough text to reach the minimum token requirement for your model
 )
 
-# First request will cache the system prompt
-response1 = agent("Tell me about Python")
-print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
-print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")
-
-# Second request will reuse the cached system prompt
-response2 = agent("Tell me about JavaScript")
-print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
-print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
+response = agent("Tell me about Python")
 ```
 
+> **Note**: The `cache_prompt` configuration is deprecated in favor of the provider-agnostic SystemContentBlock approach. The new approach enables caching across all model providers through a unified interface.
+
 #### Tool Caching
 
 Tool caching allows you to reuse a cached tool definition across multiple requests:
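
The trailing context of the hunk introduces tool caching but ends before the accompanying example. A minimal sketch of that pattern, assuming `BedrockModel` exposes a `cache_tools` config analogous to `cache_prompt` and that the optional `strands_tools` package provides a `calculator` tool:

```python
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator  # assumed: from the optional strands-agents-tools package

# Assumption: cache_tools mirrors cache_prompt and caches the registered tool definitions
bedrock_model = BedrockModel(
    model_id="anthropic.claude-sonnet-4-20250514-v1:0",
    cache_tools="default",
)

agent = Agent(model=bedrock_model, tools=[calculator])

# First request writes the tool definitions to the cache (they must meet the
# model's minimum token requirement); subsequent requests read from the cache
response = agent("What is 1234 * 5678?")
print(f"Cache write tokens: {response.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response.metrics.accumulated_usage.get('cacheReadInputTokens')}")
```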