
Commit 8270f51

Add multi-modal example (#93)
1 parent 7c5bd08 commit 8270f51

7 files changed

+178
-0
lines changed

docs/assets/multimodal/whale_1.png

132 KB

docs/assets/multimodal/whale_2.png

137 KB
1.44 MB

docs/assets/multimodal/whale_3.png

143 KB

docs/examples/python/multimodal.md

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
1+
# Multi-modal - Strands Agents for Image Generation and Evaluation
2+
3+
This [example][example_code] demonstrates how to create a multi-agent system for generating and evaluating images. It shows how Strands agents can work with multimodal content through a workflow between specialized agents.
4+
5+
## Overview
6+
7+
| Feature | Description |
8+
| ------------------ | -------------------------------------- |
9+
| **Tools Used** | generate_image, image_reader |
10+
| **Complexity** | Intermediate |
11+
| **Agent Type** | Multi-Agent System (2 Agents) |
12+
| **Interaction** | Command Line Interface |
13+
| **Key Focus** | Multimodal Content Processing |
14+
15+
## Tool Overview
16+
17+
The multimodal example utilizes two tools to work with image content.
18+
19+
1. The [`generate_image`](https://github.com/strands-agents/tools/blob/main/src/strands_tools/generate_image.py) tool enables the creation of images based on text prompts, allowing the agent to generate visual content from textual descriptions.
20+
2. The [`image_reader`](https://github.com/strands-agents/tools/blob/main/src/strands_tools/image_reader.py) tool provides the capability to analyze and interpret image content, enabling the agent to "see" and describe what's in the images.
21+
22+
Together, these tools create a complete pipeline for both generating and evaluating visual content through natural language interactions.
23+
24+
## Code Structure and Implementation
25+
26+
### Agent Initialization
27+
28+
The example creates two specialized agents, each with a specific role in the image generation and evaluation process.
29+
30+
```python
31+
from strands import Agent
32+
from strands_tools import generate_image, image_reader
33+
34+
# Artist agent that generates images based on prompts
35+
artist = Agent(tools=[generate_image], system_prompt=(
36+
"You will be instructed to generate a number of images of a given subject. Vary the prompt for each generated image to create a variety of options. "
37+
"Your final output must contain ONLY a comma-separated list of the filesystem paths of generated images."
38+
))
39+
40+
# Critic agent that evaluates and selects the best image
41+
critic = Agent(tools=[image_reader], system_prompt=(
42+
"You will be provided with a list of filesystem paths, each containing an image. "
43+
"Describe each image, and then choose which one is best. "
44+
"Your final line of output must be as follows: "
45+
"FINAL DECISION: <path to final decision image>"
46+
))
47+
```
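
Python joins adjacent string literals exactly as written, with no whitespace added between them, so each sentence in a multi-line system prompt needs its own trailing space or newline. A minimal illustration using made-up prompt text (not the prompts from this example):

```python
# Adjacent string literals are concatenated with nothing in between.
without_space = (
    "Describe each image."
    "Choose the best one."
)

# A trailing space on every literal except the last keeps sentences apart.
with_space = (
    "Describe each image. "
    "Choose the best one."
)

print(without_space)  # Describe each image.Choose the best one.
print(with_space)     # Describe each image. Choose the best one.
```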
48+
49+
### Using the Multimodal Agents
50+
51+
The example demonstrates a simple workflow where the agents collaborate to generate and select images:
52+
53+
```python
54+
# Generate multiple images using the artist agent
55+
result = artist("Generate 3 images of a dog")
56+
57+
# Pass the image paths to the critic agent for evaluation
58+
critic(str(result))
59+
```
60+
61+
This workflow shows how agents can be chained together, with the output of one agent becoming the input for another, creating a pipeline for multimodal content processing.
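
The hand-off relies on the artist following its system prompt and ending with a comma-separated list of paths. The sketch below shows one way the list could be split and checked before the critic is called. `parse_image_paths` and `missing_paths` are hypothetical helpers written for this illustration, not part of Strands, and the sketch assumes the list is the last non-empty line of the artist's text.

```python
from pathlib import Path
from typing import List


def parse_image_paths(artist_output: str) -> List[str]:
    """Split the artist's final line into individual image paths.

    Assumption: the last non-empty line of the output is the
    comma-separated list requested in the artist's system prompt.
    """
    lines = [line.strip() for line in artist_output.splitlines() if line.strip()]
    if not lines:
        return []
    return [part.strip() for part in lines[-1].split(",") if part.strip()]


def missing_paths(paths: List[str]) -> List[str]:
    """Return the paths that do not point at an existing file."""
    return [p for p in paths if not Path(p).is_file()]


sample = "output/a.png, output/b.png,output/c.png"
print(parse_image_paths(sample))
```

In the workflow above, `parse_image_paths(str(result))` would yield the individual paths, and a non-empty `missing_paths(...)` result could be treated as a signal to retry the artist instead of calling the critic.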
62+
63+
## Key Features and Capabilities
64+
65+
### 1. Image Generation
66+
67+
The artist agent can generate multiple variations of images based on a text prompt:
68+
69+
- **Basic Generation**:
70+
```
71+
Generate 3 images of a dog
72+
```
73+
74+
- **Styled Generation**:
75+
```
76+
Generate 2 images of a mountain landscape in watercolor style
77+
```
78+
79+
- **Conceptual Generation**:
80+
```
81+
Generate 4 images representing the concept of freedom
82+
```
83+
84+
### 2. Image Evaluation
85+
86+
The critic agent describes each image, compares them, and ends its output with a single final choice.
87+
88+
### Example Command and Results
89+
90+
**Command**:
91+
92+
```
93+
Generate 3 images of a whale
94+
```
95+
96+
**Response**:
97+
```
98+
I'll generate 3 different images of whales for you with varied prompts.
99+
Tool #1: generate_image
100+
101+
Tool #2: generate_image
102+
103+
Tool #3: generate_image
104+
output/a_majestic_blue_whale_swimming.png, output/humpback_whale_breaching_from_the.png, output/orca_whale_pod_swimming_together.png
105+
106+
I'll describe each of these whale images and determine which is best. Let me examine them one by one.
107+
Tool #1: image_reader
108+
This first image shows what appears to be a blue whale at the surface of the ocean [...]
109+
Tool #2: image_reader
110+
The second image showcases a humpback whale in a spectacular breaching display during what appears to be sunset or sunrise [...]
111+
Tool #3: image_reader
112+
The third image shows a pod of orca whales (also known as killer whales) swimming together [...]
113+
114+
Comparison and final decision:
115+
116+
All three images are remarkable whale photographs, each with unique strengths:
117+
118+
1. The blue whale image offers an intimate, close-up perspective with beautiful atmospheric lighting and a serene mood.
119+
120+
2. The humpback whale breaching image captures a dynamic, powerful moment with spectacular sunset colors creating an extraordinary visual impact.
121+
122+
3. The orca pod image showcases multiple whales in their social grouping against a dramatic arctic backdrop, emphasizing their habitat and community.
123+
124+
While each image is impressive, the humpback whale breaching at sunset stands out for its perfect combination of action, timing, lighting, and composition. The contrast between the dark whale and the golden sky, the dynamic motion captured at precisely the right moment, and the breathtaking sunset setting make this image particularly remarkable.
125+
126+
FINAL DECISION: output/humpback_whale_breaching_from_the.png
127+
```
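
Because the critic's system prompt fixes the last line as `FINAL DECISION: <path>`, the choice can be read back programmatically. The sketch below assumes the critic's reply is available as plain text. `extract_final_decision` is a hypothetical helper for illustration, not a Strands API.

```python
import re
from typing import Optional

# Matches a line of the form "FINAL DECISION: <path>" anywhere in the text.
FINAL_DECISION = re.compile(r"^FINAL DECISION:\s*(.+?)\s*$", re.MULTILINE)


def extract_final_decision(critic_output: str) -> Optional[str]:
    """Return the path from the last FINAL DECISION line, or None if absent."""
    matches = FINAL_DECISION.findall(critic_output)
    return matches[-1] if matches else None


reply = (
    "While each image is impressive, the humpback whale stands out.\n\n"
    "FINAL DECISION: output/humpback_whale_breaching_from_the.png\n"
)
print(extract_final_decision(reply))
```

Returning `None` when no such line is found lets the caller decide whether to re-prompt the critic, since a model may not always follow the output format.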
128+
129+
During its execution, the `artist` agent used the following prompts (which can be seen in [traces](../../user-guide/observability-evaluation/traces.md) or [logs](../../user-guide/observability-evaluation/logs.md)) to generate each image:
130+
131+
"A majestic blue whale swimming in deep ocean waters, sunlight filtering through the surface, photorealistic"
132+
133+
![output/a_majestic_blue_whale_swimming.png](../../assets/multimodal/whale_1.png)
134+
135+
"Humpback whale breaching from the water, dramatic splash, against sunset sky, wildlife photography"
136+
137+
![output/humpback_whale_breaching_from_the.png](../../assets/multimodal/whale_2.png)
138+
139+
"Orca whale pod swimming together in arctic waters, aerial view, detailed, pristine environment"
140+
141+
![output/orca_whale_pod_swimming_together.png](../../assets/multimodal/whale_3.png)
142+
143+
The `critic` agent selected the humpback whale image as the best:
144+
145+
![output/humpback_whale_breaching_from_the.png](../../assets/multimodal/whale_2_large.png)
146+
147+
148+
## Extending the Example
149+
150+
Here are some ways you could extend this example:
151+
152+
1. **Workflows**: This example uses a very simple workflow. You could use Strands [Workflow](../../user-guide/concepts/multi-agent/workflow.md) capabilities for more elaborate media production pipelines.
153+
2. **Image Editing**: Extend the `generate_image` tool to accept and modify input images.
154+
3. **User Feedback Loop**: Allow users to provide feedback on the selection to improve future generations.
155+
4. **Integration with Other Media**: Extend the system to work with other media types, such as video with Amazon Nova models.
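
As a rough sketch of the user feedback loop idea, the function below repeats the artist and critic round until the user accepts the result. Everything here is an assumption made for illustration: `generate_with_feedback`, `get_feedback`, and `max_rounds` are hypothetical names, and the only thing required of `artist` and `critic` is that they can be called with a string and that their result converts to text with `str()`.

```python
from typing import Callable, Optional


def generate_with_feedback(
    artist: Callable[[str], object],
    critic: Callable[[str], object],
    request: str,
    get_feedback: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Run artist -> critic rounds until the user accepts the critic's output.

    `get_feedback` receives the critic's text and returns None to accept it,
    or a feedback string that is appended to the next artist request.
    """
    prompt = request
    verdict = ""
    for _ in range(max_rounds):
        paths = str(artist(prompt))    # artist returns the image paths
        verdict = str(critic(paths))   # critic describes and picks one
        feedback = get_feedback(verdict)
        if feedback is None:
            break
        # Carry the user's feedback into the next generation round.
        prompt = f"{request}\nUser feedback on the previous round: {feedback}"
    return verdict
```

The two agents defined in this example fit those requirements, so they could be passed in directly, with `get_feedback` implemented as a small function that prints the verdict and reads a line from the command line.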
156+
157+
[example_code]: {{ docs_repo }}/docs/examples/python/multimodal.py

docs/examples/python/multimodal.py

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
1+
from strands import Agent
2+
from strands_tools import generate_image, image_reader
3+
4+
5+
artist = Agent(tools=[generate_image], system_prompt=(
6+
"You will be instructed to generate a number of images of a given subject. Vary the prompt for each generated image to create a variety of options. "
7+
"Your final output must contain ONLY a comma-separated list of the filesystem paths of generated images."
8+
))
9+
10+
11+
12+
critic = Agent(tools=[image_reader], system_prompt=(
13+
"You will be provided with a list of filesystem paths, each containing an image. "
14+
"Describe each image, and then choose which one is best. "
15+
"Your final line of output must be as follows: "
16+
"FINAL DECISION: <path to final decision image>"
17+
))
18+
19+
result = artist("Generate 3 images of a dog")
20+
critic(str(result))

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -130,6 +130,7 @@ nav:
130130
- Multi Agents: examples/python/multi_agent_example/multi_agent_example.md
131131
- Meta Tooling: examples/python/meta_tooling.md
132132
- MCP: examples/python/mcp_calculator.md
133+
- Multi-modal: examples/python/multimodal.md
133134
- Contribute ❤️: https://github.com/strands-agents/sdk-python/blob/main/CONTRIBUTING.md
134135
- API Reference:
135136
- Agent: api-reference/agent.md
