Changes from 6 commits

Commits (101)
6412a5d
feat(gepa): add tool description optimization for multi-agent systems
Ju-usc Oct 10, 2025
cf0be4f
style: fix ruff formatting (trailing whitespace)
Ju-usc Oct 10, 2025
aa53fe2
style: apply ruff formatting fixes
Ju-usc Oct 10, 2025
045c6cf
feat(gepa): implement tool-specific proposer for tool descriptions
Ju-usc Oct 10, 2025
c4f2041
docs(gepa): clean up multi-agent example code
Ju-usc Oct 10, 2025
260ca80
refactor(gepa): simplify tool reflective dataset with ReAct context r…
Ju-usc Oct 11, 2025
04f7e3d
fix(gepa): unify custom proposer routing for tools
Ju-usc Oct 12, 2025
f92e184
docs(gepa): clarify tool reflection prompt
Ju-usc Oct 12, 2025
7178869
test: streamline GEPA tool optimization tests
Ju-usc Oct 12, 2025
e34703b
fix(gepa): streamline tool proposer formatting
Ju-usc Oct 12, 2025
3f05311
test(gepa): drop legacy dummy tool fixture
Ju-usc Oct 12, 2025
4df9ce5
docs(gepa): add tool-specific reflection prompt and metric example
Ju-usc Oct 12, 2025
4296ccf
docs(gepa): fix implementation details with accurate code flow
Ju-usc Oct 13, 2025
ea1204a
docs(gepa): remove backward compatibility note
Ju-usc Oct 13, 2025
48d5cd6
docs(gepa): improve usage examples with optimization visualization
Ju-usc Oct 13, 2025
548d9b6
docs(gepa): add design rationale comments for tool context sharing
Ju-usc Oct 13, 2025
e61d0a1
docs(gepa): add tool optimization links to overview and parameter docs
Ju-usc Oct 13, 2025
5c95412
docs(gepa): refine tool optimization scenarios and remove implementat…
Ju-usc Oct 13, 2025
19d7717
docs(gepa): clarify future work section in code comments
Ju-usc Oct 13, 2025
9ce5fe4
refactor(gepa): unify ReAct optimization as single module
Ju-usc Oct 24, 2025
91331d0
test(gepa): add end-to-end ReAct module optimization test
Ju-usc Oct 24, 2025
3418b59
fix(gepa): enable arg description optimization for ReAct tools
Ju-usc Oct 24, 2025
b26d39a
chore: remove legacy test_gepa_tool_optimization.py
Ju-usc Oct 24, 2025
2791b5c
fix: restore accidentally removed score mismatch warning
Ju-usc Oct 24, 2025
8e63c62
test: update fixture after arg description optimization fix
Ju-usc Oct 25, 2025
7a9d2f3
fix(test): use JSON-based hashing for cross-version fixture stability
Ju-usc Oct 25, 2025
cd0de57
refactor(gepa): rename optimize_tool_descriptions to optimize_react_c…
Ju-usc Oct 26, 2025
67bb739
docs(gepa): improve 'What is optimize_react_components?' section
Ju-usc Oct 26, 2025
b3026a7
docs(gepa): replace outdated tool-specific prompt with actual ReAct o…
Ju-usc Oct 26, 2025
4e107aa
docs(gepa): simplify 'How It Works' section with accurate routing beh…
Ju-usc Oct 26, 2025
78547e7
docs(gepa): remove outdated Implementation Details section
Ju-usc Oct 26, 2025
7fa829b
docs(gepa): replace theoretical scenarios with real user pain points
Ju-usc Oct 26, 2025
da0e7bc
docs(gepa): fix usage examples reference to match updated scenarios
Ju-usc Oct 26, 2025
e51158d
docs(gepa): update inspect section to show all 4 ReAct components wit…
Ju-usc Oct 26, 2025
776ab9b
docs(gepa): rewrite Section 8 with accurate custom proposer behavior …
Ju-usc Oct 26, 2025
ec6bb7b
fix(gepa): fix top-level ReAct module lookup and remove tool name san…
Ju-usc Oct 27, 2025
b6cc67b
refactor(gepa): unify ReAct module key handling and use constant
Ju-usc Oct 28, 2025
1206f38
test(gepa): add ReAct module detection tests for nested structures
Ju-usc Oct 28, 2025
333cbbf
test(gepa): add comprehensive ReAct detection and reconstruction tests
Ju-usc Oct 28, 2025
a50552a
test(gepa): add reflective dataset tests for multi-agent trajectory v…
Ju-usc Oct 28, 2025
965b157
test(gepa): verify tool arg descriptions propagate to args schema
Ju-usc Oct 29, 2025
5ddc6d3
fix(gepa): propagate arg_desc updates to tool.args for prompt rendering
Ju-usc Oct 29, 2025
2269de5
test(gepa): remove fixture-based test and unused dependencies
Ju-usc Oct 29, 2025
17456f0
test(gepa): remove unused fixture file
Ju-usc Oct 29, 2025
c884c18
style: fix ruff linting issues (import formatting, whitespace, bare e…
Ju-usc Oct 31, 2025
82dee25
refactor(test): rename setup_spy_for_base_program to setup_capture_fo…
Ju-usc Oct 31, 2025
ca84b9d
docs(gepa): clarify why Tool.func uses placeholder lambda in proposer
Ju-usc Oct 31, 2025
2eb8986
refactor(gepa): make all ReAct components optional with None default …
Ju-usc Oct 31, 2025
9f37ac1
docs(gepa): clarify 'LM' as 'reflection LM' in comments for precision
Ju-usc Oct 31, 2025
bd4cdac
refactor(gepa): refine reflection prompt to guide concise, focused Re…
Ju-usc Oct 31, 2025
0ad4077
docs(gepa): revise ReAct metric example to be general and extensible
Ju-usc Oct 31, 2025
ef5563e
docs(gepa): replace custom proposer example with reference to ReActMo…
Ju-usc Oct 31, 2025
1b10b65
docs(gepa): make custom proposer section more approachable and clear
Ju-usc Oct 31, 2025
675a0cd
docs(gepa): update ReAct reflection prompt to match current implement…
Ju-usc Nov 1, 2025
4a4d209
feat(gepa): warn when ReAct modules detected but optimization disabled
Ju-usc Nov 3, 2025
d84842f
test(gepa): fix DummyLM configuration and remove exception swallowing
Ju-usc Nov 9, 2025
bb28f5f
test(gepa): add failing tests for generic tool optimization
Ju-usc Nov 9, 2025
a590e46
refactor(gepa): rename optimize_react_components to enable_tool_optim…
Ju-usc Nov 9, 2025
6aceaf5
refactor(gepa): extract nested function to private method
Ju-usc Nov 9, 2025
7a5bf05
feat(gepa): detect tool-using predictors via type checking
Ju-usc Nov 9, 2025
12b01ed
test(gepa): update ReAct tests for predictor-name-based keys
Ju-usc Nov 10, 2025
265896c
test(gepa): use explicit predictor keys in tool optimization tests
Ju-usc Nov 10, 2025
fe19dac
feat(gepa): extract tools from runtime traces
Ju-usc Nov 10, 2025
38dd7cb
feat(gepa): detect tool-using predictors at compile time
Ju-usc Nov 10, 2025
7f05a73
refactor(gepa): use predictor identity for ReAct detection
Ju-usc Nov 10, 2025
0a6016d
test(gepa): refactor ReAct tests to use dynamic predictor names
Ju-usc Nov 10, 2025
a635768
refactor(gepa): generalize proposer to support both ReAct and tool mo…
Ju-usc Nov 10, 2025
e35603a
refactor(gepa): eliminate create-delete pattern in base_program build
Ju-usc Nov 10, 2025
ecb3726
refactor(gepa): eliminate ReAct coupling in build_program
Ju-usc Nov 11, 2025
d3693c9
refactor(gepa): apply code cleanup principles consistently
Ju-usc Nov 11, 2025
a086646
refactor(gepa): unify config extraction patterns
Ju-usc Nov 11, 2025
0cecb75
refactor(gepa): remove verbose logs and consolidate comments
Ju-usc Nov 11, 2025
9592c50
docs(gepa): clarify ReAct trace workaround with TODO
Ju-usc Nov 12, 2025
76d7af5
test(gepa): remove deprecated ReAct-specific tests and refactor tool …
Ju-usc Nov 13, 2025
ac66e05
feat(gepa): add assertion for ReAct two-predictor design
Ju-usc Nov 13, 2025
3ec4ada
test(gepa): add DSPy ReAct design docs and improve test consistency
Ju-usc Nov 13, 2025
b679ba2
fix(test): remove trailing whitespace and extra blank lines
Ju-usc Nov 13, 2025
02aa151
refactor(gepa): clarify tool proposer output field descriptions
Ju-usc Nov 14, 2025
d37e433
Merge branch 'main' into feature/tool-description-optimization
Ju-usc Nov 14, 2025
d8b7c66
refactor(gepa): treat args as canonical for tool arg descriptions
Ju-usc Nov 14, 2025
f62a68e
refactor(gepa): tolerate missing arg descriptions when applying tool …
Ju-usc Nov 14, 2025
e031409
refactor(gepa): use args as sole source of tool arg descriptions
Ju-usc Nov 14, 2025
a133545
test(gepa): drop arg_desc expectations from tool optimization tests
Ju-usc Nov 14, 2025
b1e4f3d
refactor(gepa): refine reflection prompts for tool optimization
Ju-usc Nov 19, 2025
7f81e88
refactor(gepa): improve tool extraction robustness and observability
Ju-usc Nov 19, 2025
f267ccc
refactor(gepa): simplify initialization logic
Ju-usc Nov 19, 2025
28ceb70
refactor(gepa): remove ReAct trace workaround
Ju-usc Nov 19, 2025
d8275ef
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
deeb010
chore(gepa): clean up whitespace and style changes from tool optimiza…
Ju-usc Nov 19, 2025
4bcc714
chore: restore .gitignore to match main
Ju-usc Nov 19, 2025
4b872d7
docs(gepa): document tool optimization flag in overview
Ju-usc Nov 19, 2025
5129586
docs(gepa): clarify enable_tool_optimization and custom proposers
Ju-usc Nov 19, 2025
ebe4221
docs(gepa): update tool module optimization prompt to match actual code
Ju-usc Nov 20, 2025
2133b0b
docs(gepa): update How Tool Optimization Works section
Ju-usc Nov 20, 2025
9c05b6a
docs(gepa): update When to Use Tool Optimization section
Ju-usc Nov 20, 2025
ec9241b
docs(gepa): update custom proposers section for tool optimization
Ju-usc Nov 20, 2025
46d8f5e
docs(gepa): update usage examples with correct tool patterns and inte…
Ju-usc Nov 20, 2025
5d33fc6
docs(gepa): remove redundant metrics section
Ju-usc Nov 20, 2025
b564029
refactor(gepa): use absolute import for ToolModuleProposer
Ju-usc Nov 20, 2025
13209f5
docs(gepa): update tool optimization doc link
Ju-usc Nov 20, 2025
09990a6
docs(gepa): replace eval() example with get_weather tool
Ju-usc Nov 29, 2025
196 changes: 196 additions & 0 deletions docs/docs/api/optimizers/GEPA/GEPA_Advanced.md
@@ -443,3 +443,199 @@ gepa = dspy.GEPA(
    auto="medium"
)
```

## Tool Description Optimization

### What is optimize_tool_descriptions?

The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.

Unlike signature instructions that guide reasoning strategies, tool descriptions serve a fundamentally different purpose: they help agents decide **which tool to use** in a given situation. GEPA recognizes this categorical difference and applies a specialized reflection prompt tailored for tool selection decisions.
> **Collaborator:** which tool to use, when to use it, and how to use it. All three are captured by the description.

> **Collaborator:** Let's avoid the word "fundamentally". One can imagine that all tool descriptions can be (and many times are) simply included in the system prompt itself.

> **Collaborator:** Please also add a corresponding entry in GEPA Overview that links to this file/section.


### Default Behavior

By default, GEPA only optimizes signature instructions (`optimize_tool_descriptions=False`):

```python
# Default behavior: only signature optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    # optimize_tool_descriptions=False  # This is the default
    auto="medium"
)
optimized_program = gepa.compile(student, trainset=examples)
```

### When to Use optimize_tool_descriptions

Consider enabling `optimize_tool_descriptions=True` when:

- **Building ReAct agents**: ReAct agents rely on tool descriptions to make action selection decisions
> **Collaborator:** One should consider using this when they use dspy.Tool anywhere in the DSPy program. Here are a few scenarios for using dspy.Tool:

- **Multi-agent systems**: Systems with nested agents and delegated tools benefit from holistic optimization
- **Poor tool selection**: Your agent frequently selects wrong tools or overlooks appropriate ones
- **Complex tool sets**: When managing many tools with overlapping capabilities
- **Domain-specific tools**: Tools requiring specialized knowledge or context for proper usage

### How It Works

When enabled, GEPA:

1. **Discovers all tools**: Traverses your program including nested sub-modules to find all `dspy.Tool` instances
2. **Categorizes components**: Separates tools (identified by `tool:` prefix) from signature instructions
3. **Routes components appropriately**:
- Signature instructions → Default or custom instruction proposer
- Tool descriptions → ToolProposer (receives ReAct's reflective data with tool-specific annotation)
4. **Optimizes holistically**: Treats tool descriptions as first-class components in the optimization process
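
The `tool:` prefix in step 2 is what lets tool descriptions and signature instructions coexist in the single candidate dictionary that GEPA evolves. The sketch below mirrors the `gepa.py` change included in this PR; the helper function and the example key names are illustrative, not part of the GEPA API:

```python
import dspy

def build_seed_candidate(student: dspy.Module) -> dict[str, str]:
    """Roughly how GEPA seeds its candidate when optimize_tool_descriptions=True."""
    # Signature instructions, keyed by predictor name
    candidate = {name: pred.signature.instructions for name, pred in student.named_predictors()}
    # Tool descriptions, keyed with a "tool:" prefix so the two kinds of components stay distinct
    for _, module in student.named_sub_modules():
        if hasattr(module, "tools"):
            for tool_name, tool in module.tools.items():
                candidate.setdefault(f"tool:{tool_name}", tool.desc)
    return candidate

# e.g. {"react.predict": "...", "tool:search": "Search the web", "tool:calculator": "Do math"}
```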

### Implementation Details

**Reflective Dataset Construction:**

GEPA's approach to tool optimization is deliberately simple:

1. **ReAct predictors** generate reflective examples containing:
- Inputs: `question`, `trajectory` (full agent execution trace with thoughts, tool calls, observations)
- Generated Outputs: Agent's next action/tool selection decisions
- Feedback: Task outcome and evaluation from the metric

2. **Tools copy ReAct's data** with annotation:
- Each tool receives ReAct's complete reflective examples (same full trajectory context)
- Feedback is prefixed: `[Optimizing tool: 'tool_name'] {original_feedback}`
- This focuses the reflection LM on improving that specific tool's description

3. **Reflection LM sees full context**:
- How the agent reasoned before selecting the tool
- What other tools were available and considered
- Whether the tool selection was successful
- Full multi-step trajectories showing tool composition patterns

This design allows the reflection LM to understand tool usage in context, leading to descriptions that clarify when and how each tool should be used.
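
For example, a single reflective example for the `search` tool might look like the sketch below. The field values are invented for illustration; only the overall structure (Inputs / Generated Outputs / Feedback) and the `[Optimizing tool: ...]` prefix follow the behavior described above:

```python
# Illustrative reflective example handed to the reflection LM for the "search" tool.
# Everything except the feedback prefix is copied from the ReAct predictor's example.
reflective_example = {
    "Inputs": {
        "question": "What is the population of the capital of France?",
        "trajectory": (
            "Thought: I should look up the population of Paris.\n"
            "Tool: search(query='population of Paris')\n"
            "Observation: Paris has about 2.1 million inhabitants."
        ),
    },
    "Generated Outputs": {
        "next_thought": "I have enough information to answer.",
        "next_tool_name": "finish",
        "next_tool_args": {},
    },
    "Feedback": "[Optimizing tool: 'search'] Correct answer; the search call retrieved the needed fact.",
}
```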

### Usage Examples

#### Basic ReAct Agent

```python
import dspy

def search_web(query: str) -> str:
    """Search the web for information."""
    search_results = f"(stub) top results for: {query}"  # placeholder implementation
    return search_results

def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    result = 0.0  # placeholder implementation
    return result

# Create ReAct agent with tools
search_tool = dspy.Tool(search_web, name="search", desc="Search the web")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Do math")

agent = dspy.ReAct("question -> answer", tools=[search_tool, calc_tool])

# Enable tool optimization
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    optimize_tool_descriptions=True,  # Enable tool optimization
    auto="medium"
)

optimized_agent = gepa.compile(agent, trainset=train_examples, valset=val_examples)
```

#### Multi-Agent System

For systems with nested agents, GEPA automatically discovers and optimizes all tools:

```python
import dspy

def search_web(query: str) -> str:
    """Search the web."""
    results = f"(stub) top results for: {query}"  # placeholder implementation
    return results

def calculate(expression: str) -> float:
    """Evaluate math expression."""
    result = 0.0  # placeholder implementation
    return result

# Define tools
search_tool = dspy.Tool(search_web, name="search", desc="Searches web")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Does math")

class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        # Sub-agent with search tool
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])

        # Delegation tool wraps sub-agent
        def delegate_research(query: str) -> str:
            return self.researcher(query=query).findings

        research_tool = dspy.Tool(delegate_research, name="research", desc="Research things")

        # Main agent with calculator and research delegation
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

    def forward(self, question):
        return self.assistant(question=question)

# GEPA optimizes ALL tools (calculator, research, search) together
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    optimize_tool_descriptions=True,
    auto="medium"
)

optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)
```

### Inspecting Optimized Tool Descriptions

After optimization, tool descriptions are automatically updated in your program. Access them directly through your module structure:

```python
optimized_agent = gepa.compile(agent, trainset=train, valset=val)

# Access tools directly - descriptions are already updated
print(optimized_agent.tools["search"].desc)
print(optimized_agent.tools["calculator"].desc)
```

For multi-agent systems, access nested tools through your module hierarchy:

```python
optimized_system = gepa.compile(ResearchAssistant(), trainset=train, valset=val)

# Access tools at different levels
print(optimized_system.researcher.tools["search"].desc) # Sub-agent tool
print(optimized_system.assistant.tools["research"].desc) # Main agent tool
print(optimized_system.assistant.tools["calculator"].desc)
```
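
If the module hierarchy is deep or changes often, you can also walk it instead of hard-coding attribute paths. This is a small convenience sketch, not part of the GEPA API:

```python
def print_tool_descriptions(program: dspy.Module) -> None:
    """Print every tool description in a program, wherever it is nested."""
    for path, module in program.named_sub_modules():
        if hasattr(module, "tools"):
            for tool_name, tool in module.tools.items():
                print(f"{path}.{tool_name}: {tool.desc}")

print_tool_descriptions(optimized_system)
```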

### Compatibility with Custom Instruction Proposers

Tool optimization works seamlessly with custom instruction proposers. When both are provided:

- Signature instructions → Custom instruction proposer
- Tool descriptions → Built-in `ToolProposer`

```python
from dspy.teleprompt.gepa.instruction_proposal import MultiModalInstructionProposer

gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5", temperature=1.0, max_tokens=32000, api_key=api_key),
    instruction_proposer=MultiModalInstructionProposer(),  # For signatures
    optimize_tool_descriptions=True,  # Enables ToolProposer for tools
    auto="medium"
)
```

**Note:** Tool optimization is fully backward compatible. Existing programs without tools, or with `optimize_tool_descriptions=False`, continue to work exactly as before.
> **Collaborator:** I don't think we need to inform users about backward compatibility here. It should be implicit that there should be no behaviour changes for any program not containing dspy.Tool.

21 changes: 20 additions & 1 deletion dspy/teleprompt/gepa/gepa.py
@@ -273,6 +273,9 @@ def metric(
warn_on_score_mismatch: GEPA (currently) expects the metric to return the same module-level score when
called with and without the pred_name. This flag (defaults to True) determines whether a warning is
raised if a mismatch in module-level and predictor-level score is detected.
optimize_tool_descriptions: Whether to optimize tool descriptions for modules with tools
(e.g., ReAct agents). When enabled, tool descriptions are included in the optimization
process alongside signature instructions. Default is False.
> **Collaborator:** Add a link to GEPA Advanced/Tool section.

seed: The random seed to use for reproducibility. Default is 0.
gepa_kwargs: (Optional) provide additional kwargs to be passed to [gepa.optimize](https://github.com/gepa-ai/gepa/blob/main/src/gepa/api.py) method

@@ -328,6 +331,7 @@ def __init__(
wandb_init_kwargs: dict[str, Any] | None = None,
track_best_outputs: bool = False,
warn_on_score_mismatch: bool = True,
optimize_tool_descriptions: bool = False,
use_mlflow: bool = False,
# Reproducibility
seed: int | None = 0,
@@ -390,6 +394,7 @@ def __init__(
self.wandb_api_key = wandb_api_key
self.wandb_init_kwargs = wandb_init_kwargs
self.warn_on_score_mismatch = warn_on_score_mismatch
self.optimize_tool_descriptions = optimize_tool_descriptions
self.use_mlflow = use_mlflow

if track_best_outputs:
@@ -518,11 +523,25 @@ def feedback_fn(
rng=rng,
reflection_lm=self.reflection_lm,
custom_instruction_proposer=self.custom_instruction_proposer,
warn_on_score_mismatch=self.warn_on_score_mismatch
warn_on_score_mismatch=self.warn_on_score_mismatch,
optimize_tool_descriptions=self.optimize_tool_descriptions
)

# Instantiate GEPA with the simpler adapter-based API
base_program = {name: pred.signature.instructions for name, pred in student.named_predictors()}

if self.optimize_tool_descriptions:
tool_descriptions = {}
for _, module in student.named_sub_modules():
if hasattr(module, "tools"):
for tool_name, tool in module.tools.items():
tool_key = f"tool:{tool_name}"
if tool_key not in tool_descriptions:
tool_descriptions[tool_key] = tool.desc
if tool_descriptions:
logger.info(f"Including {len(tool_descriptions)} tool descriptions for optimization")
base_program.update(tool_descriptions)

gepa_result: GEPAResult = optimize(
seed_candidate=base_program,
trainset=trainset,