Conversation

@Ju-usc commented Oct 10, 2025

Summary

Addresses #8706, which requested that GEPA optimize tool descriptions. This PR expands on that to enable comprehensive ReAct module optimization: all four ReAct components are optimized jointly (react instructions, extract instructions, tool descriptions, and tool argument descriptions).

When optimize_react_components=True, GEPA discovers all dspy.ReAct modules in your program (including nested multi-agent systems) and uses a specialized reflection prompt to jointly optimize how agents reason, select tools, and extract answers from execution trajectories. All ReAct components are optimized together based on shared execution traces, enabling the reflection LM to generate cohesive instructions since it sees how components work together (not optimized in isolation). This addresses the ReAct trajectory prefix duplication issue (gepa-ai/gepa#97).

Fully backward compatible - Default optimize_react_components=False preserves existing behavior.

Issue

Closes #8706 - Original request was to enable GEPA to optimize tool descriptions. This PR expands on that to optimize all four ReAct components jointly (react instructions, extract instructions, tool descriptions, and tool argument descriptions) for more effective agent optimization.

Changes

Core Implementation

  • Add optimize_react_components parameter to GEPA (default False for backward compatibility)
  • Unified ReAct module optimization - Treats each dspy.ReAct as one module with react/extract/tools as subcomponents, respecting both GEPA's module-level abstraction and DSPy's ReAct module design
  • Efficient reflective dataset - Single trajectory per ReAct execution shared across all components (eliminates duplicate trajectory formatting for separate components)
  • ReActModuleProposer with dynamic signatures - Specialized proposer that generates output fields for each tool/parameter, enabling selective optimization
  • ReAct module discovery - Traverse the program via named_sub_modules() to find all dspy.ReAct instances (supports deeply nested multi-agent architectures)
  • Component serialization - Serialize each ReAct module as a JSON config containing react/extract instructions and tool schemas (see the sketch after this list)
  • Intelligent routing - Direct ReAct components to ReActModuleProposer, regular predictors to default/custom proposers
  • Component updates - Apply optimized react/extract instructions, tool descriptions, and tool argument descriptions back to ReAct modules (propagates arg_desc to tool.args for prompt rendering)
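
A minimal sketch of the discovery and serialization steps, assuming DSPy's current two-predictor ReAct layout (react plus extract.predict, as referenced later in this thread); the helper names are illustrative, not the PR's actual internals:

```python
import json
import dspy

def discover_react_modules(program: dspy.Module) -> dict[str, dspy.ReAct]:
    """Find every dspy.ReAct instance in the program, however deeply nested."""
    return {
        name: module
        for name, module in program.named_sub_modules()
        if isinstance(module, dspy.ReAct)
    }

def serialize_react_module(module: dspy.ReAct) -> str:
    """Serialize one ReAct module's optimizable text into a JSON config."""
    config = {
        "react": module.react.signature.instructions,
        "extract": module.extract.predict.signature.instructions,
        "tools": {
            name: {
                "desc": tool.desc,
                "arg_desc": getattr(tool, "arg_desc", None) or {},
            }
            for name, tool in module.tools.items()
        },
    }
    return json.dumps(config, indent=2)
```

The JSON config then acts as the single optimization component that GEPA hands to the ReActModuleProposer, which parses proposals back into per-component updates.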

Testing

  • 8 comprehensive tests covering:
    • Single ReAct module detection
    • Multi-ReAct workflow discovery (mixed ReAct + non-ReAct modules)
    • Nested orchestrator-worker patterns (hierarchical agents)
    • Program building with optimized components
    • Reflective dataset creation with trajectory feedback
  • All tests validate: Discovery logic, JSON serialization, component routing, program reconstruction

Documentation

  • GEPA_Advanced.md - Complete ReAct optimization guide:
    • What gets optimized (4 components with selective optimization)
    • When to use (5 common failure patterns)
    • How it works (discovery → serialization → optimization → application)
    • Usage examples (basic agent + multi-agent system)
    • Custom proposer interface with reference implementation
  • overview.md - Brief introduction linking to advanced guide
  • Reflection prompt documentation - Explains progressive optimization philosophy

Usage Example

Basic ReAct Agent

```python
import dspy

# Define tools
def search_web(query: str) -> str:
    return f"Search results for: {query}"

search_tool = dspy.Tool(search_web, name="search", desc="Searches")

# Create agent
agent = dspy.ReAct("question -> answer", tools=[search_tool])

# Optimize with GEPA
gepa = dspy.GEPA(
    metric=my_metric,  # Your evaluation metric
    reflection_lm=dspy.LM(model="gpt-5"),
    optimize_react_components=True,
    auto="medium"
)

optimized = gepa.compile(agent, trainset=trainset, valset=valset)
```

Multi-Agent System

```python
import dspy

def search_web(query: str) -> str:
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    # NOTE: eval is for demonstration only; never use it on untrusted input.
    return eval(expression)

search_tool = dspy.Tool(search_web, name="search", desc="Searches")
calc_tool = dspy.Tool(calculate, name="calculator", desc="Computes")

class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])

        def delegate_research(query: str) -> str:
            return self.researcher(query=query).findings

        research_tool = dspy.Tool(delegate_research, name="research", desc="Delegates")
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

    def forward(self, question):
        return self.assistant(question=question)

# Optimizes ALL ReAct modules and tools
gepa = dspy.GEPA(
    metric=my_metric,
    reflection_lm=dspy.LM(model="gpt-5"),
    optimize_react_components=True,
    auto="medium"
)

optimized = gepa.compile(ResearchAssistant(), trainset=trainset, valset=valset)
```

Key Features

Joint Optimization:

  • React instructions and tool descriptions are optimized together based on execution traces
  • The reflection LM sees how components work together (not optimized in isolation)
  • Can generate more cohesive instructions across all components

Selective Optimization:

  • Reflection LM returns None for components that should stay unchanged
  • Only components that need improvement are updated
  • Enables progressive refinement across GEPA iterations (see the sketch below)
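
A minimal sketch of how those selective updates could be applied back to a ReAct module; the component-key format and helper are hypothetical, while with_instructions is DSPy's standard way to rewrite a signature's instructions:

```python
import dspy

def apply_selective_updates(module: dspy.ReAct, proposed: dict[str, str | None]) -> None:
    """Apply only the components the reflection LM chose to rewrite.

    `proposed` maps component keys to new text, or None to leave a
    component unchanged.
    """
    if proposed.get("react") is not None:
        module.react.signature = module.react.signature.with_instructions(proposed["react"])
    if proposed.get("extract") is not None:
        sig = module.extract.predict.signature
        module.extract.predict.signature = sig.with_instructions(proposed["extract"])
    for name, tool in module.tools.items():
        new_desc = proposed.get(f"tool:{name}")  # hypothetical key format
        if new_desc is not None:
            tool.desc = new_desc
```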

Multi-Agent Support:

  • Automatically discovers nested ReAct modules
  • Optimizes parent and sub-agent modules cohesively
  • Handles complex delegation patterns

Ju-usc added 3 commits October 9, 2025 20:07
- Add optimize_tool_descriptions parameter (default False) to GEPA
- Extract tool descriptions from all nested modules via named_sub_modules()
- Apply optimized descriptions in DspyAdapter.build_program()
- Enables holistic optimization of tools across main and subagent modules
- Tests: 4 new tests, all 16 pass (4 new + 12 existing)
@Ju-usc (Author) commented Oct 10, 2025

Apologies for accidentally closing #8927

Thank you for the thorough review, @LakshyAAAgrawal! I'll address your feedback:

  1. Since tools are categorically different from prompts, they should use a different reflection meta prompt. The default reflection meta prompt is shown at https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#default-implementation, whereas I assume the tool must use a somewhat different meta prompt. Can you implement a propose_new_texts method that mimics the default_proposer shown in the link above for all prompts, but calls a tool-description-specific prompt/signature for tool evolution?
  2. Can you also add some description to the documentation, explaining that this feature is beneficial for React agents.
  3. (This is not a requirement to merge the PR) Would it be possible to add a simple and short tutorial demonstrating the use and performance improvement via tool evolution?

I'll start working on items 1 and 2 and update the PR soon. Please let me know if you have any specific preferences for the tutorial format!

@LakshyAAAgrawal (Collaborator):

Thanks a lot! For the tutorial, I think you can follow the current GEPA tutorial format (load a dataset, show an example from the dataset, build a dspy program, evaluate the baseline program on testset, run GEPA with new optimization settings, show the optimized programs' prompts and tool descriptions, and finally evaluate the optimized program).

Hopefully we should be able to see a nice and large gain on agentic tasks with this amazing contribution by you!

- Add ToolProposer with GenerateImprovedToolDescription signature
- Implement routing logic to separate tools from signatures
- Tools use ToolProposer, signatures use custom or parent default
- Backward compatible: preserves existing custom_instruction_proposer behavior
- Add test verifying routing splits components correctly
- Define tool functions outside class for clarity
- Match structure of simple ReAct example
- Add clear comments explaining architecture
- Make code more readable and maintainable
@Ju-usc force-pushed the feature/tool-description-optimization branch from 197f077 to c4f2041 on October 10, 2025 09:38
@Ju-usc (Author) commented Oct 10, 2025

Hi @LakshyAAAgrawal,

I've implemented the tool-specific proposer as requested! Here's what's included:

1. Tool-Specific Proposer Implementation

  • Added GenerateImprovedToolDescriptionFromFeedback signature with a specialized reflection prompt
  • Implemented ToolProposer and SingleComponentToolProposer following the MultiModalInstructionProposer pattern
  • Routing logic in DspyAdapter that directs tools to ToolProposer and signatures to custom/default proposers
  • Fully backward compatible with existing custom instruction proposers

2. Documentation

  • Added comprehensive section to GEPA_Advanced.md
  • Explains when to use tool optimization (ReAct agents, multi-agent systems)
  • Includes usage examples for both simple and nested agent architectures
  • Documents how to inspect optimized tool descriptions

Reflection Prompt Design:
The tool-specific prompt is intentionally open-ended to avoid prescriptive patterns that might lead to local minima. It asks the LM to identify patterns in successful/unsuccessful tool usage and extract domain-specific information, without suggesting specific heuristics.

Before I create a short tutorial (item #3), would you have any feedback on:

  • The reflection prompt design - is it general enough? Any improvements you'd suggest?
  • The implementation approach - does the routing logic make sense?
  • The documentation - anything unclear or missing?

Any feedback would be helpful before I invest time in the tutorial. Thank you!

@Ju-usc (Author) commented Oct 11, 2025

Wait, there is a bug in the implementation; I'm working on a fix. The tests also need to be fixed.

…euse

Tools now copy ReAct's reflective data with tool-specific annotation
instead of complex trajectory extraction. This 15-line approach reuses
ReAct's existing context (thoughts, tool calls, observations) and adds
focused annotation for each tool.

Implementation:
- Tools receive full ReAct reflective examples (same trajectory context)
- Feedback prefixed: [Optimizing tool: 'X'] for focused optimization (see the sketch after this commit note)
- Reflection LM sees complete multi-step execution traces per tool

Benefits:
- Simpler: 15 lines vs 70+ line extraction approach
- Reuses code: No duplicate trajectory formatting logic
- Same context: Tools see full ReAct execution traces
- Clean: Removed all debug output

Tests:
- 4 focused tests following GEPA patterns (removed 1 redundant)
- 226KB fixture with 34 LM + 6 reflection calls
- All tests passing with gpt-5-nano traces

Documentation:
- Updated GEPA_Advanced.md with implementation details
- Explains reflective dataset construction approach

The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.

Unlike signature instructions that guide reasoning strategies, tool descriptions serve a fundamentally different purpose: they help agents decide **which tool to use** in a given situation. GEPA recognizes this categorical difference and applies a specialized reflection prompt tailored for tool selection decisions.
Collaborator:

which tool to use, when to use it, and how to use it. All three are captured by the description.

Collaborator:

Let's avoid the word "fundamentally". One can imagine that all tool descriptions can be (and many times are) simply included in the system prompt itself.

Collaborator:

Please also add a corresponding entry in the GEPA Overview that links to this file/section.


Consider enabling `optimize_tool_descriptions=True` when:

- **Building ReAct agents**: ReAct agents rely on tool descriptions to make action selection decisions
Collaborator:

One should consider using this whenever dspy.Tool is used anywhere in the DSPy program. Here are a few scenarios for using dspy.Tool:

)
```

**Note:** Tool optimization is fully backward compatible. Existing programs without tools, or with `optimize_tool_descriptions=False`, continue to work exactly as before.
Collaborator:

I don't think we need to inform users about backward compatibility here. It should be implicit that there should be no behaviour changes for any program not containing dspy.Tool.

raised if a mismatch in module-level and predictor-level score is detected.
optimize_tool_descriptions: Whether to optimize tool descriptions for modules with tools
(e.g., ReAct agents). When enabled, tool descriptions are included in the optimization
process alongside signature instructions. Default is False.
Collaborator:

Add a link to GEPA Advanced/Tool section

)

self.propose_new_texts = custom_propose_new_texts
elif self.optimize_tool_descriptions:
@LakshyAAAgrawal (Collaborator) commented Oct 11, 2025:

Edge case: what should happen when a user provides a custom proposer and also enables optimize_tool_descriptions?

# Handle signature components - replicate proposer's default behavior
sig_texts = {}
if sig_components:
from gepa.strategies.instruction_proposal import InstructionProposalSignature
Collaborator:

This is a slight deviation from this PR, but would be a large enhancement (feel free to ignore):

  1. Create 2 fields, self.instruction_proposal_signature and self.tool_proposer, which are initialized to the default InstructionProposalSignature and ToolProposerSignature.
  2. Take an argument from dspy.GEPA that can override the default signature values.

# Second pass: Process tools by copying ReAct data with annotation
react_module_name = None
for name in ret_d.keys():
if "react" in name.lower():
Collaborator:

Is this robust? Might it be better to use isinstance or some other way?
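
For comparison, a minimal isinstance-based sketch (a hypothetical helper, not the PR's code):

```python
import dspy

def find_react_keys(program: dspy.Module, ret_d: dict) -> list[str]:
    """Resolve reflective-dataset keys for ReAct modules by type,
    instead of substring-matching on the component name."""
    return [
        name
        for name, module in program.named_sub_modules()
        if isinstance(module, dspy.ReAct) and name in ret_d
    ]
```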

Your task is to write a better description for this tool.
Read the examples carefully and identify patterns in when the tool was used successfully versus when it was misused or overlooked. Identify any domain-specific information about the tool's capabilities or appropriate usage that may not be available to the assistant in the future. The assistant may have developed effective patterns for tool selection - if so, ensure the tool description supports those patterns.
Collaborator:

"Tool use" rather than "tool selection". Also suggest identifying any failure modes of the tool?

@LakshyAAAgrawal (Collaborator):

Dear @Ju-usc,

This is a great PR. Thanks a lot! I have tried to be overly critical and made too many nits. Feel free to ignore if you disagree with something. Let me know if you'd like me to address anything!

Regarding the meta prompt, overall I think it looks great. However, as you build the tutorial, you may find that the reflection prompt needs tweaking, or that the content exposed in reflective_dataset for the tool is lacking or needs improvement. This is going to be an empirical exercise, which will guide what works in the reflection meta prompts. Looking forward to the tutorial on this too!

You may already have thoughts about what you'd like to show in the tutorial, but if not, you may consider building off (https://kargarisaac.medium.com/building-and-optimizing-multi-agent-rag-systems-with-dspy-and-gepa-2b88b5838ce2) by @kargarisaac.

- Add GenerateImprovedToolDescriptionFromFeedback signature documentation
- Include tool-aware metric example showing trajectory access
- Document tool prefix annotation in feedback
- Note component_selector applies to both signatures and tools
- Fix 'fundamentally' language per reviewer feedback
- Separate Pass 1 (predictor examples) and Pass 2 (tool aggregation)
- Clarify Generated Outputs includes full trajectory for ReAct
- Fix feedback annotation format to [Tool 'name' from 'predictor_key']
- Add Component Identification & Proposer Routing section
- Explain dual-proposer independence (custom proposer doesn't affect tool proposer)
- Use consistent terminology: 'predictor' and 'signature instructions'
- Configure DummyLM with proper ReAct response format (next_thought, next_tool_name, next_tool_args)
- Remove try/except blocks that silently swallowed exceptions
- Add explanatory comments for why compile should now succeed
- Increase DummyLM repetitions (10→20) to support GEPA iterations

Addresses review feedback from @LakshyAAAgrawal requesting removal of
unexplained exception handling that masked real bugs.

All 8 tests now pass deterministically without silent failures.
- Add 4 core tests for tool optimization beyond ReAct
- test_detect_single_tool: single Tool input field
- test_detect_tool_list: multiple tools with ordering
- test_skip_predictor_without_tools: negative case (passing)
- test_update_tool_and_predictor: reconstruction path

Tests use class-based signatures (required for type detection).
Currently failing (TDD approach) - implementation next.
…ization

Rename flag to reflect generalization beyond ReAct modules:
- optimize_react_components → enable_tool_optimization
- Update documentation to mention custom predictors using dspy.Tool
- Update warning message to use new flag name

This prepares for upcoming feature: generic tool optimization for any
predictor using dspy.Tool, not just dspy.ReAct modules.
Move build_propose_new_texts() from nested function in __init__ to
_build_propose_new_texts() private method per maintainer feedback.

Also simplify LM context handling by using unified context manager
pattern instead of if/else branching (18 lines → 6 lines).

Changes:
- Extract _build_propose_new_texts() as private class method
- Simplify LM context: use 'with dspy.context(lm=self.reflection_lm or dspy.settings.lm)'
- Clean up __init__ (110+ lines nested function → 1 line method call)

Benefits:
- Cleaner class structure (easier to scan __init__)
- Methods testable in isolation
- Reduced code duplication (-26 lines net)
- Addresses maintainer feedback: 'move helper function out as private method'
@LakshyAAAgrawal (Collaborator):

@Ju-usc, if Option 2 will be detrimental to optimization quality, I think we should preserve option 1, i.e., allow for full generality, but leverage additional information from dspy.ReAct when available.

But the real question is whether there is a way to achieve the same optimization quality even with option 2, the generalization.

- Add type-based detection for predictors using dspy.Tool
- Initialize tool-using predictors with JSON structure
- Add inline helper function is_tool_field() for recursive type checking (sketched below)
- Handle Union/Optional types containing Tool
- Enable generic tool optimization beyond dspy.ReAct
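
A minimal sketch of that recursive check (illustrative, not the PR's exact code; typing.get_args unwraps list/dict/Union/Optional arguments):

```python
import typing
import dspy

def is_tool_field(annotation) -> bool:
    """True if the annotation is dspy.Tool or contains it anywhere in its
    type arguments (handles list[Tool], dict[str, Tool], Union, Optional)."""
    if annotation is dspy.Tool:
        return True
    return any(is_tool_field(arg) for arg in typing.get_args(annotation))

assert is_tool_field(list[dspy.Tool])
assert is_tool_field(typing.Optional[dspy.Tool])
assert not is_tool_field(str)
```
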
- Move inline imports to top of file
- Rename module_path → predictor_name for clarity
- Update all assertions to use full predictor names (e.g., extract.predict)
- Update feedback_map keys to match predictor names
- Simplify multi-agent test assertions (20+ lines → 10 lines)

All 8 ReAct optimization tests now passing with new key structure.
- Replace unpacking pattern with explicit predictor names
- Remove duplicate inline imports (already at top)
- Use TOOL_MODULE_PREFIX:pred consistently across tests
- Improve test docstrings for clarity

All 3 tool tests still passing (1 skipped intentionally).
Runtime tool discovery:
- Import Tool type for isinstance() checks
- Initialize tools_by_predictor dict to collect unique tools
- Add extract_tools_from_value() recursive helper function
- Extract tools from predictor trace inputs during iteration
- Handle single Tool, list[Tool], dict[str, Tool] structures
- Serialize tools to candidate JSON after all traces processed

Implements runtime tool discovery (Change 2).
Captures dynamically injected tools from actual usage patterns.
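
A minimal sketch of that recursive helper (illustrative, covering the three structures the commit lists):

```python
import dspy

def extract_tools_from_value(value) -> list[dspy.Tool]:
    """Recursively collect dspy.Tool instances from a traced input value,
    whether it is a single Tool, a list of Tools, or a {name: Tool} dict."""
    if isinstance(value, dspy.Tool):
        return [value]
    if isinstance(value, (list, tuple)):
        return [t for item in value for t in extract_tools_from_value(item)]
    if isinstance(value, dict):
        return [t for item in value.values() for t in extract_tools_from_value(item)]
    return []
```
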
- Import TOOL_MODULE_PREFIX constant
- Detect predictors with dspy.Tool input fields
- Create prefixed keys: tool_module:{predictor_name}
- Use actual predictor name as JSON config key

Pairs with tool extraction (fe19dac). Together they implement
compile-time detection + runtime extraction for generic tool modules.
- Find extract/react predictors by object identity (not paths)
- Use actual predictor names as JSON config keys
- Module key uses extract_predictor_name for consistency
- Clearer comments about dynamic predictor names

More robust than path-based matching. Config keys are now actual
predictor names (e.g., "multi_agent.react", "multi_agent.extract.predict")
instead of generic "react"/"extract".
- Add get_predictor_name() helper using object identity
- Remove all hardcoded predictor name strings
- Update mock_optimized_react_module() to accept react_module parameter
- Use expected_* naming convention for clarity
- All 11 tests passing with fully dynamic approach
…dules

- Rename ReActModuleProposer → ToolModuleProposer
- Rename signature to GenerateImprovedToolModuleDescriptionsFromFeedback
- Make base signature generic (current_predictor_instruction)
- Dynamically add extract fields only for ReAct modules
- Use prefix checks (REACT_MODULE_PREFIX) for reliable type detection
- Support both 1-predictor (tool) and 2-predictor (ReAct) modules
- Update routing to handle both TOOL_MODULE_PREFIX and REACT_MODULE_PREFIX
- Clean variable names: primary_predictor_key, extract_predictor_key
- Update all docstrings to reflect tool-using modules (not just ReAct)
@chenmoneygithub (Collaborator) commented Nov 10, 2025

@Ju-usc @LakshyAAAgrawal From my perspective the quality should be on par if we capture the tool trace correctly, because ReAct itself is just a way of interacting with tools. But we definitely need to run experiments to find out.

There is one prerequisite we need to build: capturing tool usage in dspy.settings.trace. I see two viable options here:

  1. Use a callback to inject the tool trace into dspy.settings.trace, enabled only when running GEPA with optimize_tool=True (see the sketch below).
  2. Explicitly add tool-calling inputs/outputs to dspy.settings.trace.

I like option 1 better because it doesn't affect existing behavior. In general, I would like to avoid shipping an optimization algorithm with a lot of special-case logic for dspy.ReAct, which would create a maintenance black hole for us in the long run. For example, it would be liable to break silently whenever the ReAct code changes.

@Ju-usc This work is very challenging, but super cool. We really appreciate your contribution here.
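
A minimal sketch of option 1; the on_tool_start/on_tool_end hook names follow DSPy's BaseCallback API as I understand it, so treat the exact signatures as assumptions:

```python
import dspy
from dspy.utils.callback import BaseCallback

class ToolTraceCallback(BaseCallback):
    """Record tool inputs/outputs during GEPA rollouts without changing
    trace behavior for ordinary program runs."""

    def __init__(self):
        super().__init__()
        self.tool_calls = []
        self._pending = {}

    def on_tool_start(self, call_id, instance, inputs):
        self._pending[call_id] = (instance.name, inputs)

    def on_tool_end(self, call_id, outputs, exception=None):
        name, inputs = self._pending.pop(call_id)
        self.tool_calls.append(
            {"tool": name, "inputs": inputs, "outputs": outputs, "error": exception}
        )

# Enabled only for GEPA rollouts when tool optimization is on:
# with dspy.context(callbacks=[ToolTraceCallback()]):
#     prediction = program(**example.inputs())
```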

Ju-usc and others added 16 commits November 10, 2025 03:59
- Process ReAct modules first, then individual predictors
- Skip predictors already part of module configs (check inside JSON)
- Remove redundant base_program.pop() calls
- No duplicate enable_tool_optimization checks
Replace ReAct-specific logic with generic approach:

Before:
- isinstance(ReAct) checks
- Direct access to module.react/module.extract/module.tools
- Separate if/elif branches for instruction updates

After:
- Program-level __dict__ traversal to find tools (sketched below)
- Unified aggregation: plain strings → module config overrides
- Single application loop (no duplication)

Why __dict__ traversal:
Tools can be declared as single attributes (self.tool), lists
(self.tools=[...]), or dicts (self.tools={...}), and nested in
any dspy.Module. Traversing __dict__ finds all tools regardless
of how they're structured, without coupling to specific module types.

This makes the code resilient to ReAct internal changes and works
for any module using dspy.Tool.
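
A minimal sketch of that traversal (illustrative; the PR's actual helper may differ):

```python
import dspy

def find_tools(module: dspy.Module, prefix: str = "") -> dict[str, dspy.Tool]:
    """Walk a module's __dict__ to find every dspy.Tool, whether stored as a
    single attribute, inside a list, or inside a dict, at any nesting depth."""
    found = {}
    for attr, value in module.__dict__.items():
        path = f"{prefix}{attr}"
        if isinstance(value, dspy.Tool):
            found[path] = value
        elif isinstance(value, (list, tuple)):
            for i, item in enumerate(value):
                if isinstance(item, dspy.Tool):
                    found[f"{path}[{i}]"] = item
        elif isinstance(value, dict):
            for key, item in value.items():
                if isinstance(item, dspy.Tool):
                    found[f"{path}[{key}]"] = item
        elif isinstance(value, dspy.Module):
            found.update(find_tools(value, prefix=f"{path}."))
    return found
```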
- Use tuple syntax for startswith() (more Pythonic)
- Remove unnecessary try-except for JSON parsing (we control the source)

These follow the same principles applied in build_program refactor.
- Use isinstance(v, str) for predictor filtering (type-based)
- Use .get("tools", {}) for tools extraction (more Pythonic)

Both changes make the code more consistent and resilient to
config structure changes.
Remove ~25 debug/info logs per maintainer feedback:
- Internal routing/processing logs
- Trace processing details
- Reflective example breakdowns
- Config building verbosity

Consolidate multi-line comments into concise single lines while
preserving important context (WHY, not WHAT).
Document that this is a workaround for ReAct's multiple predictor
calls with partial trajectories. After PR stanfordnlp#8999 merges, we should
test if we can remove this and use extract predictor trace directly.
Fail fast with clear error if DSPy's ReAct design changes (missing extract.predict).
Better than silently skipping broken modules.
- Add header note documenting DSPy's two-predictor ReAct design
- Remove test_react_trace_aggregation (was testing DSPy internals)
- Move test tool fixtures to top for reuse
- Fix test_selective_optimization style:
  - Simplify docstring to one-liner
  - Remove verbose inline comments
  - Fix assertion to use program.tools reference (clearer)
- Add consistent GEPA iteration comments

Successfully merging this pull request may close the following issue:

[Feature] Allow GEPA to update tool descriptions and tool error responses