Commit f3b7f6f

Merge pull request #32 from meta-pytorch/pankit-dev-1
[RFC 002] Discoverability of environment tools by agents
2 parents 9f03488 + 97756ad commit f3b7f6f

1 file changed

+335
-0
lines changed

rfcs/002-actions-as-tool-calls.md

Lines changed: 335 additions & 0 deletions
@@ -0,0 +1,335 @@
1+
# RFC: Support multiple tool calls via Action wrapper abstraction
2+
3+
**Status**: In Review
4+
**Created**: 10/15/2025
5+
**Authors**: @Darktex, @pankit-eng
6+
**RFC ID**: 002
7+
8+
## Summary
9+
10+
This RFC proposes treating environment actions as tool calls, introducing a standardized pattern where each action represents a discrete, named operation with typed parameters. This approach aligns OpenEnv with modern LLM agent frameworks while maintaining type safety and providing better introspection capabilities for agent training and debugging.
11+
12+
Instead of arbitrary `Action` subclasses with domain-specific fields, actions would follow a tool-call pattern with a `tool_name` and structured `parameters`, making the framework more composable and easier to integrate with tool-using agents.
13+
14+
## Motivation
15+
16+
### Problem Statement
17+
18+
Current action design in OpenEnv treats actions as dataclasses:
19+
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CodeAction(Action):
    code: str

@dataclass
class BashAction(Action):
    command: str
    cwd: Optional[str] = None
```
30+
31+
This approach has several limitations:
32+
33+
1. **Lack of Introspection**: No standard way to discover what actions an environment supports
34+
2. **LLM Integration Friction**: Modern LLM agents use tool-calling patterns with JSON schemas, requiring translation layers
35+
3. **Inconsistent Patterns**: Each environment invents its own action structure without standardization
36+
37+
### Goals
38+
39+
1. **Standardize Action Structure**: Define a consistent pattern for representing actions as tool calls
40+
2. **Enable Tool Discovery**: Provide APIs to introspect available tools in an environment
41+
3. **Improve LLM Integration**: Native compatibility with tool-calling patterns used by Claude, GPT-4, and other models
42+
4. **Maintain Type Safety**: Preserve strong typing while adopting the tool-call pattern
43+
5. **Support Multi-Tool Environments**: Enable environments that expose multiple tools naturally
44+
45+
## Design
46+
47+
### Architecture Overview
48+
```
┌─────────────────────────────────────────────────────────┐
│  Agent/RL Code                                          │
│                                                         │
│  # Tool discovery                                       │
│  tools = env.tools()                                    │
│  # -> [ToolDefinition(name="execute_code", ...)]        │
│                                                         │
│  # Execute tool call                                    │
│  action = ToolCallAction(                               │
│      tool_name="execute_code",                          │
│      parameters={"code": "print('Hello')"}              │
│  )                                                      │
│  observation = env.step(action)                         │
└─────────────────────────────────────────────────────────┘
                            │ HTTP
                            ▼
┌─────────────────────────────────────────────────────────┐
│  Environment (Docker Container)                         │
│                                                         │
│  class PythonCodeActEnv(Environment):                   │
│                                                         │
│      @tool("execute_code")                              │
│      def execute_code(self, code: str) -> CodeResult:   │
│          return self._executor.run(code)                │
│                                                         │
│      def step(self, action: ToolCallAction):            │
│          tool_fn = self._get_tool(action.tool_name)     │
│          result = tool_fn(**action.parameters)          │
│          return self._make_observation(result)          │
└─────────────────────────────────────────────────────────┘
```
81+
82+
### Core Abstractions
83+
84+
#### 1. ToolCallAction
85+
```python
from typing import Any, Dict
from dataclasses import dataclass, field

@dataclass(kw_only=True)
class ToolCallAction(Action):
    """Action representing a tool call with name and parameters.

    This is the standard action type for tool-based environments.
    Environments can support multiple tools by dispatching based on tool_name.
    """

    tool_name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
```
101+
102+
#### 2. ToolDefinition
103+
```python
from typing import Any, Callable, Dict, List
from dataclasses import dataclass

@dataclass
class ToolParameter:
    """Definition of a tool parameter."""

    name: str
    type: str  # JSON Schema type: "string", "number", "boolean", "object", "array"
    description: str
    required: bool = True
    default: Any = None

@dataclass
class ToolDefinition:
    """Specification of a tool that can be called in an environment.

    This follows the format used by Claude, OpenAI, and other LLM providers
    for function calling, making it easy to pass directly to model APIs.
    """

    name: str
    description: str
    parameters: List[ToolParameter]

    def to_json_schema(self) -> Dict[str, Any]:
        """Convert to JSON Schema format for LLM tool calling."""
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": {
                "type": "object",
                "properties": {
                    p.name: {
                        "type": p.type,
                        "description": p.description,
                    }
                    for p in self.parameters
                },
                "required": [p.name for p in self.parameters if p.required],
            },
        }
```
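As a rough illustration of what `to_json_schema()` produces, the sketch below repeats the two dataclasses from above so that it runs on its own, then builds a definition for a hypothetical `run_bash` tool. The tool name, its parameters, and their descriptions are assumptions made for the example, not part of this RFC.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolParameter:
    name: str
    type: str  # JSON Schema type name
    description: str
    required: bool = True
    default: Any = None


@dataclass
class ToolDefinition:
    name: str
    description: str
    parameters: List[ToolParameter]

    def to_json_schema(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": {
                "type": "object",
                "properties": {
                    p.name: {"type": p.type, "description": p.description}
                    for p in self.parameters
                },
                "required": [p.name for p in self.parameters if p.required],
            },
        }


# Hypothetical tool used only for illustration.
bash_tool = ToolDefinition(
    name="run_bash",
    description="Run a shell command",
    parameters=[
        ToolParameter(name="command", type="string", description="Command to run"),
        ToolParameter(
            name="cwd",
            type="string",
            description="Working directory",
            required=False,
        ),
    ],
)

schema = bash_tool.to_json_schema()
# Only parameters marked required=True end up in the "required" list.
print(schema["input_schema"]["required"])
```

Note that `default` is carried on `ToolParameter` but is not emitted by `to_json_schema()` as written, so a consumer of the schema only learns that `cwd` is optional, not what value it falls back to.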
148+
149+
#### 3. Enhanced Environment Interface
150+
```python
from abc import ABC, abstractmethod
from typing import List, Optional

class Environment(ABC):
    """Base class for all environment servers."""

    @abstractmethod
    def reset(self) -> Observation:
        """Reset the environment and return initial observation."""
        pass

    @abstractmethod
    def step(self, action: Action) -> Observation:
        """Take a step in the environment."""
        pass

    @property
    @abstractmethod
    def state(self) -> State:
        """Get current environment state."""
        pass

    def tools(self) -> List[ToolDefinition]:
        """Return list of available tools in this environment.

        For backward compatibility, environments that don't implement
        tool-based actions can return an empty list.
        """
        return []
```
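The `@tool` decorator appears in the examples in this RFC without a definition. One way it could work, sketched here purely as an assumption, is to tag methods with metadata and have `tools()` collect the tagged methods by inspecting the class. The `ToolDiscoveryMixin` name, the `_tool_name` and `_tool_description` attributes, and the annotation-to-JSON-type mapping are all hypothetical, and plain dicts stand in for `ToolDefinition` to keep the sketch self-contained.

```python
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical mapping from Python annotations to JSON Schema type names.
_PY_TO_JSON = {
    str: "string",
    int: "number",
    float: "number",
    bool: "boolean",
    dict: "object",
    list: "array",
}


def tool(name: str, description: str = "") -> Callable:
    """Mark a method as a tool by attaching metadata to the function."""

    def decorator(fn: Callable) -> Callable:
        fn._tool_name = name
        fn._tool_description = description or (inspect.getdoc(fn) or "")
        return fn

    return decorator


class ToolDiscoveryMixin:
    def tools(self) -> List[Dict[str, Any]]:
        """Collect every method tagged with @tool on this class."""
        definitions = []
        # Inspect the class rather than the instance so properties are not evaluated.
        for _, fn in inspect.getmembers(type(self), predicate=inspect.isfunction):
            if not hasattr(fn, "_tool_name"):
                continue
            params = []
            for p in inspect.signature(fn).parameters.values():
                if p.name == "self":
                    continue
                params.append({
                    "name": p.name,
                    # Unknown or missing annotations fall back to "string".
                    "type": _PY_TO_JSON.get(p.annotation, "string"),
                    "required": p.default is inspect.Parameter.empty,
                })
            definitions.append({
                "name": fn._tool_name,
                "description": fn._tool_description,
                "parameters": params,
            })
        return definitions


class DemoEnv(ToolDiscoveryMixin):
    @tool("execute_code", "Execute Python code")
    def execute_code(self, code: str, timeout: int = 30) -> None:
        pass

    def helper(self) -> None:  # not decorated, so not exposed as a tool
        pass


defs = DemoEnv().tools()
print([d["name"] for d in defs])
```

A design like this keeps `tools()` in sync with the decorated methods automatically, at the cost of relying on annotations being real types rather than strings or `Optional[...]` wrappers, which this sketch does not handle.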
181+
182+
### Key Design Decisions
183+
184+
#### Decision 1: Unified Action Type vs. Per-Tool Action Classes
185+
186+
**Chosen Approach**: Use a single `ToolCallAction` class with `tool_name` and `parameters` fields rather than creating separate action classes per tool.
187+
188+
**Rationale**:
189+
- **Simplicity**: Single action type is easier to understand and work with
190+
- **Flexibility**: Adding new tools doesn't require new action classes
191+
- **LLM Compatibility**: Matches the structure used for MCP tool calling
192+
- **Type Safety**: JSON Schema validation can still enforce parameter types
193+
- **Composability**: Multi-tool environments work naturally
194+
195+
**Trade-offs**:
196+
- Advantages:
197+
- Less boilerplate (no action class per tool)
198+
- Natural support for dynamic tool sets
199+
- Disadvantages:
200+
- Tool parameters are `Dict[str, Any]` instead of strongly-typed fields
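Because parameters arrive as an untyped dict, type checks have to happen at runtime. The sketch below shows one minimal way to check a parameters dict against the `input_schema` shape produced by `to_json_schema()`. It uses only the standard library and covers just required keys, unknown keys, and top-level types; the `validate_parameters` helper is hypothetical and is not a full JSON Schema validator.

```python
from typing import Any, Callable, Dict, List

# Runtime checks for the JSON Schema type names used by ToolParameter.
_JSON_TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
}


def validate_parameters(
    input_schema: Dict[str, Any], parameters: Dict[str, Any]
) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    properties = input_schema.get("properties", {})

    for name in input_schema.get("required", []):
        if name not in parameters:
            errors.append(f"missing required parameter: {name}")

    for name, value in parameters.items():
        if name not in properties:
            errors.append(f"unknown parameter: {name}")
            continue
        expected = properties[name].get("type")
        check = _JSON_TYPE_CHECKS.get(expected)
        if check is not None and not check(value):
            errors.append(
                f"parameter {name!r} should be {expected}, got {type(value).__name__}"
            )
    return errors


schema = {
    "type": "object",
    "properties": {
        "code": {"type": "string", "description": "Python code to execute"},
        "timeout": {"type": "number", "description": "Seconds before aborting"},
    },
    "required": ["code"],
}

ok = validate_parameters(schema, {"code": "print('hi')", "timeout": 5})
bad = validate_parameters(schema, {"timeout": "soon", "verbose": True})
print(ok, bad)
```

An environment could run a check like this at the top of `step()` and surface the errors in the observation rather than raising, so the agent has a chance to correct a malformed call.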
201+
202+
#### Decision 2: Tool Discovery via `tools()` Method
203+
204+
**Chosen Approach**: Add a `tools()` method to the `Environment` base class that returns `List[ToolDefinition]`.
205+
206+
**Rationale**:
207+
- **Introspection**: Agents can discover what actions are available
208+
- **LLM Integration**: Tool definitions can be passed directly to LLM APIs
209+
- **Documentation**: Environments become self-documenting, because a decorator pattern for declaring tools keeps each tool's name and description next to its implementation
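The dict returned by `to_json_schema()` uses the `name` / `description` / `input_schema` layout that Anthropic's tool-use API expects. Other providers wrap the same information differently. As a sketch, assuming the OpenAI Chat Completions `tools` layout (a `function` object whose JSON Schema lives under `parameters`), a small adapter could look like the following; the `to_openai_tool` helper is hypothetical and not part of this RFC.

```python
from typing import Any, Dict


def to_openai_tool(tool_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Re-wrap a to_json_schema()-style dict in the Chat Completions tool layout."""
    return {
        "type": "function",
        "function": {
            "name": tool_schema["name"],
            "description": tool_schema["description"],
            # Same JSON Schema object, stored under a different key.
            "parameters": tool_schema["input_schema"],
        },
    }


anthropic_style = {
    "name": "execute_code",
    "description": "Execute Python code and return stdout, stderr, and exit code",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Python code to execute"},
        },
        "required": ["code"],
    },
}

openai_style = to_openai_tool(anthropic_style)
print(openai_style["function"]["name"])
```

Keeping `ToolDefinition` provider-neutral and doing this kind of re-wrapping in thin adapters may be preferable to adding one `to_*` method per provider on the dataclass itself.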
210+
211+
212+
## Examples
213+
214+
### Example 1: Simple Single-Tool Environment
215+
```python
import uuid
from typing import Any, Dict

# `Action` is assumed to be exported alongside the other base types.
from core.env_server import Action, Environment, Observation, State, ToolCallAction
from core.tools import PyExecutor

# `tool`, `CodeState`, and `CodeObservation` are assumed to be provided
# elsewhere in the framework; this RFC does not specify their import paths.

class PythonCodeActEnv(Environment):
    """Environment for executing Python code via tool calls."""

    def __init__(self):
        self._executor = PyExecutor()
        self._state = CodeState()

    @tool("execute_code", "Execute Python code and return stdout, stderr, and exit code")
    def execute_code(self, code: str) -> Dict[str, Any]:
        """Execute Python code.

        Args:
            code: Python code to execute

        Returns:
            Dict with stdout, stderr, and exit_code keys
        """
        result = self._executor.run(code)
        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exit_code": result.exit_code,
        }

    def reset(self) -> Observation:
        self._state = CodeState(episode_id=str(uuid.uuid4()))
        return CodeObservation(stdout="", stderr="", exit_code=0)

    def step(self, action: Action) -> Observation:
        if not isinstance(action, ToolCallAction):
            raise ValueError(f"Expected ToolCallAction, got {type(action)}")

        # Dispatch to tool method
        if action.tool_name == "execute_code":
            result = self.execute_code(**action.parameters)
            reward = 1 if result["exit_code"] == 0 else -1
            self._state.step_count += 1
            return CodeObservation(reward=reward, **result)
        else:
            raise ValueError(f"Unknown tool: {action.tool_name}")

    @property
    def state(self) -> State:
        return self._state
```
265+
266+
267+
### Example 2: Client-Side Usage with LLM
268+
```python
from anthropic import Anthropic
from core.env_server import ToolCallAction
from envs.coding_env import CodingEnv

# Initialize environment
env = CodingEnv.from_docker_image("coding-env:latest")

# Get available tools
tools = env.tools()  # Returns List[ToolDefinition]

# Convert to Claude's tool format
claude_tools = [tool.to_json_schema() for tool in tools]

# Initialize Claude client
client = Anthropic()

# Agent loop
observation = env.reset()
messages = [{"role": "user", "content": "Calculate fibonacci(10)"}]

while not observation.done:
    # Get model response with tools
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,  # required by the Messages API
        messages=messages,
        tools=claude_tools,
    )

    # If model wants to use a tool
    if response.stop_reason == "tool_use":
        # The first content block may be text, so search for the tool_use block
        tool_use = next(b for b in response.content if b.type == "tool_use")

        # Create action from tool call (ToolCallAction has no tool_call_id field;
        # the id is only needed for the tool_result message below)
        action = ToolCallAction(
            tool_name=tool_use.name,
            parameters=tool_use.input,
        )

        # Execute in environment
        observation = env.step(action)

        # Add tool result to messages
        messages.append({
            "role": "assistant",
            "content": response.content,
        })
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": str(observation),
            }],
        })
        print(observation.reward)
    else:
        break

env.close()
```
330+
331+
## References
332+
333+
- [Anthropic Tool Use Documentation](https://docs.anthropic.com/claude/docs/tool-use)
334+
- [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
335+
- RFC 001: OpenEnv Framework Specification
