Commit 9f03488

Merge pull request #26 from meta-pytorch/env_code
Merging this to move quicker. Will refactor.
2 parents 64d4b10 + 4e752b2 commit 9f03488

1 file changed

+225
-0
lines changed

rfcs/001-openenv-spec.md

Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
1+
# RFC: OpenEnv Framework Spec for agent execution environments
2+
3+
**Status**: In Review
4+
**Created**: 10/14/2025
5+
**Authors**: @Darktex, @pankit-eng, @jspisak, @zkwentz
6+
**RFC ID:** 001
7+
8+
## Summary
9+
10+
An end-to-end framework for creating, deploying, and using isolated execution environments for agentic RL training, built on Gymnasium-style APIs. It provides a clean client-server architecture in which environments run as FastAPI servers inside Docker containers and clients interact with them through type-safe HTTP APIs.
11+
12+
## Motivation
13+
14+
### Problem Statement
15+
16+
Building execution environments for AI agents, code execution, or computational tasks typically involves:
17+
- Complex setup and dependency management
18+
- Security concerns with code execution
19+
- Difficulty in scaling and deploying environments
20+
- Lack of standardized interfaces between environments and their clients
21+
22+
### Goals
23+
24+
1. **Simplicity**: Simple APIs to interact with the environment from RL training code
25+
2. **Type Safety**: Strongly-typed actions, observations, and state
26+
3. **Isolation**: Each environment runs in its own Docker container
27+
4. **Observability**: Use the sidecar container pattern to observe the (action, observation) tuples of an RL training episode.
28+
29+
30+
## Design
31+
32+
### Architecture Overview
33+
34+
```
┌─────────────────────────────────────────────────────────┐
│              RL code (Client Application)               │
│  ┌────────────────┐              ┌──────────────────┐   │
│  │  Environment   │              │   Environment    │   │
│  │     Client     │              │     Client       │   │
│  │ (HTTPEnvClient)│              │ (HTTPEnvClient)  │   │
│  └────────┬───────┘              └────────┬─────────┘   │
└───────────┼───────────────────────────────┼─────────────┘
            │ HTTP (reset, step, state)     │ HTTP
            │                               │
┌───────────▼───────────────────────────────▼─────────────┐
│              Docker Containers (Isolated)               │
│  ┌──────────────────────┐    ┌──────────────────────┐   │
│  │   FastAPI Server     │    │   FastAPI Server     │   │
│  │   Environment        │    │   Environment        │   │
│  │   Logic              │    │   Logic              │   │
│  └──────────────────────┘    └──────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```
55+
56+
### Core Abstractions (already available on master)
57+
58+
#### 1. Environment (Server-Side)
59+
60+
```python
from __future__ import annotations  # defer evaluation of the type names below

from abc import ABC, abstractmethod


# Action, Observation, and State are the framework's base data types.
class Environment(ABC):
    """Base class for all environments."""

    @abstractmethod
    def reset(self) -> Observation:
        """Initialize new episode."""

    @abstractmethod
    def step(self, action: Action) -> Observation:
        """Execute action and return observation."""

    @property
    @abstractmethod
    def state(self) -> State:
        """Get current episode state."""
```
77+
78+
**Design Rationale**:
79+
- Familiar interface for RL/environment practitioners
80+
- Clear separation between action execution (step) and state management
81+
- Abstract base class enforces contract across all environments
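
As a minimal sketch of how a concrete environment might satisfy this contract, the snippet below implements a toy echo environment. `EchoEnvironment`, `EchoAction`, `EchoObservation`, and the `State` fields `episode_id` and `step_count` are hypothetical names chosen for illustration and are not defined by this RFC. The base types are re-declared in simplified form so the snippet runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional
import uuid


@dataclass
class Action:
    """Placeholder base type; this RFC does not show its fields."""


@dataclass
class Observation:
    done: bool = False
    reward: Optional[float] = None


@dataclass
class State:
    # Hypothetical episode metadata, assumed for this sketch.
    episode_id: str = ""
    step_count: int = 0


class Environment(ABC):
    @abstractmethod
    def reset(self) -> Observation: ...

    @abstractmethod
    def step(self, action: Action) -> Observation: ...

    @property
    @abstractmethod
    def state(self) -> State: ...


@dataclass
class EchoAction(Action):
    message: str = ""


@dataclass
class EchoObservation(Observation):
    echoed: str = ""


class EchoEnvironment(Environment):
    """Toy environment that echoes each message back to the agent."""

    def __init__(self) -> None:
        self._state = State()

    def reset(self) -> EchoObservation:
        # A fresh State marks the start of a new episode.
        self._state = State(episode_id=str(uuid.uuid4()))
        return EchoObservation()

    def step(self, action: EchoAction) -> EchoObservation:
        self._state.step_count += 1
        # Illustrative reward rule: the length of the echoed message.
        reward = float(len(action.message))
        return EchoObservation(echoed=action.message, reward=reward)

    @property
    def state(self) -> State:
        return self._state


env = EchoEnvironment()
env.reset()
obs = env.step(EchoAction(message="hi"))
print(obs.echoed, obs.reward, env.state.step_count)
```

Because `Environment` is abstract, forgetting any of the three members makes instantiation fail with a `TypeError`, which is how the base class enforces the contract.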
82+
83+
#### 2. HTTPEnvClient (Client-Side)
84+
85+
```python
from __future__ import annotations  # defer evaluation of StepResult and State

from typing import Generic, TypeVar

ActT = TypeVar("ActT")  # action type accepted by the environment
ObsT = TypeVar("ObsT")  # observation type returned by the environment


class HTTPEnvClient(Generic[ActT, ObsT]):
    """Base class for HTTP environment clients."""

    def reset(self) -> StepResult[ObsT]:
        """Reset environment."""

    def step(self, action: ActT) -> StepResult[ObsT]:
        """Execute action."""

    def state(self) -> State:
        """Get current state."""

    def close(self) -> None:
        """Cleanup resources by signaling to the provider."""
```
101+
102+
**Design Rationale**:
103+
104+
`HTTPEnvClient` is the primary interface through which users interact with environments. It follows several key principles:
105+
106+
- The base class handles all HTTP communication (requests and responses) with the environment
108+
- Generic types (`Generic[ActT, ObsT]`) let static type checkers verify action and observation types
109+
- Each environment's concrete client class parses the server's step, observation, and state responses into the corresponding data models.
111+
- Example: `CodingEnv(HTTPEnvClient[CodeAction, CodeObservation])`
112+
- `state()` method provides visibility into episode metadata
113+
- Explicit `close()` ensures proper resource cleanup
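
The parsing responsibility described above could look roughly like the following sketch. The JSON layout, the `StepResult` field names, and the helpers `action_to_payload` and `parse_step_result` are assumptions made for illustration; the RFC does not fix a wire format or these function names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, Optional, TypeVar

ObsT = TypeVar("ObsT")


@dataclass
class StepResult(Generic[ObsT]):
    # Field names are assumed; the RFC only names the type.
    observation: ObsT
    reward: Optional[float] = None
    done: bool = False


@dataclass
class CodeAction:
    code: str


@dataclass
class CodeObservation:
    stdout: str = ""
    exit_code: int = 0


def action_to_payload(action: CodeAction) -> Dict[str, Any]:
    """Serialize an action into the JSON body sent to the server."""
    return {"code": action.code}


def parse_step_result(payload: Dict[str, Any]) -> StepResult[CodeObservation]:
    """Turn a decoded JSON response into typed data models."""
    obs = payload.get("observation", {})
    return StepResult(
        observation=CodeObservation(
            stdout=obs.get("stdout", ""),
            exit_code=obs.get("exit_code", 0),
        ),
        reward=payload.get("reward"),
        done=payload.get("done", False),
    )


# Hypothetical response body, as it might look after json.loads().
response = {
    "observation": {"stdout": "Hello, World!\n", "exit_code": 0},
    "reward": 1.0,
    "done": False,
}
result = parse_step_result(response)
print(result.observation.exit_code, result.reward)
```

Keeping serialization and parsing in the concrete client means the shared base class never needs to know an environment's field names.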
114+
115+
#### 3. Container Providers
116+
117+
```python
from abc import ABC, abstractmethod


class ContainerProvider(ABC):
    """Abstract base for container orchestration."""

    # The `...` below stands for further parameters that this RFC elides.
    @abstractmethod
    def start_container(self, image: str, ...) -> str:
        """Start container and return base URL."""

    @abstractmethod
    def stop_container(self) -> None:
        """Stop and remove container."""

    @abstractmethod
    def wait_for_ready(self, base_url: str, timeout_s: float) -> None:
        """Wait for container to be ready."""
```
133+
134+
**Design Rationale**:
135+
- Pluggable architecture supports multiple platforms (local Docker, K8s, other orchestration providers)
136+
- The provider abstraction decouples the client from deployment and lifecycle-management details, which makes it easy to integrate with existing orchestration solutions
138+
- Consistent interface across all providers
139+
- Higher-level RL frameworks can implement their own container providers to integrate with their existing orchestration solutions.
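
One plausible way for a provider to implement `wait_for_ready` is a polling loop with a deadline, sketched below as a standalone function. The `/health` path, the `probe` and `interval_s` parameters, and the use of `TimeoutError` are assumptions for illustration; the RFC only specifies the `base_url` and `timeout_s` arguments.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List


def http_probe(base_url: str) -> bool:
    """Return True if the server answers; `/health` is an assumed path."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2.0) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_ready(
    base_url: str,
    timeout_s: float,
    probe: Callable[[str], bool] = http_probe,
    interval_s: float = 0.5,
) -> None:
    """Poll until the probe succeeds, or raise TimeoutError at the deadline."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{base_url} not ready after {timeout_s}s")
        time.sleep(interval_s)


# Demonstration with a fake probe that succeeds on the third attempt,
# so the sketch runs without Docker or network access.
attempts: List[str] = []


def flaky_probe(url: str) -> bool:
    attempts.append(url)
    return len(attempts) >= 3


wait_for_ready("http://localhost:8000", timeout_s=5.0, probe=flaky_probe, interval_s=0.01)
print(len(attempts))
```

Injecting the probe keeps the readiness logic testable and lets a Kubernetes or other provider substitute its own health check.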
141+
142+
### Key Design Decisions
143+
144+
In this RFC, we want to align on four decisions that will shape the overall design of the framework.
145+
146+
#### Decision 1: Baseline API Set
147+
148+
**Chosen Approach**: Define three core APIs as the baseline interface for this framework: `step`, `reset`, and `state`.
149+
150+
**Rationale**:
151+
- **`reset()`**: Initializes a new episode and returns the initial observation, providing a clean starting point for agent interactions
152+
- **`step(action)`**: Executes an action and returns an observation, forming the core interaction loop
153+
- **`state()`**: Provides visibility into the current episode state and metadata
154+
155+
These three APIs establish the minimum viable interface for environment interaction and are sufficient for basic RL training workflows. They align with established patterns from Gymnasium and similar frameworks, making them immediately familiar to practitioners.
156+
157+
**Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()` and environment-specific utilities) will be explored in follow-up RFCs.
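
To illustrate why these three calls are enough for a basic training loop, here is a small in-process sketch. `CountdownEnv`, its integer actions, and the dictionary returned by `state()` are hypothetical stand-ins for a real environment client.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Observation:
    value: int = 0
    done: bool = False
    reward: Optional[float] = None


class CountdownEnv:
    """Hypothetical environment: the episode ends when the counter hits zero."""

    def __init__(self, start: int = 3) -> None:
        self._start = start
        self._remaining = start
        self._steps = 0

    def reset(self) -> Observation:
        self._remaining = self._start
        self._steps = 0
        return Observation(value=self._remaining)

    def step(self, action: int) -> Observation:
        self._remaining -= action
        self._steps += 1
        done = self._remaining <= 0
        return Observation(value=self._remaining, done=done, reward=1.0 if done else 0.0)

    def state(self) -> Dict[str, int]:
        return {"step_count": self._steps}


def run_episode(env: CountdownEnv, max_steps: int = 10) -> List[float]:
    """Collect per-step rewards for one episode using only reset and step."""
    rewards: List[float] = []
    obs = env.reset()
    for _ in range(max_steps):
        if obs.done:
            break
        obs = env.step(1)  # a real agent would choose the action from obs
        rewards.append(obs.reward or 0.0)
    return rewards


env = CountdownEnv(start=3)
rewards = run_episode(env)
print(rewards, env.state())
```

The loop never touches environment internals: `reset` and `step` drive the episode, and `state` is consulted only for metadata.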
158+
159+
#### Decision 2: Environment-Computed Rewards
160+
161+
**Chosen Approach**: Rewards are computed inside the environment and returned as part of the observation.
162+
163+
**Rationale**:
164+
- **Encapsulation**: Reward logic stays with the environment where domain knowledge resides
165+
- **Consistency**: Ensures reward computation is deterministic and reproducible across different client implementations
166+
- **Flexibility**: Environments can use internal state and context not visible to clients for reward computation
167+
- **Standard Pattern**: Aligns with Gymnasium/Gym conventions where rewards are returned from `step()`
168+
169+
The `Observation` base class includes a `reward` field that environments populate:
170+
171+
```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass(kw_only=True)  # kw_only requires Python 3.10+
class Observation:
    """Base class for all environment observations."""
    done: bool = False
    reward: Union[bool, int, float, None] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
```
179+
180+
This design enables environments to compute rewards based on:
181+
- Action outcomes (e.g., exit codes, success/failure)
182+
- Internal state transitions
183+
- Multi-step trajectories
184+
- Domain-specific metrics
185+
186+
Clients receive fully-formed observations with rewards already computed, simplifying the client-side RL loop.
187+
188+
#### Decision 3: HTTP-Based Communication
189+
190+
**Chosen Approach**: Use HTTP/REST for client-server communication
191+
192+
**Rationale**:
193+
- HTTP-based RPC is more widely supported and better understood than alternatives such as gRPC or Thrift
194+
- Easy to debug with standard tools (curl, Postman)
195+
- Supports language-agnostic clients
196+
- FastAPI provides excellent developer experience
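
The language-agnostic point can be illustrated with nothing but the Python standard library: any client that can POST JSON can drive an environment. In this sketch `http.server` stands in for the FastAPI server so the example runs without third-party packages; the `/step` route, the JSON shape, and the names `EnvHandler` and `post_json` are assumptions.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EnvHandler(BaseHTTPRequestHandler):
    """Toy handler: POST /step echoes the action back as an observation."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/step":
            status = 200
            payload = {"observation": {"echo": body.get("message", "")}, "done": False}
        else:
            status = 404
            payload = {"error": "unknown route"}
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args) -> None:
        pass  # keep the demo output quiet


def post_json(url: str, payload: dict) -> dict:
    """Send a JSON body with POST and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5.0) as resp:
        return json.loads(resp.read())


server = HTTPServer(("127.0.0.1", 0), EnvHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    reply = post_json(f"http://127.0.0.1:{server.server_port}/step", {"message": "hi"})
finally:
    server.shutdown()
    server.server_close()
print(reply)
```

The same request could be issued from curl or from a client written in another language, which is the debugging and interoperability benefit the rationale refers to.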
197+
198+
#### Decision 4: Docker-Based Runtime Isolation and Packaging
199+
200+
**Chosen Approach**: Each environment runs in its own Docker container
201+
202+
**Rationale**:
203+
- Stronger isolation boundaries than process-based isolation
204+
- Reproducible environments with packaged dependencies
205+
- Easy dependency management via Dockerfile
206+
- Industry-standard tooling
207+
208+
209+
### Example Environments
210+
211+
**Purpose**: Test infrastructure, demonstrate patterns, verify deployments
212+
213+
#### Coding Environment
214+
215+
Executes Python code in a sandboxed environment:
216+
217+
```python
from envs.coding_env import CodeAction, CodingEnv

# Start the environment in a local Docker container and connect to it.
client = CodingEnv.from_docker_image("coding-env:latest")
try:
    client.reset()  # begin a new episode before the first step
    result = client.step(CodeAction(code="print('Hello, World!')"))
    print(result.observation.stdout)     # "Hello, World!\n"
    print(result.observation.exit_code)  # 0
finally:
    client.close()  # release the container even if a call fails
```
