rfcs/001-openenv-spec.md: 31 additions & 2 deletions
@@ -156,7 +156,36 @@ These three APIs establish the minimum viable interface for environment interact

 **Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`, `close()`, `tools()` and environment-specific utilities) will be explored in follow-up RFCs.

-#### Decision 2: HTTP-Based Communication
+#### Decision 2: Environment-Computed Rewards
+
+**Chosen Approach**: Rewards are computed inside the environment and returned as part of the observation.
+
+**Rationale**:
+- **Encapsulation**: Reward logic stays with the environment where domain knowledge resides
+- **Consistency**: Ensures reward computation is deterministic and reproducible across different client implementations
+- **Flexibility**: Environments can use internal state and context not visible to clients for reward computation
+- **Standard Pattern**: Aligns with Gymnasium/Gym conventions where rewards are returned from `step()`
+
+The `Observation` base class includes a `reward` field that environments populate:
+
+```python
+@dataclass(kw_only=True)
+class Observation:
+    """Base class for all environment observations."""