This project aims to standardize environments for both training and evaluation. In the training space, this also means standardizing reward pipelines. In the eval space, it means improving reproducibility, so that a model can be shipped with a complete set of agentic evals that others can easily run.

### The problem with abstraction boundaries
Ideally, we would draw a clean boundary between environments and everything else (orchestration, resource allocation, RPCs, etc.). We will keep to this boundary as much as possible, but we will also need additional interfaces so that folks who need to cross it can do so. This will likely be necessary for things like:
- Reward pipelines that call reward models (which will very likely need to RPC to GPU machines)
- Agentic evals like Tau, where the eval itself involves two agents interacting with one another (and sending many RPCs)
- Container provider interfaces to support different deployment targets (Docker, Kubernetes, cloud providers, etc.)

## Phases
We plan to build things incrementally, from the inside out, adding and expanding only when necessary.

We will group development from now until version 1.0 into three phases.

In the **first phase** of this project, we will focus **exclusively** on the narrowest definition of environments, without worrying about rewards or evals yet. Instead, the work in this phase (and in the RFCs in this directory) will center on:
1. Establishing a convention for what an environment is and where we draw the "environment" box (RFC 001).
2. Landing the basics of _sandboxing_, _versioning_, _binary distribution_, and _dependency management_ (RFC 002).
3. Nailing down tool support through MCP (Model Context Protocol) integration, for both remote and local tools (RFC 003).
4. Defining a unified action interface for all environment types (RFC 004).
5. Exploring RPC communication patterns beyond HTTP for long-running sessions, particularly for interpreted languages like Python, Bash, and Ruby (to be covered in an upcoming RFC).

We will conclude this phase with version 0.3.

**Note on versioning**: We're using 0.3 version increments per phase to leave room for minor releases and patches within each phase. This gives us the flexibility to ship iteratively while working toward each phase's goals.

In the **second phase** of this project, we will add rewards. Reward pipelines are crucial to get right, since they are one of the main levers ML engineers have for improving the model (far more so than the general algorithm). As a result, all the questions around versioning and dependencies that we tackled in the first phase will pay off right away. We will also introduce a way to RPC outside the Environment boundary; we expect this to require active discussion.