
Conversation

Contributor

@Darktex Darktex commented Nov 8, 2025

Summary

I have been thinking about revising our abstractions to really go all-in on MCP and reduce the friction between Prod and Training. If accepted, our RFCs (and code)
will change meaningfully.

We are not at this stage yet. This PR introduces a single consolidated RFC (env_tools_rfc.md) that proposes significant architectural changes. It's meant to
spark discussion and alignment before we commit to implementation.

Key Changes Proposed

  • Environment = MCP Servers + Simulation Layer: Environment wraps and manages MCP servers directly, with a simulation layer (HTTP) that disappears in production
  • Dual protocol architecture: HTTP for simulation control (reset/step), MCP for agent-tool interaction (present in both training and prod)
  • Tool registry with sim/prod mapping: Explicit registry maps production tools to simulation equivalents, with HF Hub integration for community-contributed
    mappings
  • Git-based checkpointing: Automatic transactional state management with async sidecar for performance
  • Event queue as first-class citizen: Empty queue = static env, populated = dynamic env
  • Data + evals included in Environment: Clear simulation boundary, everything needed for training lives in Environment abstraction
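To make the sim/prod tool registry bullet concrete, here is a minimal sketch of what an explicit mapping could look like. All names here (`ToolMapping`, `TOOL_REGISTRY`, `resolve_tool`, the tool names, the Hub repo id) are illustrative assumptions, not from the RFC itself:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a registry mapping production MCP tools to
# simulation equivalents, with an optional HF Hub repo for
# community-contributed mappings.

@dataclass(frozen=True)
class ToolMapping:
    prod_tool: str                  # name of the production MCP tool
    sim_tool: str                   # name of the simulation equivalent
    hub_repo: Optional[str] = None  # optional HF Hub repo (illustrative)

TOOL_REGISTRY: dict[str, ToolMapping] = {
    "send_email": ToolMapping("send_email", "sim_send_email"),
    "book_flight": ToolMapping("book_flight", "sim_book_flight",
                               hub_repo="community/sim-travel-tools"),
}

def resolve_tool(name: str, mode: str = "sim") -> str:
    """Return the tool name to call for the current mode ('sim' or 'prod')."""
    mapping = TOOL_REGISTRY[name]
    return mapping.sim_tool if mode == "sim" else mapping.prod_tool
```

The point of the explicit mapping is that the agent-facing tool name stays stable while the backing implementation swaps between simulation (training) and production.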

See rfcs/env_tools_rfc.md for full details, rationale, and design decisions.

Progression Plan

  • Phase 1 (Current): Initial proposal as single RFC explaining the delta
  • Phase 2: Update existing RFCs (000-004) to reflect new design decisions
  • Phase 3: Revise project timelines and release plan to accommodate changes

Discussion Points

  1. Tool registry approach and HF Hub integration strategy
  2. Whether to host sim tools directly on HF Hub (currently marked as "under discussion")
  3. Checkpointing transactionality limitations and annotations
  4. Migration path for existing environments

Feedback welcome on all aspects; this is the right time to course-correct before implementation begins.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 8, 2025
Contributor

init27 commented Nov 10, 2025

@Darktex this is really detailed and thorough! Thank you so much.

Instead of questions, I will detail my thoughts and POV below, and how they were answered as I read through this:

Open Qs that I was brainstorming before reading:

  • How to standardize the registry: I was already brainstorming about the Expedia-like example, and wondering whether there is a simple way to create a registry when dealing with such examples
  • Same question as above: reducing friction for env builders by having a reusable skeleton, which is also addressed by the registry above
  • Real time vs. simulation time: I was interested in the "employee" example quoted as well; how do we tackle real time vs. sim time?

All of these have been well addressed by the RFC.

Some more Qs I have now:

  • Assuming we are dealing with small-model RL, where the model might not be good at tool calling (and might not need it), are we limiting these to sim envs? (Not a pushback, just clarifying.)

It would be really cool to whiteboard an environment + model training as we start Phase 1.

Contributor Author

Darktex commented Nov 10, 2025

Thanks for the kind words!

  • Assuming we are dealing with small-model RL, where the model might not be good at tool calling (and might not need it), are we limiting these to sim envs? (Not a pushback, just clarifying.)

I don't think so. At the end of the day, all you are doing in practice is using fastmcp and a @tool decorator on top of Python functions, so it's super lightweight to migrate Python functions to MCP tools. Whether or not the model has been primed for MCP specifically, it will always need a way of listing all the (valid) actions that it can take, and a standardized way of executing them, providing params etc. So in practice it will always need some MCP-like thing.

The only case where this is not true is if your policy is not an LLM but is instead a "normal" RL policy whose output layer is fixed to exactly N outputs, where N is the number of legal moves. Something like AlphaGo. But for models like that, OpenEnvs is not going to be very useful, because you cannot take AlphaGo and start playing Pokemon anyway. So I think that's a reasonable tradeoff to take: assume we build for LLMs.

It would be really cool to whiteboard an environment + model training as we start Phase 1
ACK. Coming up!!!

@jspisak jspisak added enhancement New feature or request RFC labels Nov 11, 2025
Contributor

zkwentz commented Nov 11, 2025

Ack'ing this for now. This led to a bigger convo on work chat.

