1 parent 17a71ee commit ad78967
examples/grpo_blackjack/README.md
@@ -74,17 +74,6 @@ env.close() # Cleanup

 Change one environment variable → train on different games!

-### GRPO: Group Relative Policy Optimization
-
-GRPO is more efficient than PPO (used by ChatGPT):
-
-| Algorithm | Models Needed | Memory | Speed |
-|-----------|---------------|--------|-------|
-| PPO (ChatGPT) | 3 (Policy, Reference, Value) | High | Slower |
-| **GRPO (DeepSeek R1)** | **2 (Policy, Reference)** | **Lower** | **Faster** |
-
-Key insight: Sample the model multiple times per question, compute group statistics → no Value Model needed!

 ### Forge: PyTorch-Native Agentic RL

 Forge handles all distributed systems complexity:
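The removed section's key insight — sample several completions per prompt and normalize rewards against the group's own statistics, so no learned value model is required — can be sketched in a few lines. This is an illustrative sketch only; the function and variable names are hypothetical and not part of Forge's or DeepSeek's API.

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Compute group-relative advantages for one prompt's sampled completions.

    rewards: list of scalar rewards, one per completion sampled for the
             SAME prompt. Each reward is normalized by the group's mean
             and standard deviation, replacing a learned value baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against division by zero when all rewards are identical
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four completions sampled for one blackjack hand,
# two wins (reward 1.0) and two losses (reward 0.0)
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Winning completions get a positive advantage and losing ones a negative advantage, purely from within-group comparison — which is why the table above lists GRPO as needing only two models instead of PPO's three.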