
Commit ad78967

Update README.md
1 parent 17a71ee commit ad78967


1 file changed (+0, -11 lines)


examples/grpo_blackjack/README.md

Lines changed: 0 additions & 11 deletions
@@ -74,17 +74,6 @@ env.close() # Cleanup
 
 Change one environment variable → train on different games!
 
-### GRPO: Group Relative Policy Optimization
-
-GRPO is more efficient than PPO (used by ChatGPT):
-
-| Algorithm | Models Needed | Memory | Speed |
-|-----------|---------------|--------|-------|
-| PPO (ChatGPT) | 3 (Policy, Reference, Value) | High | Slower |
-| **GRPO (DeepSeek R1)** | **2 (Policy, Reference)** | **Lower** | **Faster** |
-
-Key insight: Sample the model multiple times per question, compute group statistics → no Value Model needed!
-
 ### Forge: PyTorch-Native Agentic RL
 
 Forge handles all distributed systems complexity:
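The removed "Key insight" line compresses GRPO's central idea: rather than a learned value model supplying a baseline, each completion's reward is compared against the other completions sampled for the same prompt. The following is a minimal Python sketch of that group-relative advantage. It assumes one scalar reward per completion and normalizes by the group's population standard deviation (implementations differ; some use the sample standard deviation). The function name `group_relative_advantages` and the example rewards are hypothetical and are not taken from the README or from Forge.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) / (group std + eps),
    so the group itself acts as the baseline and no value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another common choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 blackjack episodes played from the same starting state
# (win = 1.0, loss = -1.0, push = 0.0).
rewards = [1.0, -1.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that beat the group average receive positive advantages and the rest negative ones, so the policy update can weight each completion without a separate value network. The `eps` term avoids a 0/0 division when every completion in a group earns the same reward, in which case all advantages are zero.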
