src/envs/browsergym_env/README.md: 16 additions & 6 deletions
@@ -5,13 +5,23 @@ BrowserGym is a unified framework for web-based agent tasks that provides access
## Why BrowserGym?
-**Complete Pipeline**: Train on MiniWoB++ → Evaluate on WebArena/VisualWebArena
-- **MiniWoB++**: 100+ simple tasks for training (works immediately, no setup!)
-- **WebArena**: 812 realistic tasks for evaluation (requires backend setup)
-- **VisualWebArena**: Visual navigation tasks
-- **WorkArena**: Enterprise automation tasks
+BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.
-**Key Advantage**: MiniWoB tasks work out-of-the-box with no external infrastructure needed!
+**What are these benchmarks?**
+
+- **MiniWoB++ (Training)**: 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. **No external setup needed** - tasks run in isolated browser sessions.
+
+- **WebArena (Evaluation)**: 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multi-step, requires reasoning, sparse rewards. Tests whether your agent can handle actual websites. **Requires running 7 backend services** (shopping site, GitLab instance, etc.).
+
+- **VisualWebArena**: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.
+
+- **WorkArena**: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.
+
+**The training → evaluation pipeline:**
+1. Train on MiniWoB (simple, controlled, fast iterations)
+2. Evaluate on WebArena (complex, realistic, measures real-world capability)
+
+**Key advantage**: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.
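
The training → evaluation split described in this change can be sketched as a small registry. This is only an illustration: the `Benchmark` dataclass, the `BENCHMARKS` dict, and both helper functions are hypothetical names, not part of BrowserGym or this repository. The only facts encoded are the ones stated in the README text (MiniWoB++ needs no external setup, WebArena requires 7 backend services).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical record for one benchmark mentioned in the README."""
    name: str
    role: str              # "training" or "evaluation"
    backend_services: int  # external services that must be running first


# Values come from the README text; the structure is an assumption.
BENCHMARKS = {
    "miniwob": Benchmark("MiniWoB++", "training", 0),
    "webarena": Benchmark("WebArena", "evaluation", 7),
}


def ready_without_setup(benchmarks):
    """Return the keys of benchmarks that need no external backend."""
    return [key for key, b in benchmarks.items() if b.backend_services == 0]


def pipeline(benchmarks):
    """Order benchmark keys so training comes before evaluation."""
    order = {"training": 0, "evaluation": 1}
    return sorted(benchmarks, key=lambda key: order[benchmarks[key].role])
```

With such a registry, a smoke test could restrict itself to `ready_without_setup(BENCHMARKS)` and run the full `pipeline(BENCHMARKS)` only once the WebArena backends are available.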