
Commit 86c26ce

Clarify what BrowserGym benchmarks are

Expanded 'Why BrowserGym?' section to better explain:
- MiniWoB: synthetic web tasks (click buttons, forms) in isolated browsers
- WebArena: real websites with multi-step tasks (e-commerce, GitLab)
- VisualWebArena: requires visual understanding
- WorkArena: enterprise software automation

Added concrete examples and emphasized training vs evaluation use cases.
1 parent: 11875bd; commit: 86c26ce

src/envs/browsergym_env/README.md

Lines changed: 16 additions & 6 deletions
@@ -5,13 +5,23 @@ BrowserGym is a unified framework for web-based agent tasks that provides access
 
 ## Why BrowserGym?
 
-**Complete Pipeline**: Train on MiniWoB++ → Evaluate on WebArena/VisualWebArena
-- **MiniWoB++**: 100+ simple tasks for training (works immediately, no setup!)
-- **WebArena**: 812 realistic tasks for evaluation (requires backend setup)
-- **VisualWebArena**: Visual navigation tasks
-- **WorkArena**: Enterprise automation tasks
+BrowserGym provides a complete pipeline for developing web agents: train on simple tasks, then evaluate on realistic websites.
 
-**Key Advantage**: MiniWoB tasks work out-of-the-box with no external infrastructure needed!
+**What are these benchmarks?**
+
+- **MiniWoB++ (Training)**: 100+ synthetic web tasks like "click this button", "fill out this form", "select from dropdown". Each task is a simple webpage with a clear objective. Fast resets, randomized variations, dense rewards. Perfect for learning basic web navigation skills. **No external setup needed** - tasks run in isolated browser sessions.
+
+- **WebArena (Evaluation)**: 812 tasks on real websites (e-commerce, forums, GitLab, Wikipedia). Tasks like "find the cheapest laptop and add to cart" or "create a merge request for bug #123". Multi-step, requires reasoning, sparse rewards. Tests if your agent can handle actual websites. **Requires running 7 backend services** (shopping site, GitLab instance, etc).
+
+- **VisualWebArena**: Similar to WebArena but requires visual understanding - agents need to interpret images, identify UI elements visually, handle multimodal content.
+
+- **WorkArena**: Enterprise software tasks (CRM, project management, business workflows). Tests automation on corporate-style applications.
+
+**The training → evaluation pipeline:**
+1. Train on MiniWoB (simple, controlled, fast iterations)
+2. Evaluate on WebArena (complex, realistic, measures real-world capability)
+
+**Key advantage**: You can start training immediately with MiniWoB. No need to set up infrastructure just to test if your code works.
 
 ## Quick Start - Training (MiniWoB)
 
