Chore/restructuring #10
💡 Codex Review
Here are some automated review suggestions for this pull request.
evals/lib/webarena_adapter.py
Outdated
    def __init__(self, config_dir: Optional[Path] = None):
        """
        Initialize task loader.

        Args:
            config_dir: Path to WebArena config_files directory.
                Defaults to submodules/webarena/config_files/
        """
        if config_dir is None:
            # Go from evals/lib/ to project root, then to submodules/webarena/config_files
            project_root = Path(__file__).parent.parent.parent
            webarena_dir = project_root / 'submodules' / 'webarena'
            config_dir = webarena_dir / 'config_files'

        self.config_dir = Path(config_dir)

        if not self.config_dir.exists():
            raise FileNotFoundError(f"Config directory not found: {self.config_dir}")
Update WebArenaTaskLoader default path to new data location
The new WebArena utilities default to loading tasks from submodules/webarena/config_files, but that directory no longer exists in the repo after the restructure (only evals/webarena/... was added). Instantiating WebArenaTaskLoader() immediately raises FileNotFoundError, so run_webarena.py, test_webarena_integration.py, and the list/run helpers cannot load any tasks. The loader should default to the new in-repo path (or accept a parameter) instead of referencing a removed submodule.
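One way to address this is to try the in-repo location first and fall back to the submodule path. The following is a minimal sketch, not the fix actually committed: `resolve_config_dir` and its parameters are hypothetical names introduced for illustration, and the two candidate paths are taken from this pull request.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_config_dir(
    project_root: Path,
    config_dir: Optional[Path] = None,
    candidates: Iterable[str] = (
        "evals/webarena/config_files",       # new in-repo location
        "submodules/webarena/config_files",  # legacy submodule location
    ),
) -> Path:
    """Return the config directory to load tasks from (hypothetical helper).

    An explicit config_dir always wins; otherwise each candidate is tried
    in order, relative to project_root, and the first existing one is used.
    """
    if config_dir is not None:
        path = Path(config_dir)
        if not path.is_dir():
            raise FileNotFoundError(f"Config directory not found: {path}")
        return path

    tried = []
    for relative in candidates:
        path = project_root / relative
        if path.is_dir():
            return path
        tried.append(str(path))

    # Report every location that was checked to make the failure actionable.
    raise FileNotFoundError(
        "Config directory not found; tried: " + ", ".join(tried)
    )
```

Inside `__init__`, the loader could then set `self.config_dir = resolve_config_dir(Path(__file__).parent.parent.parent, config_dir)`, which keeps the existing `config_dir` parameter as an override.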
This restructuring organizes the repository to support different deployment types and evaluation frameworks, with all git submodules in a dedicated submodules/ folder.

Changes:
- Moved all submodules to submodules/ directory:
  - kernel-images → submodules/kernel-images
  - browser-operator-core → submodules/browser-operator-core
  - webarena → submodules/webarena
- Updated .gitmodules to point all submodules to submodules/ directory
- Updated all Dockerfiles to use submodules/ paths:
  - Dockerfile.devtools
  - Dockerfile.kernel-cloud
  - deployments/cloudrun/Dockerfile
  - deployments/local/Dockerfile
  - deployments/local-webarena/Dockerfile
- Updated Makefiles to initialize submodules from submodules/ directory:
  - deployments/local/Makefile
  - deployments/local-webarena/Makefile
- Moved WebArena config files to evals/webarena/config_files/
  - Copied 812 benchmark task configs from submodule
- Fixed WebArenaTaskLoader to try new location first with fallback:
  - evals/lib/webarena_adapter.py
- Fixed EvalLoader to support evals/native/data/ structure:
  - evals/lib/eval_loader.py (path resolution for restructured evals)
- Updated documentation to reflect new structure:
  - CLAUDE.md (main technical docs)
  - evals/CLAUDE.md (evals-specific docs)

The restructuring supports three deployment types:
- deployments/local/ - Local development
- deployments/local-webarena/ - Local with WebArena
- deployments/cloudrun/ - Google Cloud Run

All submodules are now properly registered and will download to submodules/ when running 'git submodule update --init'.
…submodule version

The kernel-images submodule doesn't contain start-chromium.sh. All deployment types use custom patched versions in deployments/*/scripts/.

Changes:
- deployments/local/Dockerfile: Remove duplicate COPY from submodule
- deployments/local-webarena/Dockerfile: Remove duplicate COPY from submodule
- deployments/cloudrun/Dockerfile: Use local scripts/start-chromium.sh

This fixes the build error:
  failed to compute cache key: start-chromium.sh: not found
Both run-local.sh scripts were checking for kernel-images at the old root-level location instead of submodules/kernel-images.

Changes:
- deployments/local/run-local.sh: Update paths to submodules/kernel-images
- deployments/local-webarena/run-local.sh: Update paths to submodules/kernel-images

This fixes the error:
  Error: kernel-images submodule not found or incomplete
Removes version suffix for cleaner naming. The v2 version is now the primary login script using YAML-based task definitions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
No description provided.