Commit 2a8fdd9

seanzhougoogle authored and copybara-github committed
chore: Add computer use sample agent
PiperOrigin-RevId: 820407078
1 parent 37a153e commit 2a8fdd9

4 files changed

+452
-0
lines changed

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Computer Use Agent

This directory contains a computer use agent that can operate a browser to complete user tasks. The agent uses Playwright to control a Chromium browser and interacts with web pages by taking screenshots, clicking, typing, and navigating.

This agent demonstrates how to use `ComputerUseToolset`.

## Overview

The computer use agent consists of:
- `agent.py`: Main agent configuration using Google's `gemini-2.5-computer-use-preview-10-2025` model
- `playwright.py`: Playwright-based computer implementation for browser automation
- `requirements.txt`: Python dependencies

## Setup

### 1. Install Python Dependencies

Install the required Python packages from the requirements file:

```bash
uv pip install -r internal/samples/computer_use/requirements.txt
```

### 2. Install Playwright Dependencies

Install Playwright's system dependencies for Chromium:

```bash
playwright install-deps chromium
```

### 3. Install Chromium Browser

Install the Chromium browser for Playwright:

```bash
playwright install chromium
```

## Usage

### Running the Agent

To start the computer use agent, run the following command from the project root:

```bash
adk web internal/samples
```

This starts the ADK web interface, where you can interact with the `computer_use` agent.

### Example Queries

Once the agent is running, you can send queries like:

```
find me a flight from SF to Hawaii on next Monday, coming back on next Friday. start by navigating directly to flights.google.com
```

The agent will:
1. Open a browser window
2. Navigate to the specified website
3. Interact with the page elements to complete your task
4. Provide updates on its progress

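The four steps above amount to an observe-act loop: look at the screen, choose an action, perform it, report. The sketch below illustrates that loop only. `FakeComputer`, `scripted_policy`, and `run_loop` are hypothetical stand-ins invented for this example (they are not part of ADK or Playwright), so it runs without a browser or a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeComputer:
    """Hypothetical stand-in for a browser-backed computer; it only records actions."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real implementation would return image bytes; this returns a description.
        return f"screenshot of {self.url}"

    def navigate(self, url: str) -> None:
        self.url = url
        self.log.append(("navigate", url))

    def click_at(self, x: int, y: int) -> None:
        self.log.append(("click_at", x, y))


def scripted_policy(observation: str, step: int):
    """Hypothetical stand-in for the model: returns the next action, or None when done."""
    script = [
        ("navigate", "https://flights.google.com"),
        ("click_at", 640, 300),
    ]
    return script[step] if step < len(script) else None


def run_loop(computer: FakeComputer, policy, max_steps: int = 10) -> list:
    """Observe, decide, act, and report until the policy signals completion."""
    updates = []
    for step in range(max_steps):
        observation = computer.screenshot()  # observe the current state
        action = policy(observation, step)   # decide on the next action
        if action is None:
            updates.append("done")
            break
        name, *args = action
        getattr(computer, name)(*args)       # act on the page
        updates.append(f"step {step}: {name}{tuple(args)}")  # progress update
    return updates


if __name__ == "__main__":
    for line in run_loop(FakeComputer(), scripted_policy):
        print(line)
```

In the real agent the policy role is played by the model and the computer role by the Playwright-backed implementation; the scripted list here exists only to keep the example deterministic.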
### Other Example Tasks

- Book hotel reservations
- Search for products online
- Fill out forms
- Navigate complex websites
- Research information across multiple pages

## Technical Details

- **Model**: Uses Google's `gemini-2.5-computer-use-preview-10-2025` model for computer use capabilities
- **Browser**: Automated Chromium browser via Playwright
- **Screen Size**: Configured for 1280x936 resolution (the `screen_size` passed to `PlaywrightComputer` in `agent.py`)
- **Tools**: Uses `ComputerUseToolset` for screen capture, clicking, typing, and scrolling

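This README does not say how click coordinates relate to the configured screen size. One common arrangement, assumed here and not confirmed by this sample, is that the model reports positions on a fixed 1000x1000 grid and the client scales them to pixels. The `denormalize` helper below is hypothetical and only illustrates that scaling, using the `screen_size=(1280, 936)` value from `agent.py`.

```python
def denormalize(x: int, y: int, screen_size: tuple[int, int],
                grid: int = 1000) -> tuple[int, int]:
    """Scale grid coordinates in [0, grid) to pixel coordinates.

    The 1000-unit grid is an assumption made for this sketch.
    """
    width, height = screen_size
    if not (0 <= x < grid and 0 <= y < grid):
        raise ValueError(f"coordinates must be in [0, {grid}), got ({x}, {y})")
    # Integer arithmetic avoids floating-point rounding surprises.
    return x * width // grid, y * height // grid


SCREEN_SIZE = (1280, 936)  # matches screen_size in agent.py

if __name__ == "__main__":
    print(denormalize(500, 500, SCREEN_SIZE))  # centre of the grid
    print(denormalize(999, 999, SCREEN_SIZE))  # bottom-right corner of the grid
```

Under this assumption the centre of the grid maps to pixel (640, 468), and every valid grid point stays inside the 1280x936 viewport.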
## Troubleshooting

If you encounter issues:

1. **Playwright not found**: Make sure you've run both `playwright install-deps chromium` and `playwright install chromium`
2. **Dependencies missing**: Verify all packages from `requirements.txt` are installed
3. **Browser crashes**: Check that your system supports Chromium and has sufficient resources
4. **Permission errors**: Ensure your user has permission to run browser automation tools

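The first two checks can be partly automated. The sketch below uses only the standard library to report whether the `playwright` Python package is importable and whether a `playwright` executable is on `PATH`. The `preflight` function is a hypothetical helper, not part of this sample, and it does not verify that the Chromium build itself has been downloaded.

```python
import importlib.util
import shutil


def preflight() -> dict:
    """Report whether the Playwright package and CLI are discoverable."""
    return {
        # True if `import playwright` would find a module in this environment.
        "playwright_package": importlib.util.find_spec("playwright") is not None,
        # True if a `playwright` executable is on PATH.
        "playwright_cli": shutil.which("playwright") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If either entry is reported as missing, re-running the dependency installation from the Setup section is the likely fix.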
## Notes

- The agent operates in a controlled browser environment
- Screenshots are taken to help the agent understand the current state
- The agent will provide updates on its actions as it works
- Be patient, as complex tasks may take some time to complete
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.adk import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools.computer_use.computer_use_toolset import ComputerUseToolset
from typing_extensions import override

from .playwright import PlaywrightComputer

root_agent = Agent(
    model='gemini-2.5-computer-use-preview-10-2025',
    name='hello_world_agent',
    description=(
        'computer use agent that can operate a browser on a computer to finish'
        ' user tasks'
    ),
    instruction="""
    you are a computer use agent
    """,
    tools=[
        ComputerUseToolset(computer=PlaywrightComputer(screen_size=(1280, 936)))
    ],
)
