Commit 7f5e4cd

Added API for managing tabs
1 parent 81afe0e commit 7f5e4cd

5 files changed, +410 -3 lines changed

Dockerfile.local

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ FROM --platform=linux/arm64 node:18-alpine AS eval-server-builder
 WORKDIR /workspace

 # Copy eval server from browser-operator-core submodule
-COPY browser-operator-core/eval-server/nodejs /workspace/eval-server
+COPY eval-server/nodejs /workspace/eval-server

 WORKDIR /workspace/eval-server

eval-server/nodejs/CLAUDE.md

Lines changed: 163 additions & 1 deletion
@@ -180,11 +180,173 @@ The server supports runtime LLM configuration via the `configure_llm` JSON-RPC m
 }
 ```

+### Tab Management
+
+The evaluation server supports managing browser tabs via REST API endpoints and Chrome DevTools Protocol (CDP).
+
+#### Tab Identification
+
+Each browser tab is identified by a **composite client ID** in the format: `baseClientId:tabId`
+
+- `baseClientId`: The persistent identifier for the DevTools client (e.g., `9907fd8d-92a8-4a6a-bce9-458ec8c57306`)
+- `tabId`: The Chrome target ID for the specific tab (e.g., `482D56EE57B1931A3B9D1BFDAF935429`)
+
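The composite ID format described above can be taken apart with a small helper. This is a minimal sketch and not code from this commit: `parseCompositeClientId` is a hypothetical name, and splitting on the first colon assumes that `baseClientId` never contains a colon itself (which holds for the UUID example shown).

```javascript
// Hypothetical helper (not part of this commit): split "baseClientId:tabId".
// Assumes the base ID contains no colon, as with the UUID example above.
function parseCompositeClientId(clientId) {
  if (typeof clientId !== 'string' || clientId.length === 0) {
    throw new Error('Client ID is required');
  }
  const separator = clientId.indexOf(':');
  if (separator === -1) {
    // A bare base ID is accepted; there is no tab part.
    return { baseClientId: clientId, tabId: null };
  }
  return {
    baseClientId: clientId.slice(0, separator),
    tabId: clientId.slice(separator + 1) || null,
  };
}

const parsed = parseCompositeClientId(
  '9907fd8d-92a8-4a6a-bce9-458ec8c57306:482D56EE57B1931A3B9D1BFDAF935429'
);
console.log(parsed.baseClientId); // 9907fd8d-92a8-4a6a-bce9-458ec8c57306
console.log(parsed.tabId);        // 482D56EE57B1931A3B9D1BFDAF935429
```

The `openTab()` and `closeTab()` handlers added to `src/api-server.js` in this commit take a shorter route, `clientId.split(':')[0]`, because they only need the base ID.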
+#### API Endpoints
+
+**List All Clients and Tabs**
+```bash
+GET /clients
+```
+
+Returns all registered clients with their active tabs, connection status, and readiness state.
+
+Response format:
+```json
+[
+  {
+    "id": "baseClientId",
+    "name": "Client Name",
+    "description": "Client Description",
+    "tabCount": 3,
+    "tabs": [
+      {
+        "tabId": "482D56EE57B1931A3B9D1BFDAF935429",
+        "compositeClientId": "baseClientId:tabId",
+        "connected": true,
+        "ready": true,
+        "connectedAt": "2025-01-15T10:30:00.000Z",
+        "remoteAddress": "::ffff:172.18.0.1"
+      }
+    ]
+  }
+]
+```
+
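A consumer of the `GET /clients` response will often want only the tabs that can accept work. The sketch below is an illustration under stated assumptions: `readyTabIds` is a hypothetical helper, and the sample data simply mirrors the documented response shape with made-up values.

```javascript
// Sample data shaped like the documented GET /clients response (values are illustrative).
const clients = [
  {
    id: 'baseClientId',
    name: 'Client Name',
    tabCount: 2,
    tabs: [
      { tabId: 'TAB_A', compositeClientId: 'baseClientId:TAB_A', connected: true, ready: true },
      { tabId: 'TAB_B', compositeClientId: 'baseClientId:TAB_B', connected: true, ready: false },
    ],
  },
];

// Hypothetical helper: collect composite IDs of tabs that are both connected and ready.
function readyTabIds(clientList) {
  return clientList.flatMap((client) =>
    (client.tabs || [])
      .filter((tab) => tab.connected && tab.ready)
      .map((tab) => tab.compositeClientId)
  );
}

console.log(readyTabIds(clients)); // [ 'baseClientId:TAB_A' ]
```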
+**List Tabs for Specific Client**
+```bash
+GET /clients/{clientId}/tabs
+```
+
+Returns all tabs for a specific client identified by `baseClientId`.
+
+**Open New Tab**
+```bash
+POST /tabs/open
+Content-Type: application/json
+
+{
+  "clientId": "baseClientId:tabId",
+  "url": "https://example.com",
+  "background": false
+}
+```
+
+Opens a new tab in the browser associated with the specified client.
+
+Response format:
+```json
+{
+  "clientId": "baseClientId:tabId",
+  "tabId": "newTabId",
+  "compositeClientId": "baseClientId:newTabId",
+  "url": "https://example.com",
+  "status": "opened"
+}
+```
+
+**Close Tab**
+```bash
+POST /tabs/close
+Content-Type: application/json
+
+{
+  "clientId": "baseClientId:tabId",
+  "tabId": "targetTabId"
+}
+```
+
+Closes the specified tab.
+
+Response format:
+```json
+{
+  "clientId": "baseClientId:tabId",
+  "tabId": "targetTabId",
+  "status": "closed",
+  "success": true
+}
+```
+
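The two POST endpoints above take small JSON payloads, which a client could assemble as follows. This is a sketch, not part of the commit: the helper names are hypothetical, and `http://localhost:8080` is only a placeholder base URL because the API server's port is not stated in this diff. The defaults for `url` and `background` mirror the ones in `openTab()` in `src/api-server.js`.

```javascript
// Hypothetical helpers: build fetch()-style arguments for the two POST endpoints.
function buildTabRequest(baseUrl, path, payload) {
  return {
    url: new URL(path, baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

function buildOpenTabRequest(baseUrl, clientId, url = 'about:blank', background = false) {
  return buildTabRequest(baseUrl, '/tabs/open', { clientId, url, background });
}

function buildCloseTabRequest(baseUrl, clientId, tabId) {
  return buildTabRequest(baseUrl, '/tabs/close', { clientId, tabId });
}

// Placeholder base URL; substitute the address the API server actually listens on.
const openRequest = buildOpenTabRequest(
  'http://localhost:8080',
  'baseClientId:tabId',
  'https://example.com'
);
console.log(openRequest.url); // http://localhost:8080/tabs/open
```

With Node 18 or later, the result could be passed to the built-in `fetch` as `fetch(openRequest.url, openRequest.options)`.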
+#### Implementation Architecture
+
+**Direct CDP Approach (Current)**
+
+Tab management is implemented using direct Chrome DevTools Protocol (CDP) communication:
+
+1. Server discovers the CDP WebSocket endpoint via `http://localhost:9223/json/version`
+2. For each command (open/close), a new WebSocket connection is established to the CDP endpoint
+3. Commands are sent using JSON-RPC 2.0 format:
+   - `Target.createTarget` - Opens new tab
+   - `Target.closeTarget` - Closes existing tab
+4. WebSocket connection is closed after receiving the response
+
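The command and reply shapes used in steps 3 and 4 can be sketched without opening a socket. The body of `sendCDPCommand()` is not shown in this diff, so the helper names below are hypothetical; the parameter names (`url`, `background`, `targetId`) follow the CDP `Target` domain, and replies are matched to commands by their numeric `id`.

```javascript
// Sketch of the message shapes only; no WebSocket is opened here.
let nextId = 0;

// Build a CDP command. Replies are correlated with requests through the numeric `id`.
function buildCdpCommand(method, params = {}) {
  nextId += 1;
  return { id: nextId, method, params };
}

// Return the result for `command` if `rawMessage` is its reply, otherwise undefined.
function matchCdpResponse(command, rawMessage) {
  const message = JSON.parse(rawMessage);
  if (message.id !== command.id) {
    return undefined; // an event, or a reply to a different command
  }
  if (message.error) {
    throw new Error(`CDP error for ${command.method}: ${message.error.message}`);
  }
  return message.result;
}

const openCommand = buildCdpCommand('Target.createTarget', {
  url: 'https://example.com',
  background: false,
});
const closeCommand = buildCdpCommand('Target.closeTarget', {
  targetId: '482D56EE57B1931A3B9D1BFDAF935429',
});

// Simulated reply; a real one would arrive on the WebSocket "message" event.
const reply = JSON.stringify({ id: openCommand.id, result: { targetId: 'EXAMPLE_TARGET_ID' } });
console.log(matchCdpResponse(openCommand, reply)); // { targetId: 'EXAMPLE_TARGET_ID' }
```

Ignoring messages whose `id` does not match matters because the browser may also push unsolicited events over the same connection.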
+Key implementation files:
+- `src/lib/EvalServer.js` - Contains `sendCDPCommand()`, `openTab()`, and `closeTab()` methods
+- `src/api-server.js` - REST API endpoints that delegate to EvalServer methods
+
+**Alternative Approach Considered**
+
+An RPC-based approach was initially considered where:
+- API server sends JSON-RPC request to DevTools client via WebSocket
+- DevTools client executes CDP commands locally
+- Response is sent back via JSON-RPC
+
+This was rejected in favor of direct CDP communication for simplicity and reduced latency.
+
+#### Chrome Setup
+
+The browser must be started with remote debugging enabled:
+```bash
+chromium --remote-debugging-port=9223
+```
+
+The CDP endpoint is accessible at:
+- HTTP: `http://localhost:9223/json/version`
+- WebSocket: `ws://localhost:9223/devtools/browser/{browserId}`
+
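The HTTP endpoint listed above returns a JSON document whose `webSocketDebuggerUrl` field holds the browser-level WebSocket address. A minimal sketch of extracting it follows; `extractBrowserWebSocketUrl` is a hypothetical helper, and the sample body uses made-up values with the field names Chrome reports.

```javascript
// Hypothetical helper: pull the browser-level WebSocket URL out of a /json/version body.
function extractBrowserWebSocketUrl(versionBody) {
  const info = JSON.parse(versionBody);
  const wsUrl = info.webSocketDebuggerUrl;
  if (typeof wsUrl !== 'string' || !wsUrl.startsWith('ws')) {
    throw new Error('webSocketDebuggerUrl missing from /json/version response');
  }
  return wsUrl;
}

// Illustrative body: the values are invented, the field names are the ones Chrome uses.
const sampleBody = JSON.stringify({
  Browser: 'Chrome/120.0.0.0',
  'Protocol-Version': '1.3',
  webSocketDebuggerUrl: 'ws://localhost:9223/devtools/browser/EXAMPLE-BROWSER-ID',
});

console.log(extractBrowserWebSocketUrl(sampleBody));
// ws://localhost:9223/devtools/browser/EXAMPLE-BROWSER-ID
```

In Node 18 or later, the body could be obtained with the built-in `fetch('http://localhost:9223/json/version')` followed by `response.text()`.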
+#### Current Limitations
+
+**⚠️ Known Issue: WebSocket Timeout**
+
+Tab opening and closing functionality is currently experiencing a WebSocket timeout issue:
+
+- Symptom: `sendCDPCommand()` times out after 10 seconds with no response
+- Error: `CDP command timeout: Target.createTarget`
+- Status: Under investigation
+- Debugging approach: Added extensive logging to track WebSocket lifecycle events
+
+The CDP endpoint is correctly discovered and accessible, but WebSocket messages are not being received. This may be related to:
+- WebSocket handshake issues
+- CDP protocol version mismatch
+- Network/proxy configuration
+- Chrome process state
+
+**Workaround**: Until this issue is resolved, tab management via the API is not functional. Manual CDP testing is required to diagnose the root cause.
+
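For readers reproducing the symptom, the timeout behaviour can be modelled in isolation. The 10-second limit and the error text come from the notes above; how `sendCDPCommand()` enforces the limit is not shown in this diff, so `withCdpTimeout` below is a hypothetical stand-in built on `Promise.race`.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `timeoutMs`.
function withCdpTimeout(promise, method, timeoutMs = 10000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`CDP command timeout: ${method}`)), timeoutMs);
  });
  // Whichever settles first wins; the timer is cleared in either case.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A promise that never settles stands in for a CDP reply that never arrives.
const neverAnswered = new Promise(() => {});
withCdpTimeout(neverAnswered, 'Target.createTarget', 50).catch((err) => {
  console.log(err.message); // CDP command timeout: Target.createTarget
});
```

Clearing the timer once the race settles keeps a successful command from leaving a pending timeout that would delay process exit.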
+#### Future Enhancements
+
+- Automatic tab registration in ClientManager when DevTools connects
+- Tab lifecycle events (opened, closed, navigated)
+- Bulk tab operations
+- Tab metadata (title, URL, favicon)
+- Tab grouping and organization
+
 ### Configuration

 All configuration is managed through environment variables and `src/config.js`. Key settings:
 - Server port and host
 - OpenAI API configuration
 - RPC timeouts
 - Logging levels and directories
-- Maximum concurrent evaluations
+- Maximum concurrent evaluations
+- CDP endpoint (default: localhost:9223)
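
One way the CDP endpoint setting could be resolved from the environment is sketched below. This is an assumption-heavy illustration: the variable names `CDP_HOST` and `CDP_PORT` are hypothetical and are not taken from `src/config.js`; only the `localhost:9223` default comes from the list above.

```javascript
// Hypothetical sketch: CDP_HOST and CDP_PORT are assumed names, not taken from src/config.js.
function resolveCdpEndpoint(env = {}) {
  const host = env.CDP_HOST || 'localhost';
  const port = Number.parseInt(env.CDP_PORT || '9223', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid CDP port: ${env.CDP_PORT}`);
  }
  return { host, port, versionUrl: `http://${host}:${port}/json/version` };
}

// With no overrides, the documented default applies.
console.log(resolveCdpEndpoint({}).versionUrl); // http://localhost:9223/json/version
```

In a running server the function would typically be called as `resolveCdpEndpoint(process.env)`.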

eval-server/nodejs/examples/with-http-wrapper.js

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ const evalServer = new EvalServer({
 console.log('🔧 Creating HTTP wrapper...');
 const httpWrapper = new HTTPWrapper(evalServer, {
   port: 8080,
-  host: '127.0.0.1'
+  host: '0.0.0.0'
 });

eval-server/nodejs/src/api-server.js

Lines changed: 63 additions & 0 deletions
@@ -118,6 +118,22 @@ class APIServer {
         result = await this.triggerEvaluation(JSON.parse(body));
         break;

+      case '/tabs/open':
+        if (method !== 'POST') {
+          this.sendError(res, 405, 'Method not allowed');
+          return;
+        }
+        result = await this.openTab(JSON.parse(body));
+        break;
+
+      case '/tabs/close':
+        if (method !== 'POST') {
+          this.sendError(res, 405, 'Method not allowed');
+          return;
+        }
+        result = await this.closeTab(JSON.parse(body));
+        break;
+
       case '/v1/responses':
         if (method !== 'POST') {
           this.sendError(res, 405, 'Method not allowed');
@@ -286,6 +302,53 @@ class APIServer {
   }

+  async openTab(payload) {
+    const { clientId, url = 'about:blank', background = false } = payload;
+
+    if (!clientId) {
+      throw new Error('Client ID is required');
+    }
+
+    // Since we use direct CDP, we don't need the client to be connected
+    // Just extract the baseClientId (first part before colon if composite, or the whole ID)
+    const baseClientId = clientId.split(':')[0];
+
+    const result = await this.evaluationServer.openTab(baseClientId, { url, background });
+
+    return {
+      clientId: baseClientId,
+      tabId: result.tabId,
+      compositeClientId: result.compositeClientId,
+      url: result.url || url,
+      status: 'opened'
+    };
+  }
+
+  async closeTab(payload) {
+    const { clientId, tabId } = payload;
+
+    if (!clientId) {
+      throw new Error('Client ID is required');
+    }
+
+    if (!tabId) {
+      throw new Error('Tab ID is required');
+    }
+
+    // Since we use direct CDP, we don't need the client to be connected
+    // Just extract the baseClientId
+    const baseClientId = clientId.split(':')[0];
+
+    const result = await this.evaluationServer.closeTab(baseClientId, { tabId });
+
+    return {
+      clientId: baseClientId,
+      tabId,
+      status: 'closed',
+      success: result.success !== false
+    };
+  }
+
   /**
    * Handle OpenAI Responses API compatible requests with nested model format
    */
