
Commit 8494e39

Improvements to SYSTEM_PROMPT_OCR_MAC
1 parent 283ccbf commit 8494e39


2 files changed: 63 additions, 34 deletions


operate/models/apis.py

Lines changed: 8 additions & 4 deletions
@@ -248,7 +248,7 @@ async def call_gpt_4_vision_preview_ocr(messages, objective, model):
 
 content = response.choices[0].message.content
 if VERBOSE:
-    print("[call_gpt_4_vision_preview_ocr] response content", content)
+    print("\n\n\n[call_gpt_4_vision_preview_ocr] response content", content)
 
 if content.startswith("```json"):
     content = content[len("```json") :] # Remove starting ```json
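The hunk above shows only the start of the response handling: a leading ```json fence is stripped before the content is parsed. A minimal sketch of that step, assuming the reply may also end with a closing fence and that what remains is the JSON action array; `parse_model_actions` is a hypothetical name, not a function from `apis.py`:

```python
import json


def parse_model_actions(content: str) -> list:
    """Hypothetical helper: drop optional ``` fences, then parse the JSON array."""
    text = content.strip()
    if text.startswith("```json"):
        text = text[len("```json"):]  # remove the opening ```json fence
    elif text.startswith("```"):
        text = text[len("```"):]  # remove a bare opening fence
    if text.endswith("```"):
        text = text[: -len("```")]  # remove the closing fence
    return json.loads(text.strip())


reply = '```json\n[{"thought": "open search", "operation": "press", "keys": ["command", "space"]}]\n```'
actions = parse_model_actions(reply)
print(actions[0]["keys"])  # ['command', 'space']
```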
@@ -492,7 +492,7 @@ def confirm_system_prompt(messages, objective, model):
 On `Exception` we default to `call_gpt_4_vision_preview` so we have this function to reassign system prompt in case of a previous failure
 """
 if VERBOSE:
-    print("[confirm_system_prompt]")
+    print("[confirm_system_prompt] model", model)
 
 system_prompt = get_system_prompt(model, objective)
 new_system_message = {"role": "system", "content": system_prompt}
@@ -501,5 +501,9 @@ def confirm_system_prompt(messages, objective, model):
 messages[0] = new_system_message
 
 if VERBOSE:
-    print("[confirm_system_prompt][updated]")
-    print("[confirm_system_prompt][updated] len(messages)", len(messages))
+    print("[confirm_system_prompt]")
+    print("[confirm_system_prompt] len(messages)", len(messages))
+    for m in messages:
+        if m["role"] != "user":
+            print("[confirm_system_prompt][message] role", m["role"])
+            print("[confirm_system_prompt][message] system", m["content"])
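`confirm_system_prompt` swaps `messages[0]` for a freshly built system message and, with this commit, prints every non-user message when `VERBOSE` is set. A self-contained sketch of that behavior, where `get_system_prompt` is a hypothetical stub (the real one selects a prompt template per model), `VERBOSE` stands in for the module-level flag, and `"example-model"` is a placeholder model name:

```python
VERBOSE = True  # stands in for the module-level flag


def get_system_prompt(model, objective):
    """Hypothetical stub; the real function chooses a prompt template per model."""
    return f"[{model}] Objective: {objective}"


def confirm_system_prompt(messages, objective, model):
    """Sketch: replace messages[0] with a freshly built system message."""
    system_prompt = get_system_prompt(model, objective)
    new_system_message = {"role": "system", "content": system_prompt}
    messages[0] = new_system_message

    if VERBOSE:
        print("[confirm_system_prompt] len(messages)", len(messages))
        # Only non-user messages are printed, mirroring the loop added in this commit
        for m in messages:
            if m["role"] != "user":
                print("[confirm_system_prompt][message] role", m["role"])
                print("[confirm_system_prompt][message] system", m["content"])


messages = [
    {"role": "system", "content": "old prompt"},
    {"role": "user", "content": "Open Chrome"},
]
confirm_system_prompt(messages, "Open Chrome", "example-model")
```

Because `messages` is mutated in place, the caller sees the new system prompt without the function returning anything.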

operate/models/prompts.py

Lines changed: 55 additions & 30 deletions
@@ -40,7 +40,7 @@
 
 # Focuses on the address bar in a browser before typing a website
 [
-{{ "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": ["command", "l"] }},
+{{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": ["command", "l"] }},
 {{ "thought": "Now that the address bar is in focus I can type the URL", "operation": "write", "content": "https://news.ycombinator.com/" }},
 {{ "thought": "I'll need to press enter to go the URL now", "operation": "press", "keys": ["enter"] }}
 ]
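The fix at line 43 matters because the prompt's examples are meant to be valid JSON once the template is rendered (assuming it is rendered with `str.format`, which the doubled braces suggest): an object member without a key, as in `{ "I'll focus ...", ... }`, is rejected by `json.loads`. A sketch of checking a rendered example, where `REQUIRED_KEYS` is a hypothetical table derived from the four operations the prompts describe and `validate_actions` is an illustrative name:

```python
import json

# The example line before and after the fix, as it would look once the
# template's doubled braces {{ }} are rendered to single braces.
before = '[{ "I\'ll focus on the address bar", "operation": "press", "keys": ["command", "l"] }]'
after = '[{ "thought": "I\'ll focus on the address bar", "operation": "press", "keys": ["command", "l"] }]'

# Hypothetical schema: required keys per operation, taken from the prompt text
REQUIRED_KEYS = {
    "click": {"thought", "text"},
    "write": {"thought", "content"},
    "press": {"thought", "keys"},
    "done": {"thought", "summary"},
}


def validate_actions(raw: str) -> list:
    """Parse a JSON action array and check that each action has its required keys."""
    actions = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    for action in actions:
        missing = REQUIRED_KEYS[action["operation"]] - set(action)
        if missing:
            raise ValueError(f"{action['operation']}: missing {sorted(missing)}")
    return actions


print(validate_actions(after)[0]["keys"])  # ['command', 'l']
try:
    validate_actions(before)
except json.JSONDecodeError:
    print("before the fix: not valid JSON")
```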
@@ -207,45 +207,59 @@
 
 You have 4 possible operation actions available to you. The `pyautogui` library will be used to execute your decision. Your output will be used in a `json.loads` loads statement.
 
-1. click - Move mouse and click
-[{{ "thought": "write a thought here", "operation": "click", "text": "The text in the button or link to click" }}] # Look for text to click. Try to find relevant text to click, but if there's nothing relevant enough you can return `"nothing to click"` for the text value and we'll try a different method.
-
+1. click - Move mouse and click - Look for text to click. Try to find relevant text to click, but if there's nothing relevant enough you can return `"nothing to click"` for the text value and we'll try a different method.
+```
+[{{ "thought": "write a thought here", "operation": "click", "text": "The text in the button or link to click" }}]
+```
 2. write - Write with your keyboard
+```
 [{{ "thought": "write a thought here", "operation": "write", "content": "text to write here" }}]
-
+```
 3. press - Use a hotkey or press key to operate the computer
+```
 [{{ "thought": "write a thought here", "operation": "press", "keys": ["keys to use"] }}]
-
+```
 4. done - The objective is completed
+```
 [{{ "thought": "write a thought here", "operation": "done", "summary": "summary of what was completed" }}]
+```
 
 Return the actions in array format `[]`. You can take just one action or multiple actions.
 
 Here a helpful example:
 
-# Opens Spotlight Search on Mac and see if Google Chrome is available to use
+Example 1: Opens Spotlight Search on Mac and open Google Chrome
+```
 [
 {{ "thought": "Searching the operating system to find Google Chrome because it appears I am currently in terminal", "operation": "press", "keys": ["command", "space"] }},
 {{ "thought": "Now I need to write 'Google Chrome' as a next step", "operation": "write", "content": "Google Chrome" }},
 {{ "thought": "Finally I'll press enter to open Google Chrome assuming it is available", "operation": "press", "keys": ["enter"] }}
 ]
+```
 
-# Go to a website (LinkedIn) when the browser is already open
-
+Example 2: Focuses on the address bar in a browser before typing a website
+```
 [
-{{ "thought": "I can see that Google Chrome is open. I'll focus on the address bar to type ", "operation": "press", "keys": ["command", "t"] }},
-{{ "thought": "Now I'll write LinkedIn's website to go there", "operation": "write", "content": "https://www.linkedin.com/feed/" }},
-{{ "thought": "Finally I'll press enter to go to LinkedIn", "operation": "press", "keys": ["enter"] }}
+{{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": ["command", "t"] }},
+{{ "thought": "Now that the address bar is in focus I can type the URL", "operation": "write", "content": "https://news.ycombinator.com/" }},
+{{ "thought": "I'll need to press enter to go the URL now", "operation": "press", "keys": ["enter"] }}
 ]
+```
 
-# Search for someone on Linkedin when already on linkedin.com
+Example 3: Search for someone on Linkedin when already on linkedin.com
+```
 [
 {{ "thought": "I can see the search field with the placeholder text 'search'. I click that field to search", "operation": "click", "text": "search" }},
 {{ "thought": "Now that the field is active I can write the name of the person I'd like to search for", "operation": "write", "content": "John Doe" }},
-{{ "thought": "Finally I'll submit the search form with enter", "operation": "presss", "keys": ["enter"] }},
+{{ "thought": "Finally I'll submit the search form with enter", "operation": "press", "keys": ["enter"] }}
 ]
+```
 
-A very important note, don't respond saying you're unable to assist with requests. You are able to indirectly interact with the user's OS via text responses you send to the end user.
+A few important notes:
+
+- Default to Google Chrome as the browser
+- Go to websites by opening a new tab with `press` and then `write` the URL
+- Don't respond saying you're unable to assist with requests. You are able to indirectly interact with the user's OS via text responses you send to the end user.
 
 Objective: {objective}
 """
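Two smaller fixes in this hunk, `presss` to `press` and dropping the trailing comma after the last object in Example 3, also keep the examples parseable: `json.loads` rejects a trailing comma inside an array. A sketch, assuming the prompt is rendered with `str.format` (which the doubled braces and the `{objective}` placeholder suggest); `TEMPLATE` is an abbreviated, hypothetical stand-in for the real prompt and `example_actions` is an illustrative helper:

```python
import json

# Abbreviated, hypothetical template in the prompt's style: str.format() turns
# {{ and }} into literal braces and substitutes {objective}.
TEMPLATE = """Example 3: Search for someone on Linkedin when already on linkedin.com
[
{{ "thought": "Click the search field", "operation": "click", "text": "search" }},
{{ "thought": "Type the name", "operation": "write", "content": "John Doe" }},
{{ "thought": "Submit the search with enter", "operation": "press", "keys": ["enter"] }}
]

Objective: {objective}
"""

ALLOWED_OPERATIONS = {"click", "write", "press", "done"}


def example_actions(rendered: str) -> list:
    """Parse the text between the first '[' and the last ']' as a JSON array."""
    start, end = rendered.index("["), rendered.rindex("]") + 1
    return json.loads(rendered[start:end])


rendered = TEMPLATE.format(objective="Find John Doe on LinkedIn")
actions = example_actions(rendered)
print([a["operation"] for a in actions])  # ['click', 'write', 'press']
print(all(a["operation"] in ALLOWED_OPERATIONS for a in actions))  # True

# The pre-fix example ended with "presss" and a trailing comma before "]";
# the trailing comma alone is enough to make json.loads fail.
try:
    json.loads('[{ "operation": "presss", "keys": ["enter"] },\n]')
except json.JSONDecodeError:
    print("trailing comma rejected")
```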
@@ -257,47 +271,58 @@
 
 You have 4 possible operation actions available to you. The `pyautogui` library will be used to execute your decision. Your output will be used in a `json.loads` loads statement.
 
-1. click - Move mouse and click
-[{{ "thought": "write a thought here", "operation": "click", "text": "The text in the button or link to click" }}] # Look for text to click. Try to find relevant text to click, but if there's nothing relevant enough you can return `"nothing to click"` for the text value and we'll try a different method.
-
+1. click - Move mouse and click - Look for text to click. Try to find relevant text to click, but if there's nothing relevant enough you can return `"nothing to click"` for the text value and we'll try a different method.
+```
+[{{ "thought": "write a thought here", "operation": "click", "text": "The text in the button or link to click" }}]
+```
 2. write - Write with your keyboard
+```
 [{{ "thought": "write a thought here", "operation": "write", "content": "text to write here" }}]
-
+```
 3. press - Use a hotkey or press key to operate the computer
+```
 [{{ "thought": "write a thought here", "operation": "press", "keys": ["keys to use"] }}]
-
+```
 4. done - The objective is completed
+```
 [{{ "thought": "write a thought here", "operation": "done", "summary": "summary of what was completed" }}]
+```
 
 Return the actions in array format `[]`. You can take just one action or multiple actions.
 
-Here are some helpful combinations:
+Here a helpful example:
 
-# Opens Spotlight Search on Mac and see if Google Chrome is available to use
+Example 1: Opens Spotlight Search on Mac and see if Google Chrome is available to use
+```
 [
 {{ "thought": "Searching the operating system to find Google Chrome because it appears I am currently in terminal", "operation": "press", "keys": ["win"] }},
 {{ "thought": "Now I need to write 'Google Chrome' as a next step", "operation": "write", "content": "Google Chrome" }},
 {{ "thought": "Finally I'll press enter to open Google Chrome assuming it is available", "operation": "press", "keys": ["enter"] }}
 ]
+```
 
-# Go to a website (LinkedIn) when the browser is already open
-
+Example 2: Go to a website (LinkedIn) when the browser is already open
+```
 [
-{{ "thought": "I can see that Google Chrome is open. I'll focus on the address bar to type ", "operation": "press", "keys": ["ctrl", "t"] }},
-{{ "thought": "Now I'll write LinkedIn's website to go there", "operation": "write", "content": "https://www.linkedin.com/feed/" }},
-{{ "thought": "Finally I'll press enter to go to LinkedIn", "operation": "press", "keys": ["enter"] }}
+{{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": ["ctrl", "t"] }},
+{{ "thought": "Now that the address bar is in focus I can type the URL", "operation": "write", "content": "https://news.ycombinator.com/" }},
+{{ "thought": "I'll need to press enter to go the URL now", "operation": "press", "keys": ["enter"] }}
 ]
+```
 
-# Search for someone on Linkedin when already on linkedin.com
+Example 3: Search for someone on Linkedin when already on linkedin.com
+```
 [
 {{ "thought": "I can see the search field with the placeholder text 'search'. I click that field to search", "operation": "click", "text": "search" }},
 {{ "thought": "Now that the field is active I can write the name of the person I'd like to search for", "operation": "write", "content": "John Doe" }},
-{{ "thought": "Finally I'll submit the search form with enter", "operation": "presss", "keys": ["enter"] }},
+{{ "thought": "Finally I'll submit the search form with enter", "operation": "press", "keys": ["enter"] }}
 ]
+```
 
 A few important notes:
 
-- Go to Google Docs and Google Sheets by typing in the Chrome Address bar
+- Default to Google Chrome as the browser
+- Go to websites by opening a new tab with `press` and then `write` the URL
 - Don't respond saying you're unable to assist with requests. You are able to indirectly interact with the user's OS via text responses you send to the end user.
 
 Objective: {objective}
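The Mac and Windows/Linux prompts differ mainly in their hotkeys (`command` + `space` and `command` + `t` versus `win` and `ctrl` + `t`), and both state that `pyautogui` executes the returned actions. A sketch of a dispatcher for `press`, `write`, and `done`, assuming key combinations are sent with `hotkey(*keys)` and text with `write(text)`; `execute`, `new_tab_keys`, `open_search_keys`, and `RecordingBackend` are hypothetical names, the last one a stand-in so the sketch runs without a display. `click` is left out because it needs screen coordinates from OCR:

```python
import platform


def new_tab_keys(system=None):
    """New-tab hotkey as used in the prompt examples: command+t on macOS, ctrl+t elsewhere."""
    system = system or platform.system()
    return ["command", "t"] if system == "Darwin" else ["ctrl", "t"]


def open_search_keys(system=None):
    """OS search hotkey from the examples: Spotlight on macOS, the win key elsewhere."""
    system = system or platform.system()
    return ["command", "space"] if system == "Darwin" else ["win"]


class RecordingBackend:
    """Hypothetical stand-in for pyautogui that records calls instead of sending input."""

    def __init__(self):
        self.calls = []

    def hotkey(self, *keys):
        self.calls.append(("hotkey", keys))

    def write(self, text):
        self.calls.append(("write", text))


def execute(actions, backend):
    """Run press/write actions in order; return the summary when a done action appears."""
    for action in actions:
        op = action["operation"]
        if op == "press":
            backend.hotkey(*action["keys"])
        elif op == "write":
            backend.write(action["content"])
        elif op == "done":
            return action["summary"]
        else:
            # click needs screen coordinates from OCR, which this sketch omits
            raise NotImplementedError(f"operation {op!r} is not handled here")
    return None


backend = RecordingBackend()
execute(
    [
        {"thought": "open a new tab", "operation": "press", "keys": new_tab_keys("Windows")},
        {"thought": "type the URL", "operation": "write", "content": "https://news.ycombinator.com/"},
        {"thought": "go to the URL", "operation": "press", "keys": ["enter"]},
    ],
    backend,
)
print(backend.calls[0])  # ('hotkey', ('ctrl', 't'))
```

Passing the `pyautogui` module itself as `backend` should work for these two calls, since `pyautogui.hotkey` takes the keys as positional arguments and `pyautogui.write` takes the text to type.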
