Commit b1cc4fa

Update readme.md
1 parent 24e6884 commit b1cc4fa

1 file changed, +3 -1 lines changed
README.md

Lines changed: 3 additions & 1 deletion
@@ -94,7 +94,9 @@ operate -m gemini-pro-vision
 ### Optical Character Recognition Mode `-m gpt-4-with-ocr`
 The Self-Operating Computer Framework now integrates Optical Character Recognition (OCR) capabilities with the `gpt-4-with-ocr` mode. This mode gives GPT-4 a hash map of clickable elements by coordinates. GPT-4 can decide to `click` elements by text and then the code references the hash map to get the coordinates for that element GPT-4 wanted to click.
 
-Based on recent tests, OCR performs better than `som` and vanilla GPT-4 so we made it the default for the project. To use the OCR mode you can simply write `operate` or `operate -m gpt-4-with-ocr` will also work.
+Based on recent tests, OCR performs better than `som` and vanilla GPT-4 so we made it the default for the project. To use the OCR mode you can simply write:
+
+`operate` or `operate -m gpt-4-with-ocr` will also work.
 
 ### Set-of-Mark Prompting `-m gpt-4-with-som`
 The Self-Operating Computer Framework now supports Set-of-Mark (SoM) Prompting with the `gpt-4-with-som` command. This new visual prompting method enhances the visual grounding capabilities of large multimodal models.
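
For readers of this hunk, the OCR mode description (a hash map from on-screen text to coordinates, consulted when the model asks to `click` by text) can be illustrated with a minimal sketch. This is not the project's actual implementation: the function names `build_text_index` and `resolve_click`, the `(text, bounding_box)` input shape, and the case-insensitive matching are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shapes: a bounding box is (left, top, right, bottom) in pixels.
BoundingBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


def build_text_index(ocr_results: List[Tuple[str, BoundingBox]]) -> Dict[str, Point]:
    """Map each recognized text string to the center of its bounding box."""
    index: Dict[str, Point] = {}
    for text, (left, top, right, bottom) in ocr_results:
        center = ((left + right) / 2, (top + bottom) / 2)
        # Keep the first occurrence when the same text appears more than once.
        index.setdefault(text.strip().lower(), center)
    return index


def resolve_click(index: Dict[str, Point], requested_text: str) -> Optional[Point]:
    """Return the coordinates for the text the model asked to click, if known."""
    return index.get(requested_text.strip().lower())


# Assumed OCR output for a screenshot with two buttons.
ocr_results = [
    ("Submit", (100, 200, 180, 230)),
    ("Cancel", (200, 200, 280, 230)),
]
index = build_text_index(ocr_results)
```

With this sketch, `resolve_click(index, "submit")` returns the center of the "Submit" box, and a text that OCR did not find returns `None`, which the caller would need to handle (for example by asking the model to choose again).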
