Add Qwen-VL to README

何涛 · 何涛 · commit a9351f2d30bf · 2025-02-14T11:56:50.000+08:00
diff --git a/README.md b/README.md
@@ -20,7 +20,7 @@ ome
 
 ## Key Features
 - **Compatibility**: Designed for various multimodal models.
-- **Integration**: Currently integrated with **GPT-4o, o1, Gemini Pro Vision, Claude 3 and LLaVa.**
+- **Integration**: Currently integrated with **GPT-4o, o1, Gemini Pro Vision, Claude 3, Qwen-VL and LLaVa.**
 - **Future Plans**: Support for additional models.
 
 ## Demo
@@ -76,6 +76,13 @@ Use Claude 3 with Vision to see how it stacks up to GPT-4-Vision at operating a
 operate -m claude-3
 ```
 
+#### Try Qwen-VL `-m qwen-vl`
+Use Qwen-VL to see how it stacks up to GPT-4-Vision at operating a computer. Navigate to the [Qwen dashboard](https://bailian.console.aliyun.com/) to get an API key, then run the command below to try it.
+
+```
+operate -m qwen-vl
+```
+
 #### Try LLaVa Hosted Through Ollama `-m llava`
 If you wish to experiment with the Self-Operating Computer Framework using LLaVA on your own machine, you can with Ollama!   
 *Note: Ollama currently only supports MacOS and Linux. Windows now in Preview*   
diff --git a/operate/main.py b/operate/main.py
@@ -3,7 +3,7 @@
 """
 import argparse
 from operate.utils.style import ANSI_BRIGHT_MAGENTA
-from operate.run_operate import main
+from operate.operate import main
 
 
 def main_entry():
diff --git a/operate/models/apis.py b/operate/models/apis.py
@@ -168,7 +168,8 @@ async def call_qwen_vl_with_ocr(messages, objective, model):
         vision_message = {
             "role": "user",
             "content": [
-                {"type": "text", "text": user_prompt},
+                {"type": "text",
+                 "text": f"{user_prompt}\n**REMEMBER** Only output JSON, do not append any other text."},
                 {
                     "type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
diff --git a/operate/operate.py b/operate/operate.py