
Commit e5dade8

Merge branch 'main' into upstream-main

2 parents 776a7f8 + 0a22173

File tree

8 files changed: +419 −246 lines changed

README.md

Lines changed: 12 additions & 23 deletions
@@ -12,14 +12,14 @@
 </div>

 <!--
-:rotating_light: **OUTAGE NOTIFICATION: gpt-4-vision-preview**
+:rotating_light: **OUTAGE NOTIFICATION: gpt-4o**
 **This model is currently experiencing an outage so the self-operating computer may not work as expected.**
 -->


 ## Key Features
 - **Compatibility**: Designed for various multimodal models.
-- **Integration**: Currently integrated with **GPT-4v, Gemini Pro Vision, and LLaVa.**
+- **Integration**: Currently integrated with **GPT-4o, Gemini Pro Vision, Claude 3 and LLaVa.**
 - **Future Plans**: Support for additional models.

 ## Ongoing Development
@@ -45,7 +45,7 @@ pip install self-operating-computer
 ```
 operate
 ```
-3. **Enter your OpenAI Key**: If you don't have one, you can obtain an OpenAI key [here](https://platform.openai.com/account/api-keys)
+3. **Enter your OpenAI Key**: If you don't have one, you can obtain an OpenAI key [here](https://platform.openai.com/account/api-keys). If you need to change your key at a later point, run `vim .env` to open the `.env` file and replace the old key.

 <div align="center">
   <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/key.png" width="300" style="margin: 10px;"/>
@@ -58,24 +58,6 @@ operate
   <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/terminal-access-2.png" width="300" style="margin: 10px;"/>
 </div>

-### Alternatively installation with `.sh`
-
-1. **Clone the repo** to a directory on your computer:
-```
-git clone https://github.com/OthersideAI/self-operating-computer.git
-```
-2. **Cd into directory**:
-
-```
-cd self-operating-computer
-```
-
-3. **Run the installation script**:
-
-```
-./run.sh
-```
-
 ## Using `operate` Modes

 ### Multimodal Models `-m`
@@ -88,7 +70,14 @@ operate -m gemini-pro-vision

 **Enter your Google AI Studio API key when the terminal prompts you for it.** If you don't have one, you can obtain a key [here](https://makersuite.google.com/app/apikey) after setting up your Google AI Studio account. You may also need to [authorize credentials for a desktop application](https://ai.google.dev/palm_docs/oauth_quickstart). It took me a bit of time to get it working; if anyone knows a simpler way, please make a PR.

-### Locally Hosted LLaVA Through Ollama
+#### Try Claude `-m claude-3`
+Use Claude 3 with Vision to see how it stacks up to GPT-4-Vision at operating a computer. Navigate to the [Claude dashboard](https://console.anthropic.com/dashboard) to get an API key and run the command below to try it.
+
+```
+operate -m claude-3
+```
+
+#### Try LLaVa Hosted Through Ollama `-m llava`
 If you wish to experiment with the Self-Operating Computer Framework using LLaVA on your own machine, you can with Ollama!
 *Note: Ollama currently only supports MacOS and Linux*

@@ -187,5 +176,5 @@ Stay updated with the latest developments:
 - This project is compatible with Mac OS, Windows, and Linux (with X server installed).

 ## OpenAI Rate Limiting Note
-The ```gpt-4-vision-preview``` model is required. To unlock access to this model, your account needs to spend at least \$5 in API credits. Pre-paying for these credits will unlock access if you haven't already spent the minimum \$5.
+The ```gpt-4o``` model is required. To unlock access to this model, your account needs to spend at least \$5 in API credits. Pre-paying for these credits will unlock access if you haven't already spent the minimum \$5.
 Learn more **[here](https://platform.openai.com/docs/guides/rate-limits?context=tier-one)**
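The README change above points users at `.env` for key rotation. As a quick illustration (not part of the commit), here is a minimal sketch of how a key saved in `.env` reaches the framework, mirroring the `load_dotenv()`/`os.getenv()` calls visible in evaluate.py and operate/config.py below; the key value is a placeholder:

```python
import os

from dotenv import load_dotenv

# Assumes a local .env containing a line like: OPENAI_API_KEY=sk-placeholder
load_dotenv()  # reads the .env in the current directory into the environment
print("OpenAI key loaded:", bool(os.getenv("OPENAI_API_KEY")))
```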

evaluate.py

Lines changed: 42 additions & 32 deletions
@@ -25,7 +25,8 @@
 Guideline: {guideline}
 """

-SCREENSHOT_PATH = os.path.join('screenshots', 'screenshot.png')
+SCREENSHOT_PATH = os.path.join("screenshots", "screenshot.png")
+

 # Check if on a windows terminal that supports ANSI escape codes
 def supports_ansi():
@@ -37,6 +38,7 @@ def supports_ansi():
     is_a_tty = hasattr(sys.stdout, "isatty") and sys.stdout.isatty()
     return supported_platform and is_a_tty

+
 if supports_ansi():
     # Standard green text
     ANSI_GREEN = "\033[32m"
@@ -62,8 +64,8 @@ def supports_ansi():
     ANSI_YELLOW = ""
     ANSI_RED = ""
     ANSI_BRIGHT_MAGENTA = ""
-
-
+
+
 def format_evaluation_prompt(guideline):
     prompt = EVALUATION_PROMPT.format(guideline=guideline)
     return prompt
@@ -72,88 +74,95 @@ def format_evaluation_prompt(guideline):
 def parse_eval_content(content):
     try:
         res = json.loads(content)
-
+
         print(res["reason"])
-
+
         return res["guideline_met"]
     except:
-        print("The model gave a bad evaluation response and it couldn't be parsed. Exiting...")
+        print(
+            "The model gave a bad evaluation response and it couldn't be parsed. Exiting..."
+        )
         exit(1)


 def evaluate_final_screenshot(guideline):
-    '''Load the final screenshot and return True or False if it meets the given guideline.'''
+    """Load the final screenshot and return True or False if it meets the given guideline."""
     with open(SCREENSHOT_PATH, "rb") as img_file:
         img_base64 = base64.b64encode(img_file.read()).decode("utf-8")

-    eval_message = [{
-        "role": "user",
-        "content": [
-            {"type": "text", "text": format_evaluation_prompt(guideline)},
-            {
-                "type": "image_url",
-                "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
-            },
-        ],
-    }]
-
+    eval_message = [
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": format_evaluation_prompt(guideline)},
+                {
+                    "type": "image_url",
+                    "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"},
+                },
+            ],
+        }
+    ]
+
     response = openai.chat.completions.create(
-        model="gpt-4-vision-preview",
+        model="gpt-4o",
         messages=eval_message,
         presence_penalty=1,
         frequency_penalty=1,
         temperature=0.7,
-        max_tokens=300,
     )

     eval_content = response.choices[0].message.content
-
+
     return parse_eval_content(eval_content)


 def run_test_case(objective, guideline, model):
-    '''Returns True if the result of the test with the given prompt meets the given guideline for the given model.'''
+    """Returns True if the result of the test with the given prompt meets the given guideline for the given model."""
     # Run `operate` with the model to evaluate and the test case prompt
-    subprocess.run(['operate', '-m', model, '--prompt', f'"{objective}"'], stdout=subprocess.DEVNULL)
-
+    subprocess.run(
+        ["operate", "-m", model, "--prompt", f'"{objective}"'],
+        stdout=subprocess.DEVNULL,
+    )
+
     try:
         result = evaluate_final_screenshot(guideline)
-    except(OSError):
+    except OSError:
         print("[Error] Couldn't open the screenshot for evaluation")
         return False
-
+
     return result


 def get_test_model():
     parser = argparse.ArgumentParser(
         description="Run the self-operating-computer with a specified model."
     )
-
+
     parser.add_argument(
         "-m",
         "--model",
         help="Specify the model to evaluate.",
         required=False,
         default="gpt-4-with-ocr",
     )
-
+
     return parser.parse_args().model


 def main():
     load_dotenv()
     openai.api_key = os.getenv("OPENAI_API_KEY")
-
+
     model = get_test_model()
-
+
     print(f"{ANSI_BLUE}[EVALUATING MODEL `{model}`]{ANSI_RESET}")
     print(f"{ANSI_BRIGHT_MAGENTA}[STARTING EVALUATION]{ANSI_RESET}")

-    passed = 0; failed = 0
+    passed = 0
+    failed = 0
     for objective, guideline in TEST_CASES.items():
         print(f"{ANSI_BLUE}[EVALUATING]{ANSI_RESET} '{objective}'")
-
+
         result = run_test_case(objective, guideline, model)
         if result:
             print(f"{ANSI_GREEN}[PASSED]{ANSI_RESET} '{objective}'")
@@ -166,5 +175,6 @@ def main():
         f"{ANSI_BRIGHT_MAGENTA}[EVALUATION COMPLETE]{ANSI_RESET} {passed} test{'' if passed == 1 else 's'} passed, {failed} test{'' if failed == 1 else 's'} failed"
     )

+
 if __name__ == "__main__":
     main()
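For context (not part of the diff): each test case shells out to the CLI, equivalent to running `operate -m gpt-4-with-ocr --prompt "<objective>"` by hand, and `parse_eval_content` above assumes the evaluation model answers with a JSON object carrying a `reason` string and a `guideline_met` boolean. A minimal sketch with an invented payload shows the contract:

```python
import json

# Hypothetical well-formed reply; in evaluate.py this string comes from the
# gpt-4o response in evaluate_final_screenshot.
content = '{"reason": "The final screenshot shows the requested page.", "guideline_met": true}'

res = json.loads(content)
print(res["reason"])         # the model's justification
print(res["guideline_met"])  # True -> the test case counts as passed
```

Any reply that fails `json.loads` or omits these keys lands in the `except` branch and aborts the run.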

operate/config.py

Lines changed: 19 additions & 1 deletion
@@ -5,6 +5,7 @@
 from dotenv import load_dotenv
 from ollama import Client
 from openai import OpenAI
+import anthropic
 from prompt_toolkit.shortcuts import input_dialog


@@ -38,6 +39,10 @@ def __init__(self):
         )
         self.ollama_host = (
             None  # instance variables are backups in case saving to a `.env` fails
+        )
+        self.anthropic_api_key = (
+            None  # instance variables are backups in case saving to a `.env` fails
         )

     def initialize_openai(self):
@@ -91,6 +96,13 @@ def initialize_ollama(self):
         model = Client(host=self.ollama_host)
         return model

+    def initialize_anthropic(self):
+        if self.anthropic_api_key:
+            api_key = self.anthropic_api_key
+        else:
+            api_key = os.getenv("ANTHROPIC_API_KEY")
+        return anthropic.Anthropic(api_key=api_key)
+
     def validation(self, model, voice_mode):
         """
         Validate the input parameters for the dialog operation.
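As a usage sketch (not from the commit), the new `initialize_anthropic` helper could be exercised like this; it assumes the class in operate/config.py is named `Config`, and the model id and prompt are illustrative placeholders:

```python
from operate.config import Config  # assumed import path for the class above

config = Config()
client = config.initialize_anthropic()  # falls back to ANTHROPIC_API_KEY from the env

# Hypothetical call; whichever Claude 3 model `-m claude-3` maps to would go here.
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Describe the next UI action to take."}],
)
print(response.content[0].text)
```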
@@ -101,11 +113,15 @@ def validation(self, model, voice_mode):
             model == "gpt-4"
             or voice_mode
             or model == "gpt-4-with-som"
-            or model == "gpt-4-with-ocr",
+            or model == "gpt-4-with-ocr"
+            or model == "o1-with-ocr",
         )
         self.require_api_key(
             "GOOGLE_API_KEY", "Google API key", model == "gemini-pro-vision"
         )
+        self.require_api_key(
+            "ANTHROPIC_API_KEY", "Anthropic API key", model == "claude-3"
+        )

     def require_api_key(self, key_name, key_description, is_required):
         key_exists = bool(os.environ.get(key_name))
@@ -130,6 +146,8 @@ def prompt_and_save_api_key(self, key_name, key_description):
             self.openai_api_key = key_value
         elif key_name == "GOOGLE_API_KEY":
             self.google_api_key = key_value
+        elif key_name == "ANTHROPIC_API_KEY":
+            self.anthropic_api_key = key_value
         self.save_api_key_to_env(key_name, key_value)
         load_dotenv()  # Reload environment variables
         # Update the instance attribute with the new key
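Taken together, the validation changes mean that selecting `-m claude-3` marks the Anthropic key as required. A hedged sketch of the resulting flow, again assuming the class is named `Config`:

```python
from operate.config import Config  # assumed import path

config = Config()
# For claude-3, the validation above reduces to:
#   require_api_key("ANTHROPIC_API_KEY", "Anthropic API key", True)
# which prompts for the key, stores it on the instance, and saves it to `.env`
# only when ANTHROPIC_API_KEY is absent from the environment.
config.validation(model="claude-3", voice_mode=False)
```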
