Key improvements:
- Combined binary, library, and model caching into a single cache entry (sketched in the workflow snippets after this list)
  - Previously: separate caches for the binary and the models
  - Now: /usr/local/bin/ollama + /usr/local/lib/ollama + /usr/share/ollama
- Fixed the model cache path from ~/.ollama/models to /usr/share/ollama
  - Models live in the system ollama user's home, not the runner's home
- Separated installation from server startup
  - The install step runs only on a cache miss and includes the model pull
  - The startup step runs on every job but completes in <5s with cached models
- Optimized readiness checks
  - Install: 10s timeout, 0.5s polling (cache miss only)
  - Startup: 5s timeout, 0.2s polling (every run, including cache hits)
- Added a cache key based on the workflow file hash
  - The cache invalidates whenever the workflow changes, forcing a fresh install when needed
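A minimal sketch of the combined cache step; the step name, cache-key prefix, and the workflow filename passed to hashFiles are illustrative, not necessarily the actual ones:

```yaml
# One cache entry covering the binary, libraries, and the model store.
# Hashing the workflow file itself means any workflow edit invalidates
# the cache and triggers a fresh install on the next run.
- name: Cache Ollama
  id: ollama-cache
  uses: actions/cache@v4
  with:
    path: |
      /usr/local/bin/ollama
      /usr/local/lib/ollama
      /usr/share/ollama
    key: ollama-${{ runner.os }}-${{ hashFiles('.github/workflows/ollama.yml') }}
```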
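The install step can then be gated on the cache-hit output. This sketch assumes Ollama's standard install script and polls the server's /api/version endpoint for readiness with the 10s/0.5s budget described above:

```yaml
# Runs only when the cache missed: install, wait for the server the
# installer starts, then pull the model into the soon-to-be-cached paths.
- name: Install Ollama and pull model
  if: steps.ollama-cache.outputs.cache-hit != 'true'
  run: |
    curl -fsSL https://ollama.com/install.sh | sh
    # Readiness: up to 10s, polling every 0.5s (20 * 0.5s)
    for _ in $(seq 1 20); do
      curl -fsS http://localhost:11434/api/version >/dev/null 2>&1 && break
      sleep 0.5
    done
    ollama pull gpt-oss:20b
```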
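And a sketch of the always-run startup step. Since the systemd unit written by the installer is not among the cached paths, this version launches the cached binary directly, with OLLAMA_MODELS pointing at the model store assumed to live under the ollama user's home:

```yaml
# Always runs; on a cache hit this is the only Ollama work in the job.
- name: Start Ollama server
  run: |
    # Skip the launch if a server is already running (e.g. the one the
    # installer started on a cache miss).
    if ! curl -fsS http://localhost:11434/api/version >/dev/null 2>&1; then
      # Model store path assumed from the "models live in
      # /usr/share/ollama" note above.
      sudo env OLLAMA_MODELS=/usr/share/ollama/.ollama/models \
        nohup /usr/local/bin/ollama serve >/tmp/ollama.log 2>&1 &
    fi
    # Readiness: up to 5s, polling every 0.2s (25 * 0.2s)
    for _ in $(seq 1 25); do
      curl -fsS http://localhost:11434/api/version >/dev/null 2>&1 && break
      sleep 0.2
    done
```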
Expected timing:
- First run (cache miss): ~60s (download + install + model pull)
- Subsequent runs (cache hit): <5s (just server startup)
- Cache size: ~13GB (gpt-oss:20b model)
Testing: Verified locally that Ollama starts in <1s with cached models