Conversation

@ThomasK33 (Member) commented Nov 7, 2025

Summary

Adds first-class Ollama support to cmux, enabling users to run local LLMs (e.g., ollama:llama3.2, ollama:qwen2.5-coder) with full integration into cmux's agent workflow, including streaming, tool calling, and reasoning.

Implementation

Core Integration

  • Model string parsing - Handles Ollama model IDs with colons (e.g., ollama:gpt-oss:20b); see the sketch after this list
    • Splits only on the first colon to preserve the model ID format
    • Returns clear error messages for invalid formats
  • Ollama provider - Uses the ollama-ai-provider-v2 package with the Vercel AI SDK
    • Lazy-loaded to avoid startup penalty
    • Configured with compatibility: "strict" for better API compatibility
    • Respects baseURL from config (http://localhost:11434/api)
  • Stream management - Full support for Ollama's streaming responses
    • Text deltas, reasoning tokens, tool calls
    • Proper stream-end event emission with usage metadata
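
A minimal sketch of the first-colon split described above; the function name and error message are illustrative, not the exact cmux implementation:

// Split "ollama:gpt-oss:20b" into provider "ollama" and model ID "gpt-oss:20b".
// Splitting on the first colon only preserves colons inside the model ID.
function parseModelString(modelString: string): { providerName: string; modelId: string } {
  const idx = modelString.indexOf(":");
  if (idx <= 0 || idx === modelString.length - 1) {
    throw new Error(
      `Invalid model string "${modelString}". Expected "<provider>:<modelId>", e.g. "ollama:llama3.2".`
    );
  }
  return {
    providerName: modelString.slice(0, idx),
    modelId: modelString.slice(idx + 1),
  };
}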

Configuration

  • Config template - Ollama section in providers.jsonc template
    • Default baseURL: http://localhost:11434/api
    • Extensible for custom Ollama endpoints
  • Documentation - Updated model configuration docs
    • Examples for common Ollama models
    • baseURL configuration instructions

Testing

4 integration tests (102s total runtime):

  • Basic message streaming with Ollama
  • Tool calling (bash tool)
  • File operations (file_read tool)
  • Error handling (server not running)

964 unit tests pass
All CI checks pass

Bug Fixes Applied During Development

Model parsing:

  • Fixed split(":") breaking on model IDs with multiple colons
  • Now uses indexOf() + slice() for first-colon split only

Test assertions:

  • Fixed tests calling .join("") on event objects instead of delta text
  • Updated to use the extractTextFromEvents() helper (sketch below)
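
For illustration, the fix amounts to joining extracted delta text rather than the event objects themselves. A helper along these lines is assumed; the real extractTextFromEvents() in the test suite may differ:

// Hypothetical shape of the streamed events collected by the tests.
interface StreamEvent {
  type: string;
  text?: string;
}

// Concatenate only the text deltas; joining the raw event objects would
// produce "[object Object]..." instead of the streamed text.
function extractTextFromEvents(events: StreamEvent[]): string {
  return events
    .filter((e) => e.type === "text-delta" && typeof e.text === "string")
    .map((e) => e.text as string)
    .join("");
}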

Test timing:

  • Removed test.concurrent() to prevent race conditions
  • Sequential execution resolves stream-end event collection issues

Configuration:

  • Ensured /api suffix on baseURL in all locations:
    • Test setup, config template, docs, CI environment

How to Use

  1. Install Ollama locally: https://ollama.com/download
  2. Pull a model: ollama pull llama3.2
  3. Configure cmux in ~/.cmux/providers.jsonc:
{
  "providers": {
    "ollama": {
      "type": "ollama",
      "baseUrl": "http://localhost:11434/api"
    }
  },
  "models": {
    "ollama:llama3.2": {
      "providerName": "ollama",
      "modelId": "llama3.2"
    }
  }
}
  4. Select the model in cmux: choose ollama:llama3.2 from the model picker
  5. Start chatting: full tool calling and reasoning support works automatically

Breaking Changes

None - additive only. Existing providers unchanged.

Future Enhancements (Separate PRs)

  • Auto-detect Ollama server and installed models
  • Pull models from UI with progress indicator
  • Ollama-specific settings panel
  • Health check UI for server status

Generated with cmux

Integrates ollama-ai-provider-v2 to enable running AI models locally
through Ollama without requiring API keys.

Changes:
- Add ollama-ai-provider-v2 dependency
- Implement Ollama provider in aiService.ts with lazy loading (see the sketch below)
- Add OllamaProviderOptions type for future extensibility
- Support Ollama model display formatting (e.g., llama3.2:7b -> Llama 3.2 (7B)); see the formatting sketch below
- Update providers.jsonc template with Ollama configuration example
- Add comprehensive Ollama documentation to models.md
- Add unit tests for Ollama model name formatting
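
A rough sketch of that lazy loading, assuming ollama-ai-provider-v2 exposes a createOllama factory that accepts the baseURL and compatibility options mentioned in this PR (the actual aiService.ts wiring may differ):

// Lazily import the provider so Ollama support adds no startup cost for users
// who never select an ollama:* model.
async function getOllamaModel(modelId: string, baseURL: string) {
  const { createOllama } = await import("ollama-ai-provider-v2");
  const ollama = createOllama({
    baseURL,                 // e.g. "http://localhost:11434/api" from providers.jsonc
    compatibility: "strict", // stricter API compatibility, as configured in this PR
  });
  return ollama(modelId);    // e.g. "llama3.2" or "gpt-oss:20b"
}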

Ollama is a local service that doesn't require API keys. Users can run
any model from the Ollama Library (https://ollama.com/library) locally.

Example configuration in ~/.cmux/providers.jsonc:
{
  "ollama": {
    "baseUrl": "http://localhost:11434"
  }
}

Example model usage: ollama:llama3.2:7b
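
As a rough illustration of the display formatting mentioned above, a hypothetical helper (only the llama3.2:7b -> Llama 3.2 (7B) mapping comes from this PR; the name and rules are assumptions):

// Turn "llama3.2:7b" into "Llama 3.2 (7B)": capitalize the family name,
// separate the version number, and show the size tag in parentheses.
function formatOllamaModelName(modelId: string): string {
  const [name, tag] = modelId.split(":");
  const pretty = name
    .replace(/^([a-z])/, (c) => c.toUpperCase())
    .replace(/(\d+(\.\d+)?)/, " $1")
    .trim();
  return tag ? `${pretty} (${tag.toUpperCase()})` : pretty;
}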

_Generated with `cmux`_
Adds comprehensive integration tests for Ollama provider to verify tool
calling and file operations work correctly with local models.

Changes:
- Add tests/ipcMain/ollama.test.ts with 4 test cases:
  * Basic message sending and response
  * Tool calling with bash tool (gpt-oss:20b)
  * File operations with file_read tool
  * Error handling when Ollama is not running
- Update setupWorkspace() to handle Ollama (no API key required)
- Update setupProviders() type signature for optional baseUrl
- Add Ollama installation and model pulling to CI workflow
- Configure CI to run Ollama tests with gpt-oss:20b model

The tests verify that Ollama can:
- Send messages and receive streaming responses
- Execute bash commands via tool calling
- Read files using the file_read tool
- Handle connection errors gracefully

CI Setup:
- Installs Ollama via official install script
- Pulls gpt-oss:20b model for tests
- Waits for Ollama service to be ready before running tests
- Sets OLLAMA_BASE_URL environment variable for tests

_Generated with `cmux`_
Cache Ollama models between CI runs to speed up integration tests.
The gpt-oss:20b model can be large, so caching saves significant time
on subsequent test runs.

Cache key: ${{ runner.os }}-ollama-gpt-oss-20b-v1

_Generated with `cmux`_
- Make cache keys more generic and future-proof
- Cache Ollama binary separately for instant cached runs
- Update model examples to popular models (gpt-oss, qwen3-coder)

Changes:
- Split Ollama caching into binary + models for better performance
- Only install Ollama if binary is not cached (saves time)
- Update docs to reference gpt-oss:20b, gpt-oss:120b, qwen3-coder:30b

_Generated with `cmux`_
- Fixed model string parsing to handle colons in model IDs (e.g., ollama:gpt-oss:20b)
  Split only on first colon instead of all colons
- Added Ollama compatibility mode (strict) for better API compatibility
- Fixed baseURL configuration to include /api suffix consistently
  Updated test setup, config template, docs, and CI
- Fixed test assertions to use extractTextFromEvents() helper
  Tests were incorrectly calling .join() on event objects instead of extracting delta text
- Removed test concurrency to prevent race conditions
  Sequential execution resolves stream-end event timing issues
- Updated file operations test to use README.md instead of package.json
  More reliable for test workspace environment

All 4 Ollama integration tests now pass consistently (102s total runtime)
@ammar-agent changed the title from "🤖 feat: Add first-class Ollama support (auto-detect, auto-start, use Vercel AI SDK)" to "🤖 feat: add Ollama local model support with Vercel AI SDK integration" on Nov 8, 2025
@ammario changed the title from "🤖 feat: add Ollama local model support with Vercel AI SDK integration" to "🤖 feat: add Ollama local model support" on Nov 8, 2025
- Remove unused modelString import from ollama.test.ts
- Use consistent indexOf() pattern for provider extraction in streamMessage()
  Ensures model IDs with colons are handled uniformly throughout the codebase
- Remove the 'before' variable, which was previously used for debug logging and is no longer needed
Key improvements:
- Combined binary, library, and model caching into single cache entry
  Previously: separate caches for binary and models
  Now: /usr/local/bin/ollama + /usr/local/lib/ollama + /usr/share/ollama

- Fixed model cache path from ~/.ollama/models to /usr/share/ollama
  Models are stored in system ollama user's home, not runner's home

- Separated installation from server startup
  Install step only runs on cache miss and includes model pull
  Startup step always runs but completes in <5s with cached models

- Optimized readiness checks
  Install: 10s timeout, 0.5s polling (only on cache miss)
  Startup: 5s timeout, 0.2s polling (every run, with cache hit)

- Added cache key based on workflow file hash
  Cache invalidates when workflow changes, ensuring fresh install if needed

Expected timing:
- First run (cache miss): ~60s (download + install + model pull)
- Subsequent runs (cache hit): <5s (just server startup)
- Cache size: ~13GB (gpt-oss:20b model)

Testing: Verified locally that Ollama starts in <1s with cached models
Fixes context limit display for Ollama models like ollama:gpt-oss:20b.

Problem:
- User model string: ollama:gpt-oss:20b
- Previous lookup: gpt-oss:20b (stripped provider)
- models.json key: ollama/gpt-oss:20b-cloud (LiteLLM convention)
- Result: Lookup failed, showed "Unknown model limits"

Solution:
Implemented a multi-pattern fallback lookup (sketched below) that tries:
1. Direct model name (claude-opus-4-1)
2. Provider-prefixed (ollama/gpt-oss:20b)
3. Cloud variant (ollama/gpt-oss:20b-cloud) ← matches!
4. Base model (ollama/gpt-oss) as fallback
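
A sketch of that fallback order, assuming a models.json-style lookup table keyed by LiteLLM-convention names (the function and types are illustrative):

interface ModelLimits {
  max_input_tokens?: number;
  max_output_tokens?: number;
}

// Try progressively broader keys until one matches the models.json table.
function lookupModelLimits(
  modelsJson: Record<string, ModelLimits>,
  providerName: string,
  modelId: string // e.g. "gpt-oss:20b" from "ollama:gpt-oss:20b"
): ModelLimits | undefined {
  const baseId = modelId.split(":")[0]; // "gpt-oss"
  const candidates = [
    modelId,                            // 1. direct name, e.g. "claude-opus-4-1"
    `${providerName}/${modelId}`,       // 2. provider-prefixed, "ollama/gpt-oss:20b"
    `${providerName}/${modelId}-cloud`, // 3. cloud variant, "ollama/gpt-oss:20b-cloud"
    `${providerName}/${baseId}`,        // 4. base model fallback, "ollama/gpt-oss"
  ];
  for (const key of candidates) {
    if (modelsJson[key]) return modelsJson[key];
  }
  return undefined;
}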

Benefits:
- Works automatically for all Ollama models in models.json
- Zero configuration required
- Backward compatible with existing lookups
- No API calls needed (works offline)

Testing:
- Added 15+ unit tests covering all lookup patterns
- Verified ollama:gpt-oss:20b → 131k context limit
- All 979 unit tests pass

Models that now work:
- ollama:gpt-oss:20b → ollama/gpt-oss:20b-cloud (131k)
- ollama:gpt-oss:120b → ollama/gpt-oss:120b-cloud (131k)
- ollama:llama3.1 → ollama/llama3.1 (8k)
- ollama:deepseek-v3.1:671b → ollama/deepseek-v3.1:671b-cloud
- Plus all existing Anthropic/OpenAI models
@ammario marked this pull request as ready for review on November 8, 2025 at 16:52
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

- Download only the Ollama binary (no system service installation)
- Use 'ollama start' instead of 'ollama serve'
- Cache binary and models separately for better cache efficiency
- Models now naturally go to ~/.ollama (no sudo/copying needed)
- Removed complex model copying logic from cache miss path
- Simplified workflow - Ollama server starts in setup action

Benefits:
- Cache works correctly (models in user home, not system location)
- Faster warm cache (<1s vs ~60s)
- No sudo operations needed
- Matches proven pydantic/ollama-action approach
Previously, setup-ollama action pulled models sequentially during setup.
Now tests pull models idempotently in beforeAll hook, enabling:

- Better parallelism across test jobs
- Idempotent model pulls (multiple tests can check/pull safely)
- Shared model cache across parallel test runners
- Ollama handles deduplication when multiple pulls happen simultaneously

Changes:
- Remove model input and pull logic from setup-ollama action
- Add ensureOllamaModel() helper to check if model exists and pull if needed
- Call ensureOllamaModel() in beforeAll hook before tests run
- Bump beforeAll timeout to 150s to accommodate potential model pull
- Simplify cache key to 'ollama-models-v2' (model-agnostic)

_Generated with `cmux`_
@ammar-agent (Collaborator) commented

✅ Refactoring Complete: Test-Initiated Model Pulls

Successfully refactored Ollama setup to improve parallelism by moving model pulls from the setup action into the test suite itself.

Changes in ab90e9b

Setup Action (.github/actions/setup-ollama/action.yml)

  • ❌ Removed model input parameter
  • ❌ Removed sequential model pull logic
  • ✅ Simplified cache key to ollama-models-v2 (model-agnostic)
  • ✅ Binary cache: still works (~1.4 GB, 3s restore)
  • ✅ Model cache: now shared across all models

Integration Tests (tests/ipcMain/ollama.test.ts)

  • ✅ Added ensureOllamaModel() helper function (see the sketch after this list)
    • Checks if model exists via ollama list
    • Pulls only if needed (idempotent)
    • Multiple tests can call safely - Ollama handles deduplication
  • ✅ Called in beforeAll hook before tests run
  • ✅ Bumped timeout to 150s to accommodate potential model pull
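
A minimal sketch of such a helper, shelling out to the ollama CLI; the actual implementation in tests/ipcMain/ollama.test.ts may differ:

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Pull the model only if `ollama list` does not already show it, so multiple
// test runners can call this safely; Ollama deduplicates concurrent pulls.
async function ensureOllamaModel(model: string): Promise<void> {
  const { stdout } = await execFileAsync("ollama", ["list"]);
  if (stdout.includes(model)) return; // already present, ~1s on a warm cache
  // A large buffer because pulling a multi-GB model emits a lot of progress output.
  await execFileAsync("ollama", ["pull", model], { maxBuffer: 64 * 1024 * 1024 });
}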

Benefits

  1. Better parallelism - Multiple test jobs can now share the cached model directory
  2. Idempotent pulls - Tests check if model exists before pulling
  3. Ollama-native deduplication - If multiple processes try to pull, Ollama handles it
  4. Simplified setup action - No longer needs to know about specific models
  5. Shared cache - All models go into one cache key, improving cache efficiency

Test Results (Run 19199669727)

  • Binary cache hit: 1.4 GB restored in ~3s
  • Model pull during test: 13 GB gpt-oss:20b pulled successfully
  • All Ollama tests passed: 4/4 tests in 134s
  • All CI checks passed: static checks, unit tests, integration tests, E2E tests

Performance

First run (cold cache):

  • Binary cache: hit ⚡
  • Model cache: miss (expected with new cache key)
  • Model pull: ~35s during test
  • Total setup: ~40s (including test overhead)

Subsequent runs (warm cache):

  • Will hit both binary and model cache
  • ensureOllamaModel() detects model exists: ~1s
  • Expected setup: <5s

Generated with cmux
