Key improvements:
- Combined binary, library, and model caching into a single cache entry (sketched in the workflow snippets after this list)
  - Previously: separate caches for the binary and the models
  - Now: /usr/local/bin/ollama + /usr/local/lib/ollama + /usr/share/ollama
- Fixed the model cache path from ~/.ollama/models to /usr/share/ollama
  - Models live in the system ollama user's home, not the runner's home
- Separated installation from server startup
  - The install step runs only on a cache miss and includes the model pull
  - The startup step runs on every job but completes in <5s with cached models
- Optimized readiness checks
  - Install: 10s timeout, 0.5s polling (cache miss only)
  - Startup: 5s timeout, 0.2s polling (every run, including cache hits)
- Added a cache key based on the workflow file hash
  - The cache invalidates whenever the workflow changes, forcing a fresh install when needed
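A minimal sketch of the combined cache step; the step name, cache-key prefix, and the workflow filename passed to hashFiles are illustrative, not necessarily the actual ones:

```yaml
# One cache entry covering the binary, libraries, and the model store.
# Hashing the workflow file itself means any workflow edit invalidates
# the cache and triggers a fresh install on the next run.
- name: Cache Ollama
  id: ollama-cache
  uses: actions/cache@v4
  with:
    path: |
      /usr/local/bin/ollama
      /usr/local/lib/ollama
      /usr/share/ollama
    key: ollama-${{ runner.os }}-${{ hashFiles('.github/workflows/ollama.yml') }}
```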
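The install step can then be gated on the cache-hit output. This sketch assumes Ollama's standard install script and polls the server's /api/version endpoint for readiness with the 10s/0.5s budget described above:

```yaml
# Runs only when the cache missed: install, wait for the server the
# installer starts, then pull the model into the soon-to-be-cached paths.
- name: Install Ollama and pull model
  if: steps.ollama-cache.outputs.cache-hit != 'true'
  run: |
    curl -fsSL https://ollama.com/install.sh | sh
    # Readiness: up to 10s, polling every 0.5s (20 * 0.5s)
    for _ in $(seq 1 20); do
      curl -fsS http://localhost:11434/api/version >/dev/null 2>&1 && break
      sleep 0.5
    done
    ollama pull gpt-oss:20b
```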
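And a sketch of the always-run startup step. Since the systemd unit written by the installer is not among the cached paths, this version launches the cached binary directly, with OLLAMA_MODELS pointing at the model store assumed to live under the ollama user's home:

```yaml
# Always runs; on a cache hit this is the only Ollama work in the job.
- name: Start Ollama server
  run: |
    # Skip the launch if a server is already running (e.g. the one the
    # installer started on a cache miss).
    if ! curl -fsS http://localhost:11434/api/version >/dev/null 2>&1; then
      # Model store path assumed from the "models live in
      # /usr/share/ollama" note above.
      sudo env OLLAMA_MODELS=/usr/share/ollama/.ollama/models \
        nohup /usr/local/bin/ollama serve >/tmp/ollama.log 2>&1 &
    fi
    # Readiness: up to 5s, polling every 0.2s (25 * 0.2s)
    for _ in $(seq 1 25); do
      curl -fsS http://localhost:11434/api/version >/dev/null 2>&1 && break
      sleep 0.2
    done
```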
Expected timing:
- First run (cache miss): ~60s (download + install + model pull)
- Subsequent runs (cache hit): <5s (just server startup)
- Cache size: ~13GB (gpt-oss:20b model)
Testing: Verified locally that Ollama starts in <1s with cached models