A modern, fast, and lightweight web interface for interacting with Ollama's large language models
- Complete Documentation - Full setup guide, features, and model library
- Code Execution Flow - Detailed line-by-line code analysis
- Mermaid Diagrams - Visual flow charts and sequence diagrams
- Model Pull System - Auto-download and optimization
- Development Notes - Feature docs and TODOs
- Modern Dark Theme UI - ChatGPT-inspired interface
- Real-time Communication - WebSocket support with auto-reconnect
- Multi-Model Support - Switch between coding, vision, and chat models
- Auto-Download Models - Automatic model pulling when needed
- Code Syntax Highlighting - VSCode-style highlighting with copy buttons
- Markdown Rendering - Full markdown support for rich responses
- Zero-Dependency Frontend - No Node.js, no frameworks, pure HTML/CSS/JS
| Layer | Technology |
|---|---|
| Backend | Go 1.21+ with Gorilla Mux & WebSockets |
| Frontend | Vanilla JavaScript + marked.js + highlight.js |
| LLM Runtime | Ollama + langchaingo library |
| Models | 10+ curated LLMs (coding, vision, chat) |
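
As a point of reference for the LLM Runtime row, querying Ollama through langchaingo can be as small as the following sketch. It is illustrative only, not the exact code in main.go:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	// Connect to the local Ollama service (listening on :11434 by default).
	llm, err := ollama.New(ollama.WithModel("llama3.1:8b"))
	if err != nil {
		log.Fatal(err)
	}

	// Send a single prompt and print the completion.
	resp, err := llms.GenerateFromSinglePrompt(context.Background(), llm, "Say hello")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp)
}
```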
```bash
# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Or use the included script
chmod +x ollama_install_basic.sh
./ollama_install_basic.sh
```

Pull the default model:

```bash
ollama pull llama3.1:8b
```

```bash
# Build the Go backend
go build -o ollamamax main.go

# Run the server
./ollamamax
```

Navigate to: http://localhost:8888
The application supports 10 carefully selected models across different use cases:
Coding:
- `qwen2.5-coder:7b` - Code generation and debugging
- `deepseek-coder:6.7b` - Algorithms and problem-solving
- `deepseek-r1` - Reasoning-focused coding
- `glm-4.6` - Multilingual code support

Vision:
- `qwen3-vl` - OCR, charts, and diagram analysis
- `llava:7b` - Image understanding

Chat:
- `llama3.1:8b` - Meta's balanced all-rounder (default)
- `qwen3:7b` - Long-context chat
- `gemma2:9b` - Fast and efficient
- `mistral:7b` - General purpose

Embeddings:
- `nomic-embed-text` - For RAG pipelines
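
Inside main.go these options live in an available-models array (lines 47-67). Schematically it might resemble the following; the struct and field names here are illustrative, not the exact types in the source:

```go
package main

// Model describes one curated entry; field names are illustrative,
// not the exact struct in main.go.
type Model struct {
	Name     string
	Category string // "coding", "vision", "chat", or "embedding"
}

var availableModels = []Model{
	{Name: "qwen2.5-coder:7b", Category: "coding"},
	{Name: "llava:7b", Category: "vision"},
	{Name: "llama3.1:8b", Category: "chat"}, // default
	{Name: "nomic-embed-text", Category: "embedding"},
	// ...remaining curated models
}
```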
System Requirements:
- 8GB RAM minimum for 7B models
- 16GB RAM recommended for multiple models
- GPU optional but recommended (CUDA/ROCm)
```
┌─────────────┐       WebSocket/HTTP      ┌──────────────┐
│   Browser   │ ←───────────────────────→ │  Go Server   │
│  (JS/HTML)  │                           │    :8888     │
└─────────────┘                           └──────┬───────┘
                                                 │
                                                 │ langchaingo
                                                 ▼
                                          ┌──────────────┐
                                          │    Ollama    │
                                          │   Service    │
                                          │    :11434    │
                                          └──────────────┘
```
Key Components:
- Frontend - Model selector, chat UI, markdown renderer
- Go Backend - WebSocket/HTTP handlers, model management
- Ollama Service - LLM inference engine
Data Flow:
User Selects Model → JS captures selection → WebSocket sends JSON →
Go validates model → Auto-pulls if missing → Updates currentModel →
Sends query to Ollama → Receives response → Renders in UI
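
A condensed Go sketch of the server side of this flow, using gorilla/websocket. The message type and its JSON field names are assumptions for illustration, not the exact shape defined in main.go:

```go
package main

import (
	"net/http"

	"github.com/gorilla/websocket"
)

// wsMessage is a hypothetical message shape for illustration only.
type wsMessage struct {
	Type      string `json:"type"`       // e.g. "select_model" or "chat"
	ModelName string `json:"model_name"` // target model
	Message   string `json:"message"`    // user prompt for chat messages
}

var upgrader = websocket.Upgrader{}

func handleWS(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()

	for {
		var msg wsMessage
		if err := conn.ReadJSON(&msg); err != nil {
			return // connection closed; the frontend auto-reconnects
		}
		switch msg.Type {
		case "select_model":
			// Validate the model, auto-pull if missing, update currentModel.
		case "chat":
			// Forward the prompt to Ollama and stream the response back.
			_ = conn.WriteJSON(map[string]string{"response": "..."})
		}
	}
}
```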
See CODE_EXECUTION_ANALYSIS.md for complete execution flow with line numbers.
```
docs/
├── README.md                    # Full user guide and setup
├── CODE_EXECUTION_ANALYSIS.md   # Line-by-line code walkthrough
├── MERMAID_FLOW_DIAGRAMS.md     # Visual flow charts
├── MODEL_PULL_SYSTEM.md         # Auto-download system
├── MODELS_ALIGNMENT.md          # Model selection UI
├── BOTTOM_INPUT_FEATURE.md      # Chat input implementation
├── UI_SPACING_IMPROVEMENTS.md   # UI/UX enhancements
├── TODO.md                      # Roadmap and planned features
└── WARP.md                      # AI agent instructions
```
```
.
├── main.go                   # Go backend server
├── static/
│   ├── index.html            # Main UI
│   ├── script.js             # Frontend logic
│   ├── styles.css            # Dark theme styling
│   └── images/               # Logo assets
├── ollama_pull_and_run.sh    # Model optimization script
├── ollama_install_basic.sh   # Ollama installer
└── docs/                     # Documentation
```
- main.go (589 lines)
  - Lines 47-67: Available models array
  - Lines 127-198: HTTP router setup (sketched after this list)
  - Lines 308-404: WebSocket handler
  - Lines 451-476: Model installation check
  - Lines 479-515: Model auto-pull
  - Lines 563-588: LLM query processing
- script.js (589 lines)
  - Lines 24-72: WebSocket connection
  - Lines 147-268: Message rendering
  - Lines 336-381: Send message logic
  - Lines 454-484: Model selection handler
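
As a rough sketch of how the router setup above might be wired with Gorilla Mux (the handler names are placeholder stubs, not the identifiers in main.go):

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

// Stub handlers standing in for the real ones in main.go.
func handleHealth(w http.ResponseWriter, r *http.Request) { w.Write([]byte(`{"status":"ok"}`)) }
func handleModels(w http.ResponseWriter, r *http.Request) { w.Write([]byte(`[]`)) }
func handleChat(w http.ResponseWriter, r *http.Request)   { w.Write([]byte(`{}`)) }

func main() {
	r := mux.NewRouter()

	// JSON API endpoints (paths taken from the testing examples below).
	r.HandleFunc("/api/health", handleHealth).Methods("GET")
	r.HandleFunc("/api/models", handleModels).Methods("GET")
	r.HandleFunc("/api/chat", handleChat).Methods("POST")

	// Serve the zero-dependency frontend from ./static.
	r.PathPrefix("/").Handler(http.FileServer(http.Dir("./static")))

	// :8888 matches the port used in the quick start.
	log.Fatal(http.ListenAndServe(":8888", r))
}
```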
```bash
# Development build
go build -o ollamamax main.go

# Production build (optimized)
go build -ldflags="-s -w" -o ollamamax main.go
```

```bash
# Check Ollama status
curl http://localhost:8888/api/health

# Get available models
curl http://localhost:8888/api/models

# Test chat (HTTP)
curl -X POST http://localhost:8888/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"Hello","model_name":"llama3.1:8b"}'
```

When you select a model that isn't installed, OllamaMax automatically:
- Detects the missing model via `ollama list`
- Shows a download progress indicator
- Pulls the model using `ollama_pull_and_run.sh` (optimized)
- Falls back to a direct `ollama pull` if the script fails
- Updates the UI when ready
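
In Go terms, that check-and-pull sequence might look roughly like this; both helper names are hypothetical stand-ins for the real implementation at main.go lines 451-515:

```go
package main

import (
	"os/exec"
	"strings"
)

// isInstalled is a hypothetical helper: it scans `ollama list`
// output for the model name.
func isInstalled(model string) bool {
	out, err := exec.Command("ollama", "list").Output()
	return err == nil && strings.Contains(string(out), model)
}

// ensureModel is a hypothetical helper: it pulls a missing model,
// preferring the optimized script and falling back to a direct pull.
func ensureModel(model string) error {
	if isInstalled(model) {
		return nil
	}
	if err := exec.Command("./ollama_pull_and_run.sh", model).Run(); err == nil {
		return nil
	}
	return exec.Command("ollama", "pull", model).Run()
}
```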
- Real-time bidirectional communication
- Auto-reconnect on connection loss
- Typing indicators during inference
- Fallback to HTTP if WebSocket unavailable
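
The HTTP fallback reuses the same POST /api/chat endpoint exercised with curl above. A server-side sketch follows, with an illustrative request struct and response shape (not the exact JSON contract in main.go):

```go
package main

import (
	"encoding/json"
	"net/http"
)

// chatRequest mirrors the JSON body accepted by POST /api/chat.
type chatRequest struct {
	Message   string `json:"message"`
	ModelName string `json:"model_name"`
}

func handleChat(w http.ResponseWriter, r *http.Request) {
	var req chatRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}

	// Query Ollama (e.g. via langchaingo) and return the reply as JSON.
	reply := "..." // placeholder for the model response
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"response": reply})
}
```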
- 30+ languages supported via highlight.js
- Copy buttons for individual code blocks
- Copy entire message button
- Language badges on code blocks
This is a personal project, but suggestions are welcome! See docs/TODO.md for planned features.
This project uses Ollama, which is licensed under the MIT License.
- Organized all documentation into the `docs/` folder
- Created comprehensive code execution flow analysis
- Added Mermaid flow diagrams (7 different visualizations)
- Documented complete architecture and data flow
- Updated README with clear navigation structure
Built with Go, Ollama, and zero frontend frameworks

