Ragify - Self-Hosted RAG Container

All-in-one container for semantic document search. Index your documents, then search them with AI embeddings.

Includes: Ollama + nomic-embed-text, Qdrant vector DB, REST API, Web UI, MCP server.

Quick Start

docker pull ghcr.io/strawberry-code/ragify:latest-tika
docker run -d --name ragify -p 8080:8080 -v ragify_data:/data \
  ghcr.io/strawberry-code/ragify:latest-tika

Open http://localhost:8080 - upload files and search.

Image Variants

| Image | Size | Description |
|-------|------|-------------|
| `ragify:latest` | ~3GB | Text and code files only |
| `ragify:latest-tika` | ~4GB | Recommended - PDF, DOCX, XLSX, and 1000+ formats via Apache Tika |

Production Setup with OAuth

For production, enable GitHub OAuth to restrict access.

1. Create GitHub OAuth App

1. Go to GitHub Developer Settings
2. Click "New OAuth App"
3. Set:
   - Application name: Ragify
   - Homepage URL: https://your-domain.com
   - Authorization callback URL: https://your-domain.com/oauth/github-callback
4. Copy the Client ID and Client Secret

2. Configuration Files

docker-compose.yml

services:
  ragify:
    image: ghcr.io/strawberry-code/ragify:latest-tika
    container_name: ragify
    ports:
      - "8080:8080"
    volumes:
      - ragify_data:/data
      - ./users.yaml:/config/users.yaml:ro
    env_file:
      - .env
    restart: unless-stopped

volumes:
  ragify_data:

users.yaml - Authorized GitHub usernames

authorized_users:
  - username: your-github-username
  - username: teammate-username

.env - Environment variables

AUTH_CONFIG=/config/users.yaml
GITHUB_CLIENT_ID=Ov23li...
GITHUB_CLIENT_SECRET=abc123...
BASE_URL=https://your-domain.com

3. Start

docker compose up -d

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `AUTH_CONFIG` | - | Path to users.yaml (enables OAuth) |
| `GITHUB_CLIENT_ID` | - | GitHub OAuth App Client ID |
| `GITHUB_CLIENT_SECRET` | - | GitHub OAuth App Secret |
| `BASE_URL` | `http://localhost:8080` | Public URL for OAuth callbacks |
| `API_PORT` | `8080` | API and Web UI port |
| `OLLAMA_MODEL` | `nomic-embed-text` | Embedding model |
| `CHUNK_SIZE` | `400` | Target chunk size in tokens |
| `CHUNK_MAX_TOKENS` | `1500` | Maximum chunk size (safe margin for nomic-embed-text 2048 limit) |
| `EMBEDDING_BATCH_SIZE` | `10` | Chunks per embedding API call (reduce if Ollama errors) |
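
To illustrate what `EMBEDDING_BATCH_SIZE` controls, here is a rough sketch of reading the chunking variables with their documented defaults and grouping chunks into batches. The helper names `read_settings` and `batches` are made up for this example; Ragify's own code may be organized differently.

```python
import os
from typing import Iterator, List, Mapping, Sequence


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the chunking settings, falling back to the documented defaults."""
    return {
        "chunk_size": int(env.get("CHUNK_SIZE", "400")),
        "chunk_max_tokens": int(env.get("CHUNK_MAX_TOKENS", "1500")),
        "embedding_batch_size": int(env.get("EMBEDDING_BATCH_SIZE", "10")),
    }


def batches(chunks: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` chunks."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(chunks), size):
        yield list(chunks[start:start + size])


settings = read_settings({})  # empty mapping, so the defaults apply
chunks = [f"chunk-{i}" for i in range(25)]
sizes = [len(b) for b in batches(chunks, settings["embedding_batch_size"])]
print(sizes)  # [10, 10, 5]
```

With the default of 10, 25 chunks would go out as three embedding calls. A smaller batch size means more calls, each carrying less text.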

Features

Web UI

- Upload files via drag & drop
- Create and manage collections
- Search with semantic results
- View indexing job status

REST API

# List collections
curl http://localhost:8080/api/collections

# Search
curl -X POST http://localhost:8080/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "how to configure", "collection": "docs", "limit": 5}'

# Upload file
curl -X POST http://localhost:8080/api/upload \
  -F "file=@document.pdf" \
  -F "collection=docs"
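
The search call above can also be made from Python using only the standard library. This is a sketch that mirrors the curl example: the endpoint and JSON fields come from that example, while `build_search_request` and `search` are hypothetical helper names. The response schema is not documented here, so the decoded JSON is returned untouched.

```python
import json
import urllib.request


def build_search_request(base_url: str, query: str, collection: str,
                         limit: int = 5) -> urllib.request.Request:
    """Build the same POST request as the curl search example."""
    body = json.dumps({"query": query, "collection": collection, "limit": limit})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, query: str, collection: str, limit: int = 5):
    """Send the request and return the decoded JSON response as-is."""
    request = build_search_request(base_url, query, collection, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = build_search_request("http://localhost:8080", "how to configure", "docs")
print(request.get_method(), request.full_url)
# POST http://localhost:8080/api/search
```

Calling `search("http://localhost:8080", "how to configure", "docs")` needs a running container. With OAuth enabled, the request will probably also need credentials, which this sketch does not handle.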

MCP Server (Claude Desktop / Claude Code)

The container exposes an MCP endpoint for Claude integration.

Claude Code config (~/.claude.json):

{
  "mcpServers": {
    "ragify": {
      "type": "streamable-http",
      "url": "https://your-domain.com/mcp/sse",
      "headers": {
        "Authorization": "Bearer <your-oauth-token>"
      }
    }
  }
}

Processing Pipeline

┌─────────────────────────────────────────────────────────────────────┐
│                        RAGIFY PIPELINE                               │
│                                                                      │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐   │
│  │  Upload  │───►│  Tika    │───►│ Chunking │───►│  Embedding   │   │
│  │  (file)  │    │ Extract  │    │ 2-stage  │    │ nomic-embed  │   │
│  └──────────┘    └──────────┘    └──────────┘    └──────┬───────┘   │
│                                                          │          │
│                                                          ▼          │
│  ┌──────────┐    ┌──────────┐                    ┌──────────────┐   │
│  │  Search  │◄───│  Qdrant  │◄───────────────────│    Store     │   │
│  │  Query   │    │  Vector  │                    │   Vectors    │   │
│  └──────────┘    └──────────┘                    └──────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Pipeline Steps

1. Upload - Files uploaded via Web UI or API
2. Extraction - Apache Tika extracts text from PDF, DOCX, XLSX, etc. (tika variant only)
3. Chunking - Two-stage semantic chunking:
   - Stage 1: Macro chunks with Chonkie (1024 tokens)
   - Stage 2: Fine chunks with Semchunk (512 tokens, 50 overlap)
   - Filter: Validates chunk quality, re-chunks if > 8192 tokens
4. Embedding - Ollama generates 768-dim vectors with nomic-embed-text
5. Storage - Vectors stored in Qdrant with metadata (file hash, URL, title)
6. Deduplication - SHA-256 file hash prevents re-indexing unchanged files
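
The deduplication step can be sketched in a few lines. This is an illustration of hash-based deduplication in general, not Ragify's actual code: the pipeline keeps the file hash in Qdrant metadata, whereas this sketch uses an in-memory set as a stand-in. `sha256_of_file` and `needs_indexing` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Set


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_indexing(path: Path, seen_hashes: Set[str]) -> bool:
    """Return True and record the hash if this content has not been seen."""
    file_hash = sha256_of_file(path)
    if file_hash in seen_hashes:
        return False
    seen_hashes.add(file_hash)
    return True


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.md"
    seen: Set[str] = set()
    doc.write_text("first version")
    print(needs_indexing(doc, seen))  # True: new content
    print(needs_indexing(doc, seen))  # False: unchanged file
    doc.write_text("second version")
    print(needs_indexing(doc, seen))  # True: content changed
```

Because the hash covers file content rather than the file name, renaming a file would not trigger re-indexing in this sketch, while any change to the content would.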

Supported Formats

Base image: .txt, .md, .py, .js, .ts, .java, .go, .rs, .c, .cpp, .json, .yaml, .xml, .html, .css

Tika image (additional): .pdf, .docx, .xlsx, .pptx, .odt, .rtf, .epub, and 1000+ more

Health Check

curl http://localhost:8080/health
# {"status": "healthy", "ollama": "ok", "qdrant": "ok"}
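
When scripting against the container, it can help to wait for this endpoint before indexing or searching. The sketch below polls `/health` and treats the response shown above as the healthy state. The retry count and delay are arbitrary, and `is_healthy` and `wait_until_healthy` are hypothetical helper names.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(payload: object) -> bool:
    """True when the overall status and both backends report OK."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("ollama") == "ok"
        and payload.get("qdrant") == "ok"
    )


def wait_until_healthy(base_url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll /health until it reports healthy or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not reachable yet, or the body was not valid JSON
        time.sleep(delay)
    return False


print(is_healthy({"status": "healthy", "ollama": "ok", "qdrant": "ok"}))  # True
```

Calling `wait_until_healthy("http://localhost:8080")` right after `docker run` would block until the services are ready or roughly a minute has passed.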

Volumes

| Path | Description |
|------|-------------|
| `/data` | Qdrant storage (persist this!) |
| `/config/users.yaml` | Authorized users (read-only) |
| `/tmp/collections` | Uploaded files (temporary, 15-day retention) |

CLI Access (Inside Container)

docker exec -it ragify bash

# Index a directory
python3 ragify.py index /data/docs

# Query
python3 ragify.py query "search term"

# List collections
python3 ragify.py list

Troubleshooting

Container won't start

docker logs ragify

Check for:

- Missing AUTH_CONFIG when OAuth env vars are set
- Invalid users.yaml format
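
The first of these can be caught before starting the container. The following is a pre-flight sketch, not the container's own startup validation: it parses `.env`-style text and reports inconsistent OAuth settings. `parse_env_text` and `check_oauth_env` are hypothetical helpers, and the second rule (AUTH_CONFIG set without the GitHub credentials) is an assumption based on AUTH_CONFIG being what enables OAuth.

```python
from collections.abc import Mapping
from typing import Dict, List

OAUTH_VARS = ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_oauth_env(env: Mapping) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: List[str] = []
    oauth_set = [name for name in OAUTH_VARS if env.get(name)]
    if oauth_set and not env.get("AUTH_CONFIG"):
        problems.append(
            "AUTH_CONFIG is not set, but these OAuth variables are: "
            + ", ".join(oauth_set)
        )
    if env.get("AUTH_CONFIG"):
        # Assumption: enabling OAuth requires both GitHub credentials.
        for name in OAUTH_VARS:
            if not env.get(name):
                problems.append(f"{name} is not set, but AUTH_CONFIG is")
    return problems


sample = "GITHUB_CLIENT_ID=Ov23li...\nGITHUB_CLIENT_SECRET=abc123...\n"
for problem in check_oauth_env(parse_env_text(sample)):
    print(problem)
```

To check a real file, pass its contents, for example `parse_env_text(open(".env").read())`.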

PDF files not processed

Make sure you're using the -tika image variant:

docker pull ghcr.io/strawberry-code/ragify:latest-tika

OAuth callback error

Verify that BASE_URL matches your public domain, and that the callback URL registered in the GitHub OAuth App is BASE_URL followed by /oauth/github-callback.

License

MIT License - See LICENSE

Contributing

See CONTRIBUTING.md for development setup and guidelines.
