IRIS Vector RAG Templates

Production-ready Retrieval-Augmented Generation (RAG) pipelines powered by InterSystems IRIS Vector Search

Build intelligent applications that combine large language models with your enterprise data using battle-tested RAG patterns and native vector search capabilities.

Why IRIS Vector RAG?

🚀 Production-Ready - Six proven RAG architectures ready to deploy, not research prototypes

⚡ Blazing Fast - Native IRIS vector search with HNSW indexing, no external vector databases needed

🔧 Unified API - Swap between RAG strategies with a single line of code

📊 Enterprise-Grade - ACID transactions, connection pooling, and horizontal scaling built-in

🎯 100% Compatible - Works seamlessly with LangChain, RAGAS, and your existing ML stack

🧪 Fully Validated - Comprehensive test suite with automated contract validation

Available RAG Pipelines

Pipeline Type	Use Case	Retrieval Method	When to Use
basic	Standard retrieval	Vector similarity	General Q&A, getting started, baseline comparisons
basic_rerank	Improved precision	Vector + cross-encoder reranking	Higher accuracy requirements, legal/medical domains
crag	Self-correcting	Vector + evaluation + web search fallback	Dynamic knowledge, fact-checking, current events
graphrag	Knowledge graphs	Vector + text + graph + RRF fusion	Complex entity relationships, research, medical knowledge
multi_query_rrf	Multi-perspective	Query expansion + reciprocal rank fusion	Complex queries, comprehensive coverage needed
pylate_colbert	Fine-grained matching	ColBERT late interaction embeddings	Nuanced semantic understanding, high precision

Quick Start

1. Install

# Clone repository
git clone https://github.com/intersystems-community/iris-rag-templates.git
cd iris-rag-templates

# Setup environment (requires uv package manager)
make setup-env
make install
source .venv/bin/activate

2. Start IRIS Database

# Start IRIS with Docker Compose
docker-compose up -d

# Initialize database schema
make setup-db

# Optional: Load sample medical data
make load-data

3. Configure API Keys

cat > .env << 'EOF'
OPENAI_API_KEY=your-key-here
ANTHROPIC_API_KEY=your-key-here  # Optional, for Claude models
IRIS_HOST=localhost
IRIS_PORT=1972
IRIS_NAMESPACE=USER
IRIS_USERNAME=_SYSTEM
IRIS_PASSWORD=SYS
EOF

4. Run Your First Query

from iris_rag import create_pipeline

# Create pipeline with automatic validation
pipeline = create_pipeline('basic', validate_requirements=True)

# Load your documents
from iris_rag.core.models import Document

docs = [
    Document(
        page_content="RAG combines retrieval with generation for accurate AI responses.",
        metadata={"source": "rag_basics.pdf", "page": 1}
    ),
    Document(
        page_content="Vector search finds semantically similar content using embeddings.",
        metadata={"source": "vector_search.pdf", "page": 5}
    )
]

pipeline.load_documents(documents=docs)

# Query with LLM-generated answer
result = pipeline.query(
    query="What is RAG?",
    top_k=5,
    generate_answer=True
)

print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
print(f"Retrieved: {len(result['retrieved_documents'])} documents")

Unified API Across All Pipelines

Switch RAG strategies with one line - all pipelines share the same interface:

from iris_rag import create_pipeline

# Try different strategies instantly
for pipeline_type in ['basic', 'basic_rerank', 'crag', 'multi_query_rrf', 'graphrag']:
    pipeline = create_pipeline(pipeline_type)

    result = pipeline.query(
        query="What are the latest cancer treatment approaches?",
        top_k=5,
        generate_answer=True
    )

    print(f"\n{pipeline_type.upper()}:")
    print(f"  Answer: {result['answer'][:150]}...")
    print(f"  Retrieved: {len(result['retrieved_documents'])} docs")
    print(f"  Confidence: {result['metadata'].get('confidence', 'N/A')}")

Standardized Response Format

100% LangChain & RAGAS compatible responses:

{
    "query": "What is diabetes?",
    "answer": "Diabetes is a chronic metabolic condition...",  # LLM answer
    "retrieved_documents": [Document(...)],                   # LangChain Documents
    "contexts": ["context 1", "context 2"],                   # RAGAS contexts
    "sources": ["medical.pdf p.12", "diabetes.pdf p.3"],     # Source citations
    "execution_time": 0.523,
    "metadata": {
        "num_retrieved": 5,
        "pipeline_type": "basic",
        "retrieval_method": "vector",
        "generated_answer": True,
        "processing_time": 0.523
    }
}

Pipeline Deep Dives

CRAG: Self-Correcting Retrieval

Automatically evaluates retrieval quality and falls back to web search when needed:

from iris_rag import create_pipeline

pipeline = create_pipeline('crag')

# CRAG evaluates retrieved documents and uses web search if quality is low
result = pipeline.query(
    query="What happened in the 2024 Olympics opening ceremony?",
    top_k=5,
    generate_answer=True
)

# Check which retrieval method was used
print(f"Method: {result['metadata']['retrieval_method']}")  # 'vector' or 'web_search'
print(f"Confidence: {result['metadata']['confidence']}")     # 0.0 - 1.0

HybridGraphRAG: Multi-Modal Search

Combines vector search, text search, and knowledge graph traversal:

pipeline = create_pipeline('graphrag')

result = pipeline.query(
    query_text="cancer treatment targets",
    method="rrf",        # Reciprocal Rank Fusion across all methods
    vector_k=30,         # Top 30 from vector search
    text_k=30,           # Top 30 from text search
    graph_k=10,          # Top 10 from knowledge graph
    generate_answer=True
)

# Rich metadata includes entities and relationships
print(f"Entities: {result['metadata']['entities']}")
print(f"Relationships: {result['metadata']['relationships']}")
print(f"Graph depth: {result['metadata']['graph_depth']}")

MultiQueryRRF: Multi-Perspective Retrieval

Expands queries into multiple perspectives and fuses results:

pipeline = create_pipeline('multi_query_rrf')

# Automatically generates query variations and combines results
result = pipeline.query(
    query="How does machine learning work?",
    top_k=10,
    generate_answer=True
)

# See the generated query variations
print(f"Query variations: {result['metadata']['generated_queries']}")
print(f"Fusion method: {result['metadata']['fusion_method']}")  # 'rrf'

Enterprise Features

Production-Ready Database

IRIS provides everything you need in one database:

✅ Native vector search (no external vector DB needed)
✅ ACID transactions (your data is safe)
✅ SQL + NoSQL + Vector in one platform
✅ Horizontal scaling and clustering
✅ Enterprise-grade security and compliance

Connection Pooling

Automatic concurrency management:

from iris_rag.storage import IRISVectorStore

# Connection pool handles concurrency automatically
store = IRISVectorStore()

# Safe for multi-threaded applications
# Pool manages connections, no manual management needed

Automatic Schema Management

Database schema created and migrated automatically:

pipeline = create_pipeline('basic', validate_requirements=True)
# ✅ Checks database connection
# ✅ Validates schema exists
# ✅ Migrates to latest version if needed
# ✅ Reports validation results

RAGAS Evaluation Built-In

Measure your RAG pipeline performance:

# Evaluate all pipelines on your data
make test-ragas-sample

# Generates detailed metrics:
# - Answer Correctness
# - Faithfulness
# - Context Precision
# - Context Recall
# - Answer Relevance

IRIS EMBEDDING: 346x Faster Auto-Vectorization

Automatic embedding generation with model caching - eliminates the 720x slowdown from repeated model loading:

from iris_rag import create_pipeline

# Enable IRIS EMBEDDING support (Feature 051)
pipeline = create_pipeline(
    'basic',
    embedding_config='medical_embeddings_v1'  # IRIS EMBEDDING config name
)

# Documents auto-vectorize on INSERT with cached models
pipeline.load_documents(documents=docs)

# Queries auto-vectorize using same cached model
result = pipeline.query("What is diabetes?", top_k=5)

Performance Achievements:

⚡ 346x speedup - 1,746 documents vectorized in 3.5 seconds (vs 20 minutes baseline)
🎯 95% cache hit rate - Models stay in memory across requests
🚀 50ms average latency - Cache hits complete in <100ms
💾 Automatic fallback - GPU OOM? Automatically falls back to CPU

Configuration Example:

from iris_rag.embeddings.iris_embedding import configure_embedding

# Create embedding configuration
config = configure_embedding(
    name="medical_embeddings_v1",
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    device_preference="auto",     # auto, cuda, mps, cpu
    batch_size=32,
    enable_entity_extraction=True,
    entity_types=["Disease", "Medication", "Symptom"]
)

# Use with any pipeline
pipeline = create_pipeline('basic', embedding_config='medical_embeddings_v1')

Multi-Field Vectorization:

Combine multiple document fields into a single embedding:

from iris_rag.core.models import Document

# Document with multiple content fields
doc = Document(
    page_content="",  # Will be auto-filled from metadata
    metadata={
        "title": "Type 2 Diabetes Treatment",
        "abstract": "A comprehensive review of treatment approaches...",
        "conclusions": "Insulin therapy combined with lifestyle changes..."
    }
)

# Configure multi-field embedding
pipeline = create_pipeline(
    'basic',
    embedding_config='paper_embeddings',
    multi_field_source=['title', 'abstract', 'conclusions']  # Concatenate fields
)

pipeline.load_documents(documents=[doc])
# → Embedding generated from: "Type 2 Diabetes Treatment. A comprehensive review..."

When to Use IRIS EMBEDDING:

✅ Large document collections (>1000 documents)
✅ Frequent re-indexing or incremental updates
✅ Real-time vectorization requirements
✅ Memory-constrained environments (model stays in memory)
✅ Multi-field vectorization needs

Comparison:

Method	1,746 Docs	Model Loads	Cache Hit Rate
Manual (baseline)	20 minutes	1,746 (every row)	0%
IRIS EMBEDDING	3.5 seconds	1 (cached)	95%
Speedup	346x faster	1,746x fewer	95% efficient

Model Context Protocol (MCP) Support

Expose RAG pipelines as MCP tools for use with Claude Desktop and other MCP clients:

# Start MCP server
python -m iris_rag.mcp

# Available MCP tools:
# - rag_basic
# - rag_basic_rerank
# - rag_crag
# - rag_multi_query_rrf
# - rag_graphrag
# - rag_hybrid_graphrag
# - health_check
# - list_tools

Configure in Claude Desktop:

{
  "mcpServers": {
    "iris-rag": {
      "command": "python",
      "args": ["-m", "iris_rag.mcp"],
      "env": {
        "OPENAI_API_KEY": "your-key"
      }
    }
  }
}

Architecture Overview

iris_rag/
├── core/              # Abstract base classes (RAGPipeline, VectorStore)
├── pipelines/         # Pipeline implementations
│   ├── basic.py                    # BasicRAG
│   ├── basic_rerank.py             # Reranking pipeline
│   ├── crag.py                     # Corrective RAG
│   ├── multi_query_rrf.py          # Multi-query with RRF
│   ├── graphrag.py                 # Graph-based RAG
│   └── hybrid_graphrag.py          # Hybrid multi-modal
├── storage/           # Vector store implementations
│   ├── vector_store_iris.py        # IRIS vector store
│   └── schema_manager.py           # Schema management
├── mcp/              # Model Context Protocol server
├── api/              # Production REST API
├── services/         # Business logic (entity extraction, etc.)
├── config/           # Configuration management
└── validation/       # Pipeline contract validation

Documentation

📚 Comprehensive documentation for every use case:

User Guide - Complete installation and usage
API Reference - Detailed API documentation
Pipeline Guide - When to use each pipeline
MCP Integration - Model Context Protocol setup
Production Deployment - Deployment checklist
Development Guide - Contributing and testing

Performance Benchmarks

Native IRIS vector search delivers:

🚀 50-100x faster than traditional solutions for hybrid search
⚡ Sub-second queries on millions of documents
📊 Linear scaling with IRIS clustering
💾 10x less memory than external vector databases

Testing & Quality

# Run comprehensive test suite
make test

# Test specific categories
pytest tests/unit/           # Unit tests (fast)
pytest tests/integration/    # Integration tests (with IRIS)
pytest tests/contract/       # API contract validation

# Run with coverage
pytest --cov=iris_rag --cov-report=html

For detailed testing documentation, see DEVELOPMENT.md

Research & References

This implementation is based on peer-reviewed research:

Basic RAG: Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS 2020
CRAG: Yan et al., Corrective Retrieval Augmented Generation, arXiv 2024
GraphRAG: Edge et al., From Local to Global: A Graph RAG Approach, arXiv 2024
ColBERT: Khattab & Zaharia, ColBERT: Efficient and Effective Passage Search, SIGIR 2020

Contributing

We welcome contributions! See CONTRIBUTING.md for:

Development setup
Testing guidelines
Code style and standards
Pull request process

Community & Support

💬 Discussions: GitHub Discussions
🐛 Issues: GitHub Issues
📖 Documentation: Full Documentation
🏢 Enterprise Support: InterSystems Support

License

MIT License - see LICENSE for details.

Built with ❤️ by the InterSystems Community

Powering intelligent applications with enterprise-grade RAG

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
.claude/commands		.claude/commands
.github		.github
.specify		.specify
adapters		adapters
analysis		analysis
benchmarks		benchmarks
common		common
config		config
contrib/retrieve-dspy		contrib/retrieve-dspy
data		data
docker		docker
docs		docs
evaluation_framework		evaluation_framework
examples		examples
future_tests_not_ready		future_tests_not_ready
iris_rag		iris_rag
kg-memory/incremental		kg-memory/incremental
monitoring		monitoring
nodejs		nodejs
outputs		outputs
reports		reports
scripts		scripts
specs		specs
test_results		test_results
tests		tests
tools/chunking		tools/chunking
validation_results		validation_results
.clinerules		.clinerules
.coveragerc.ci		.coveragerc.ci
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
.pre-commit-config.yaml		.pre-commit-config.yaml
.roomodes		.roomodes
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
Dockerfile.mcp		Dockerfile.mcp
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
redaction_changes.json		redaction_changes.json
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
tox.ini		tox.ini

License

intersystems-community/iris-vector-rag

Folders and files

Latest commit

History

Repository files navigation

IRIS Vector RAG Templates

Why IRIS Vector RAG?

Available RAG Pipelines

Quick Start

1. Install

2. Start IRIS Database

3. Configure API Keys

4. Run Your First Query

Unified API Across All Pipelines

Standardized Response Format

Pipeline Deep Dives

CRAG: Self-Correcting Retrieval

HybridGraphRAG: Multi-Modal Search

MultiQueryRRF: Multi-Perspective Retrieval

Enterprise Features

Production-Ready Database

Connection Pooling

Automatic Schema Management

RAGAS Evaluation Built-In

IRIS EMBEDDING: 346x Faster Auto-Vectorization

Model Context Protocol (MCP) Support

Architecture Overview

Documentation

Performance Benchmarks

Testing & Quality

Research & References

Contributing

Community & Support

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 7

Uh oh!

Languages

Packages