IRIS Vector RAG Templates

Production-ready Retrieval-Augmented Generation (RAG) pipelines powered by InterSystems IRIS Vector Search

Build intelligent applications that combine large language models with your enterprise data using battle-tested RAG patterns and native vector search capabilities.

License: MIT · Python 3.11+ · InterSystems IRIS

Why IRIS Vector RAG?

🚀 Production-Ready - Six proven RAG architectures ready to deploy, not research prototypes

⚡ Blazing Fast - Native IRIS vector search with HNSW indexing, no external vector databases needed

🔧 Unified API - Swap between RAG strategies with a single line of code

📊 Enterprise-Grade - ACID transactions, connection pooling, and horizontal scaling built-in

🎯 100% Compatible - Works seamlessly with LangChain, RAGAS, and your existing ML stack

🧪 Fully Validated - Comprehensive test suite with automated contract validation

Available RAG Pipelines

| Pipeline Type | Use Case | Retrieval Method | When to Use |
| --- | --- | --- | --- |
| basic | Standard retrieval | Vector similarity | General Q&A, getting started, baseline comparisons |
| basic_rerank | Improved precision | Vector + cross-encoder reranking | Higher accuracy requirements, legal/medical domains |
| crag | Self-correcting | Vector + evaluation + web search fallback | Dynamic knowledge, fact-checking, current events |
| graphrag | Knowledge graphs | Vector + text + graph + RRF fusion | Complex entity relationships, research, medical knowledge |
| multi_query_rrf | Multi-perspective | Query expansion + reciprocal rank fusion | Complex queries, comprehensive coverage needed |
| pylate_colbert | Fine-grained matching | ColBERT late interaction embeddings | Nuanced semantic understanding, high precision |

Quick Start

1. Install

# Clone repository
git clone https://github.com/intersystems-community/iris-rag-templates.git
cd iris-rag-templates

# Setup environment (requires uv package manager)
make setup-env
make install
source .venv/bin/activate

2. Start IRIS Database

# Start IRIS with Docker Compose
docker-compose up -d

# Initialize database schema
make setup-db

# Optional: Load sample medical data
make load-data

3. Configure API Keys

cat > .env << 'EOF'
OPENAI_API_KEY=your-key-here
ANTHROPIC_API_KEY=your-key-here  # Optional, for Claude models
IRIS_HOST=localhost
IRIS_PORT=1972
IRIS_NAMESPACE=USER
IRIS_USERNAME=_SYSTEM
IRIS_PASSWORD=SYS
EOF
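
If you run the Python examples below outside the make targets, load the .env values into the process environment first. A minimal sketch using python-dotenv (an assumption: the project's own entry points may already do this internally):

from dotenv import load_dotenv  # pip install python-dotenv

# Reads .env from the current directory and exports its values into
# os.environ, where the IRIS connection and LLM clients can find them.
load_dotenv()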

4. Run Your First Query

from iris_rag import create_pipeline

# Create pipeline with automatic validation
pipeline = create_pipeline('basic', validate_requirements=True)

# Load your documents
from iris_rag.core.models import Document

docs = [
    Document(
        page_content="RAG combines retrieval with generation for accurate AI responses.",
        metadata={"source": "rag_basics.pdf", "page": 1}
    ),
    Document(
        page_content="Vector search finds semantically similar content using embeddings.",
        metadata={"source": "vector_search.pdf", "page": 5}
    )
]

pipeline.load_documents(documents=docs)

# Query with LLM-generated answer
result = pipeline.query(
    query="What is RAG?",
    top_k=5,
    generate_answer=True
)

print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")
print(f"Retrieved: {len(result['retrieved_documents'])} documents")

Unified API Across All Pipelines

Switch RAG strategies with one line - all pipelines share the same interface:

from iris_rag import create_pipeline

# Try different strategies instantly
for pipeline_type in ['basic', 'basic_rerank', 'crag', 'multi_query_rrf', 'graphrag']:
    pipeline = create_pipeline(pipeline_type)

    result = pipeline.query(
        query="What are the latest cancer treatment approaches?",
        top_k=5,
        generate_answer=True
    )

    print(f"\n{pipeline_type.upper()}:")
    print(f"  Answer: {result['answer'][:150]}...")
    print(f"  Retrieved: {len(result['retrieved_documents'])} docs")
    print(f"  Confidence: {result['metadata'].get('confidence', 'N/A')}")

Standardized Response Format

100% LangChain & RAGAS compatible responses:

{
    "query": "What is diabetes?",
    "answer": "Diabetes is a chronic metabolic condition...",  # LLM answer
    "retrieved_documents": [Document(...)],                   # LangChain Documents
    "contexts": ["context 1", "context 2"],                   # RAGAS contexts
    "sources": ["medical.pdf p.12", "diabetes.pdf p.3"],     # Source citations
    "execution_time": 0.523,
    "metadata": {
        "num_retrieved": 5,
        "pipeline_type": "basic",
        "retrieval_method": "vector",
        "generated_answer": True,
        "processing_time": 0.523
    }
}
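
Because the retrieved documents are plain LangChain Document objects, the response plugs into existing LangChain code without conversion. A minimal sketch built only on the keys documented above:

from iris_rag import create_pipeline

pipeline = create_pipeline('basic')
result = pipeline.query(query="What is diabetes?", top_k=5, generate_answer=True)

# Each retrieved document exposes .page_content and .metadata,
# exactly as any LangChain Document does.
context = "\n\n".join(doc.page_content for doc in result["retrieved_documents"])

# "contexts" carries the same passages as plain strings for RAGAS,
# and "sources" carries ready-made citations.
for source in result["sources"]:
    print(f"- {source}")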

Pipeline Deep Dives

CRAG: Self-Correcting Retrieval

Automatically evaluates retrieval quality and falls back to web search when needed:

from iris_rag import create_pipeline

pipeline = create_pipeline('crag')

# CRAG evaluates retrieved documents and uses web search if quality is low
result = pipeline.query(
    query="What happened in the 2024 Olympics opening ceremony?",
    top_k=5,
    generate_answer=True
)

# Check which retrieval method was used
print(f"Method: {result['metadata']['retrieval_method']}")  # 'vector' or 'web_search'
print(f"Confidence: {result['metadata']['confidence']}")     # 0.0 - 1.0

HybridGraphRAG: Multi-Modal Search

Combines vector search, text search, and knowledge graph traversal:

pipeline = create_pipeline('graphrag')

result = pipeline.query(
    query_text="cancer treatment targets",
    method="rrf",        # Reciprocal Rank Fusion across all methods
    vector_k=30,         # Top 30 from vector search
    text_k=30,           # Top 30 from text search
    graph_k=10,          # Top 10 from knowledge graph
    generate_answer=True
)

# Rich metadata includes entities and relationships
print(f"Entities: {result['metadata']['entities']}")
print(f"Relationships: {result['metadata']['relationships']}")
print(f"Graph depth: {result['metadata']['graph_depth']}")

MultiQueryRRF: Multi-Perspective Retrieval

Expands queries into multiple perspectives and fuses results:

pipeline = create_pipeline('multi_query_rrf')

# Automatically generates query variations and combines results
result = pipeline.query(
    query="How does machine learning work?",
    top_k=10,
    generate_answer=True
)

# See the generated query variations
print(f"Query variations: {result['metadata']['generated_queries']}")
print(f"Fusion method: {result['metadata']['fusion_method']}")  # 'rrf'

Enterprise Features

Production-Ready Database

IRIS provides everything you need in one database:

  • ✅ Native vector search (no external vector DB needed)
  • ✅ ACID transactions (your data is safe)
  • ✅ SQL + NoSQL + Vector in one platform
  • ✅ Horizontal scaling and clustering
  • ✅ Enterprise-grade security and compliance
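
Because embeddings live in ordinary IRIS tables, you can also reach them with plain SQL next to your relational data. A hedged sketch using the intersystems-irispython DB-API driver; the table and column names are hypothetical placeholders, not the schema iris_rag actually creates:

import iris  # pip install intersystems-irispython

conn = iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")
cursor = conn.cursor()

# MyApp.Documents, doc_id, text_content, and embedding are made up for
# illustration; VECTOR_COSINE and TO_VECTOR are built-in IRIS SQL functions.
query_embedding = ",".join(["0.0"] * 384)  # comma-separated floats from your embedder

cursor.execute(
    """
    SELECT TOP 3 doc_id, text_content
    FROM MyApp.Documents
    ORDER BY VECTOR_COSINE(embedding, TO_VECTOR(?, DOUBLE, 384)) DESC
    """,
    [query_embedding],
)
for doc_id, text in cursor.fetchall():
    print(doc_id, text[:80])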

Connection Pooling

Automatic concurrency management:

from iris_rag.storage import IRISVectorStore

# Connection pool handles concurrency automatically
store = IRISVectorStore()

# Safe for multi-threaded applications
# Pool manages connections, no manual management needed
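
Since the pool is safe for multi-threaded use, you can fan out queries from worker threads without any per-thread connection handling. A minimal sketch using only the documented pipeline API, assuming a single pipeline instance may be shared across threads as the pooling notes suggest:

from concurrent.futures import ThreadPoolExecutor

from iris_rag import create_pipeline

pipeline = create_pipeline('basic')
questions = ["What is RAG?", "What is vector search?", "What is HNSW indexing?"]

# Each worker borrows a pooled connection under the hood;
# there is nothing to open or close per thread.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(
        lambda q: pipeline.query(query=q, top_k=3, generate_answer=True),
        questions,
    ))

for question, result in zip(questions, results):
    print(question, "->", len(result["retrieved_documents"]), "docs")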

Automatic Schema Management

Database schema created and migrated automatically:

pipeline = create_pipeline('basic', validate_requirements=True)
# ✅ Checks database connection
# ✅ Validates schema exists
# ✅ Migrates to latest version if needed
# ✅ Reports validation results

RAGAS Evaluation Built-In

Measure your RAG pipeline performance:

# Evaluate all pipelines on your data
make test-ragas-sample

# Generates detailed metrics:
# - Answer Correctness
# - Faithfulness
# - Context Precision
# - Context Recall
# - Answer Relevance
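
The make target wraps the standard RAGAS flow, so you can also score a single response by hand. A hedged sketch assuming a ragas 0.1-style API (exact imports shift between RAGAS releases); faithfulness and answer relevance need only the keys every pipeline already returns:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

from iris_rag import create_pipeline

pipeline = create_pipeline('basic')
result = pipeline.query(query="What is diabetes?", top_k=5, generate_answer=True)

# The standardized response keys map straight onto RAGAS columns.
dataset = Dataset.from_dict({
    "question": [result["query"]],
    "answer": [result["answer"]],
    "contexts": [result["contexts"]],
})

# RAGAS calls out to the judge LLM configured via your environment
# (e.g. OPENAI_API_KEY from step 3 of the Quick Start).
scores = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(scores)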

IRIS EMBEDDING: 346x Faster Auto-Vectorization

Automatic embedding generation with model caching, eliminating the 720x slowdown caused by reloading the model on every row:

from iris_rag import create_pipeline

# Enable IRIS EMBEDDING support (Feature 051)
pipeline = create_pipeline(
    'basic',
    embedding_config='medical_embeddings_v1'  # IRIS EMBEDDING config name
)

# Documents auto-vectorize on INSERT with cached models
pipeline.load_documents(documents=docs)

# Queries auto-vectorize using same cached model
result = pipeline.query("What is diabetes?", top_k=5)

Performance Achievements:

  • ⚡ 346x speedup - 1,746 documents vectorized in 3.5 seconds (vs 20 minutes baseline)
  • 🎯 95% cache hit rate - Models stay in memory across requests
  • 🚀 50ms average latency - Cache hits complete in <100ms
  • 💾 Automatic fallback - GPU OOM? Automatically falls back to CPU

Configuration Example:

from iris_rag.embeddings.iris_embedding import configure_embedding

# Create embedding configuration
config = configure_embedding(
    name="medical_embeddings_v1",
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    device_preference="auto",     # auto, cuda, mps, cpu
    batch_size=32,
    enable_entity_extraction=True,
    entity_types=["Disease", "Medication", "Symptom"]
)

# Use with any pipeline
pipeline = create_pipeline('basic', embedding_config='medical_embeddings_v1')

Multi-Field Vectorization:

Combine multiple document fields into a single embedding:

from iris_rag.core.models import Document

# Document with multiple content fields
doc = Document(
    page_content="",  # Will be auto-filled from metadata
    metadata={
        "title": "Type 2 Diabetes Treatment",
        "abstract": "A comprehensive review of treatment approaches...",
        "conclusions": "Insulin therapy combined with lifestyle changes..."
    }
)

# Configure multi-field embedding
pipeline = create_pipeline(
    'basic',
    embedding_config='paper_embeddings',
    multi_field_source=['title', 'abstract', 'conclusions']  # Concatenate fields
)

pipeline.load_documents(documents=[doc])
# → Embedding generated from: "Type 2 Diabetes Treatment. A comprehensive review..."

When to Use IRIS EMBEDDING:

  • ✅ Large document collections (>1000 documents)
  • ✅ Frequent re-indexing or incremental updates
  • ✅ Real-time vectorization requirements
  • ✅ Memory-constrained environments (model stays in memory)
  • ✅ Multi-field vectorization needs

Comparison:

| Method | 1,746 Docs | Model Loads | Cache Hit Rate |
| --- | --- | --- | --- |
| Manual (baseline) | 20 minutes | 1,746 (every row) | 0% |
| IRIS EMBEDDING | 3.5 seconds | 1 (cached) | 95% |
| Speedup | 346x faster | 1,746x fewer | 95% efficient |

Model Context Protocol (MCP) Support

Expose RAG pipelines as MCP tools for use with Claude Desktop and other MCP clients:

# Start MCP server
python -m iris_rag.mcp

# Available MCP tools:
# - rag_basic
# - rag_basic_rerank
# - rag_crag
# - rag_multi_query_rrf
# - rag_graphrag
# - rag_hybrid_graphrag
# - health_check
# - list_tools

Configure in Claude Desktop:

{
  "mcpServers": {
    "iris-rag": {
      "command": "python",
      "args": ["-m", "iris_rag.mcp"],
      "env": {
        "OPENAI_API_KEY": "your-key"
      }
    }
  }
}
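
Beyond Claude Desktop, any MCP client can drive these tools programmatically. A hedged sketch using the official mcp Python SDK over stdio; the tool-argument names are an assumption, mirroring pipeline.query() above:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server as a subprocess and speak MCP over stdio.
    params = StdioServerParameters(command="python", args=["-m", "iris_rag.mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # "query" as the argument name is assumed, not confirmed.
            result = await session.call_tool("rag_basic", {"query": "What is RAG?"})
            print(result)

asyncio.run(main())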

Architecture Overview

iris_rag/
├── core/              # Abstract base classes (RAGPipeline, VectorStore)
├── pipelines/         # Pipeline implementations
│   ├── basic.py                    # BasicRAG
│   ├── basic_rerank.py             # Reranking pipeline
│   ├── crag.py                     # Corrective RAG
│   ├── multi_query_rrf.py          # Multi-query with RRF
│   ├── graphrag.py                 # Graph-based RAG
│   └── hybrid_graphrag.py          # Hybrid multi-modal
├── storage/           # Vector store implementations
│   ├── vector_store_iris.py        # IRIS vector store
│   └── schema_manager.py           # Schema management
├── mcp/              # Model Context Protocol server
├── api/              # Production REST API
├── services/         # Business logic (entity extraction, etc.)
├── config/           # Configuration management
└── validation/       # Pipeline contract validation
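
The core/ abstractions are what keep the unified API uniform: every pipeline exposes the same load_documents/query surface shown in the Quick Start. As a hedged sketch of building on that surface, the wrapper below composes an existing pipeline rather than subclassing RAGPipeline, since the base class's exact hooks are not shown here:

from iris_rag import create_pipeline

class KeywordFilterRAG:
    """Toy wrapper that drops retrieved documents missing a required keyword."""

    def __init__(self, inner_pipeline, keyword):
        self.inner = inner_pipeline
        self.keyword = keyword.lower()

    def load_documents(self, documents):
        self.inner.load_documents(documents=documents)

    def query(self, query, top_k=5, generate_answer=True):
        result = self.inner.query(query=query, top_k=top_k,
                                  generate_answer=generate_answer)
        # Post-filter using the documented Document.page_content attribute.
        result["retrieved_documents"] = [
            doc for doc in result["retrieved_documents"]
            if self.keyword in doc.page_content.lower()
        ]
        return result

pipeline = KeywordFilterRAG(create_pipeline('basic'), keyword="diabetes")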

Documentation

📚 Comprehensive documentation for every use case.

Performance Benchmarks

Native IRIS vector search delivers:

  • 🚀 50-100x faster than traditional solutions for hybrid search
  • Sub-second queries on millions of documents
  • 📊 Linear scaling with IRIS clustering
  • 💾 10x less memory than external vector databases

Testing & Quality

# Run comprehensive test suite
make test

# Test specific categories
pytest tests/unit/           # Unit tests (fast)
pytest tests/integration/    # Integration tests (with IRIS)
pytest tests/contract/       # API contract validation

# Run with coverage
pytest --cov=iris_rag --cov-report=html

For detailed testing documentation, see DEVELOPMENT.md

Research & References

This implementation is based on peer-reviewed research.

Contributing

We welcome contributions! See CONTRIBUTING.md for:

  • Development setup
  • Testing guidelines
  • Code style and standards
  • Pull request process

Community & Support

License

MIT License - see LICENSE for details.


Built with ❤️ by the InterSystems Community

Powering intelligent applications with enterprise-grade RAG
