A knowledge graph system built on InterSystems IRIS that combines graph traversal, vector similarity search, and full-text search in a single database.
NEW: Interactive Demo Server showcasing fraud detection + biomedical capabilities
Proven at Scale Across Industries:
- Financial Services: Real-time fraud detection (130M+ transactions), bitemporal audit trails, <10ms queries
- Biomedical Research: Protein interaction networks (100K+ proteins), drug discovery, <50ms multi-hop queries
Same IRIS platform. Different domains. Powerful results.
Two Deployment Modes:
- External (DEFAULT - simpler): Python app connects to IRIS via `iris.connect()`
- Embedded (ADVANCED - optional): Python app runs INSIDE IRIS container
# 1. Start IRIS database
docker-compose up -d
# 2. Install Python dependencies
pip install iris-vector-graph # Core features
pip install "iris-vector-graph[ml]" # + Machine learning (fraud scoring models); quoted so shells such as zsh do not expand the brackets
# 3. Load fraud schema
docker exec -i iris /usr/irissys/bin/irissession IRIS -U USER < sql/fraud/schema.sql
# 4. Start fraud API (external Python)
PYTHONPATH=src python -m iris_fraud_server
# Test fraud scoring API
curl -X POST http://localhost:8000/fraud/score \
-H 'Content-Type: application/json' \
-d '{"mode":"MLP","payer":"acct:test","device":"dev:laptop","amount":1000.0}'
# Run FastAPI INSIDE IRIS container (licensed IRIS required)
docker-compose -f docker-compose.fraud-embedded.yml up -d
# Test fraud scoring API (~2 min startup)
curl -X POST http://localhost:8100/fraud/score \
-H 'Content-Type: application/json' \
-d '{"mode":"MLP","payer":"acct:test","device":"dev:laptop","amount":1000.0}'
What you get:
- FastAPI fraud scoring (external `:8000` or embedded `:8100`)
- Bitemporal data (track when transactions happened vs. when you learned about them)
- Complete audit trails (regulatory compliance: SOX, MiFID II)
- Direct IRIS queries (no middleware)
Learn more: examples/bitemporal/README.md - Fraud scenarios, chargeback defense, model tracking
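The scoring request shown with curl can also be issued from Python. The following is a minimal sketch using only the standard library. The payload fields are copied from the curl example; the helper names `build_score_request` and `score` are hypothetical, and because the response schema is not documented here, the result is returned as raw parsed JSON.

```python
import json
import urllib.request


def build_score_request(base_url, payer, device, amount, mode="MLP"):
    """Build (but do not send) a POST request for the /fraud/score endpoint."""
    payload = {"mode": mode, "payer": payer, "device": device, "amount": amount}
    return urllib.request.Request(
        url=f"{base_url}/fraud/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(base_url, payer, device, amount, mode="MLP", timeout=5.0):
    """Send the request and return the parsed JSON body (schema is an assumption)."""
    request = build_score_request(base_url, payer, device, amount, mode)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server; external mode on :8000, embedded on :8100):
#   result = score("http://localhost:8000", "acct:test", "dev:laptop", 1000.0)
```

Separating request construction from sending keeps the payload easy to inspect and test without a live server.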
# 1. Start IRIS database
docker-compose up -d
# 2. Install dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync && source .venv/bin/activate
# 3. Load STRING protein database (10K proteins, ~1 minute)
python scripts/performance/string_db_scale_test.py --max-proteins 10000
# 4. Start interactive demo server (external Python)
PYTHONPATH=src python -m iris_demo_server.app
# 5. Open browser
open http://localhost:8200/bio
# Run demo server INSIDE IRIS container (licensed IRIS required)
# Coming soon - currently only external mode supported for biomedical demo
What you get:
- Interactive protein search with vector similarity (EGFR, TP53, etc.)
- D3.js graph visualization with click-to-expand nodes showing interaction networks
- Pathway analysis between proteins using BFS graph traversal
- Real STRING DB data (10K proteins, 37K interactions)
- <100ms queries powered by direct IRIS integration (no API middleware)
- 20/20 contract tests passing - production-ready biomedical demo
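To illustrate the pathway analysis mentioned above, here is a minimal sketch of breadth-first search over an in-memory adjacency list. It is not the demo server's implementation: the function name `bfs_path` is hypothetical, and the small interaction graph is hand-written toy data rather than STRING DB content.

```python
from collections import deque


def bfs_path(adjacency, source, target):
    """Return a path with the fewest hops from source to target, or None."""
    if source == target:
        return [source]
    parents = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == target:
                # Walk the parent links back to the source.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Toy undirected interaction graph (illustrative edges only).
interactions = {
    "TP53": ["MDM2", "EP300"],
    "MDM2": ["TP53", "AKT1"],
    "EP300": ["TP53"],
    "AKT1": ["MDM2", "EGFR"],
    "EGFR": ["AKT1"],
}
path = bfs_path(interactions, "TP53", "EGFR")
print(path)  # ['TP53', 'MDM2', 'AKT1', 'EGFR']
```

Because BFS explores nodes in order of hop count, the first time the target is reached the path is guaranteed to be a shortest one in an unweighted graph.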
Learn more:
- `docs/biomedical-demo-setup.md` - Complete setup guide with scaling options
- `biomedical/README.md` - Architecture and development patterns
| Use Case | Features | Performance |
|---|---|---|
| Real-Time Fraud Detection | Graph-based scoring, MLP models, device fingerprinting | <10ms scoring, 130M+ transactions |
| Bitemporal Audit Trails | Valid time vs. system time, chargeback defense, compliance | <10ms time-travel queries |
| Late Arrival Detection | Settlement delay analysis, backdated transaction flagging | Pattern detection across 130M events |
| Regulatory Compliance | SOX, GDPR, MiFID II, Basel III reporting | Complete audit trail preservation |
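The distinction between valid time and system time can be sketched with a small, self-contained example. It uses SQLite from the Python standard library as a stand-in, not IRIS SQL. The table name `payment_versions` and all column names except `system_to` are hypothetical and do not reflect the schema in `sql/bitemporal/`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_versions (
    payment_id  TEXT NOT NULL,
    amount      REAL NOT NULL,
    valid_from  TEXT NOT NULL,   -- when the payment happened
    system_from TEXT NOT NULL,   -- when this version was recorded
    system_to   TEXT             -- NULL marks the current version
);
-- Partial index covering only current versions.
CREATE INDEX idx_current ON payment_versions (payment_id) WHERE system_to IS NULL;
""")

# A payment first recorded as 1000.0, then corrected to 100.0 five days later.
conn.executemany(
    "INSERT INTO payment_versions VALUES (?, ?, ?, ?, ?)",
    [
        ("pay:1", 1000.0, "2025-01-10", "2025-01-10", "2025-01-15"),
        ("pay:1", 100.0, "2025-01-10", "2025-01-15", None),
    ],
)


def amount_as_of(conn, payment_id, system_time):
    """Answer 'what did we believe at system_time?' for one payment."""
    row = conn.execute(
        """SELECT amount FROM payment_versions
           WHERE payment_id = ?
             AND system_from <= ?
             AND (system_to IS NULL OR system_to > ?)""",
        (payment_id, system_time, system_time),
    ).fetchone()
    return row[0] if row else None


known_on_jan_12 = amount_as_of(conn, "pay:1", "2025-01-12")
known_on_feb_01 = amount_as_of(conn, "pay:1", "2025-02-01")
print(known_on_jan_12, known_on_feb_01)  # 1000.0 100.0
```

Old versions are closed by setting `system_to` rather than being overwritten, which is what makes the time-travel query possible.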
Files:
- `examples/bitemporal/` - Fraud scenarios, audit queries, Python API
- `sql/bitemporal/` - Schema (2 tables, 3 views, 8 indexes)
- `src/iris_fraud_server/` - FastAPI fraud scoring server
- `docker-compose.fraud-embedded.yml` - Licensed IRIS + embedded Python
| Use Case | Features | Performance |
|---|---|---|
| Protein Interaction Networks | STRING DB integration, pathway analysis, vector similarity | <50ms multi-hop queries (100K+ proteins) |
| Drug Discovery | Compound similarity, target identification, graph analytics | <10ms vector search (HNSW) |
| Literature Mining | Hybrid search (embeddings + BM25), entity extraction | RRF fusion, sub-second queries |
| Pathway Analysis | Multi-hop traversal, PageRank, connected components | NetworkX integration, embedded Python |
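The RRF fusion listed for literature mining combines several ranked result lists into one. The signature of `kg_RRF_FUSE` is not shown in this README, so the following is a generic sketch of Reciprocal Rank Fusion rather than that procedure. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["doc3", "doc1", "doc2"]   # e.g. from embedding similarity
text_hits = ["doc1", "doc4", "doc3"]     # e.g. from BM25 full-text search
fused = rrf_fuse([vector_hits, text_hits])
print([doc_id for doc_id, _ in fused])   # ['doc1', 'doc3', 'doc4', 'doc2']
```

RRF uses only rank positions, so the vector and text scores never need to be normalized onto a common scale.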
Files:
- `biomedical/` - Protein queries, pathway examples
- `sql/schema.sql` - Graph schema (nodes, edges, properties, embeddings)
- `iris_vector_graph_core/` - Core Python graph engine
- `docker-compose.acorn.yml` - ACORN-1 with HNSW optimization
Two standalone implementations of the Traveling Salesman Problem demonstrating graph algorithms on IRIS:
Find optimal pathways through protein interaction networks:
# Test with 10 cancer-related proteins
python scripts/algorithms/tsp_demo.py --proteins 10 --compare-methods
Algorithms: Greedy (1ms), Christofides (15ms), 2-opt (8ms)
Use case: Optimize order to study protein interactions in cancer pathways
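As a rough illustration of two of the approaches named above, the sketch below builds a greedy nearest-neighbor tour and then refines it with 2-opt. It is not the code in `scripts/algorithms/tsp_demo.py`: the function names and the five sample points are hypothetical, and Christofides is omitted for brevity.

```python
import math


def tour_length(tour, dist):
    """Total length of a closed tour (returns to the starting node)."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor_tour(dist, start=0):
    """Greedy construction: always visit the closest unvisited node next."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda j: (dist[current][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour, dist):
    """Local search: reverse segments while doing so shortens the tour."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, dist) < tour_length(best, dist) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical coordinates standing in for pairwise "distances" between nodes.
points = [(0, 0), (0, 1), (1, 1), (1, 0), (0.5, 2)]
dist = [[math.dist(p, q) for q in points] for p in points]

greedy = nearest_neighbor_tour(dist)
refined = two_opt(greedy, dist)
print(greedy, round(tour_length(greedy, dist), 3))
print(refined, round(tour_length(refined, dist), 3))
```

The greedy tour is fast to build but can end with a long closing edge; 2-opt trades extra time for removing such crossings, and it never returns a longer tour than it was given.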
Optimize caregiver routes for home healthcare:
# Load sample data (8 patients, 26 travel edges)
docker exec -i iris /usr/irissys/bin/irissession IRIS -U USER < sql/caregiver_routing_demo.sql
# Run optimization demo (IRIS Terminal)
Do ^TestCaregiverRouter
Performance: <2ms for 8-patient routes
Integration: Direct Business Process method calls
Impact: 53% travel time reduction (75min → 35min)
What you get:
- Python approach: NetworkX integration, multiple algorithms, FastAPI endpoint example
- ObjectScript approach: Zero dependencies, Interoperability production integration, bitemporal audit
- Comprehensive docs: Neo4j comparison, performance benchmarks, real-world use cases
Files:
- `scripts/algorithms/tsp_demo.py` - Python demo (works with STRING protein data)
- `iris/src/Graph/CaregiverRouter.cls` - ObjectScript TSP optimizer
- `iris/src/Graph/ScheduleOptimizationProcess.cls` - Business Process integration
- `sql/caregiver_routing_demo.sql` - Sample healthcare data
Learn more:
- `docs/algorithms/TSP_ANALYSIS.md` - Deep dive and Neo4j comparison
- `docs/algorithms/TSP_IMPLEMENTATION_SUMMARY.md` - Overview and benchmarks
- `docs/examples/CAREGIVER_ROUTING_DEMO.md` - Step-by-step tutorial
Deployment Options:
- External (Default): Python app connects to IRIS via `iris.connect()` - simpler setup, easier debugging
- Embedded (Advanced): Python app runs inside IRIS container - maximum performance, requires licensed IRIS
External Deployment (DEFAULT)       Embedded Deployment (OPTIONAL)
┌────────────────────────┐          ┌──────────────────────────────┐
│ FastAPI Server         │          │ IRIS Container               │
│ (external Python)      │          │ ┌──────────────────────────┐ │
│                        │          │ │ FastAPI Server           │ │
│ iris.connect() ────────┼──────────┤►│ (/usr/irissys/bin/       │ │
│ to localhost:1972      │          │ │ irispython)              │ │
└────────────────────────┘          │ └──────────────────────────┘ │
                                    │ ┌──────────────────────────┐ │
                                    │ │ IRIS Database Engine     │ │
                                    │ │ (Bitemporal/Graph/Vector)│ │
                                    │ └──────────────────────────┘ │
                                    └──────────────────────────────┘
Same Platform: InterSystems IRIS
Same Features: Vector Search, Graph Traversal, Bitemporal Audit
Different Domains: Finance vs. Life Sciences
Core Components:
- IRIS Globals: Append-only storage (perfect for audit trails + graph data)
- Embedded Python: Run ML models and graph algorithms in-database
- SQL Procedures: `kg_KNN_VEC` (vector search), `kg_RRF_FUSE` (hybrid search)
- HNSW Indexing: 100x faster vector similarity (requires IRIS 2025.3+ or ACORN-1)
| Feature | Financial Services Use | Biomedical Use |
|---|---|---|
| Embedded Python | Fraud ML models in-database | Graph analytics (PageRank, etc.) |
| Personalized PageRank | Entity importance scoring | Document ranking, pathway analysis |
| Temporal Queries | Bitemporal audit ("what did we know when?") | Time-series biomarker analysis |
| Graph Traversal | Fraud ring detection (multi-hop) | Protein interaction pathways |
| Vector Search | Transaction similarity | Protein/compound similarity |
| Partial Indexes | `WHERE system_to IS NULL` (10x faster) | `WHERE label = 'protein'` |
- Globals Storage: Append-only (no UPDATE contention)
- Partial Indexes: Filter at index level (`WHERE system_to IS NULL`)
- Temporal Views: Pre-filter current versions
- Foreign Key Constraints: Referential integrity across graph
- HNSW Vector Index: 100x faster than flat search (ACORN-1)
- PPR Functional Index: ObjectScript `$LISTBUILD` + `$LISTNEXT` for 8.9x faster PageRank at scale (10K nodes: 184ms vs 1,631ms Python)
| Metric | Community IRIS | Licensed IRIS |
|---|---|---|
| Transactions | 30M | 130M |
| Database Size | 5.3GB | 22.1GB |
| Fraud Scoring | <10ms | <10ms |
| Bitemporal Queries | <10ms (indexed) | <10ms (indexed) |
| Time-Travel Queries | <50ms | <50ms |
| Late Arrival Detection | Pattern search across 30M | Pattern search across 130M |
| Metric | Pure Python | ObjectScript Native |
|---|---|---|
| Vector Search | 5800ms (flat) → 1.7ms (HNSW) | Same (HNSW index) |
| Multi-hop Queries | <50ms | <50ms |
| Hybrid Search (RRF) | <100ms | <20ms |
| Personalized PageRank (1K) | 14.5ms | 14.3ms |
| Personalized PageRank (10K) | 1,631ms | 184ms (8.9x faster) ✨ |
| Graph Analytics | NetworkX integration | Zero-copy Global access |
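For context on the "flat" baseline in the vector search row, a flat search compares the query against every stored vector, which is why its cost grows linearly with the number of embeddings. The sketch below shows that brute-force scan in plain Python. It is not how `kg_KNN_VEC` is implemented; the function names and the three-dimensional toy vectors are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_knn(query, vectors, k=3):
    """Brute-force k-nearest neighbors: score every vector, keep the top k."""
    scored = ((cosine_similarity(query, vec), key) for key, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Toy embeddings; real biomedical embeddings here are 768-dimensional.
embeddings = {
    "PROTEIN:A": [1.0, 0.0, 0.0],
    "PROTEIN:B": [0.9, 0.1, 0.0],
    "PROTEIN:C": [0.0, 1.0, 0.0],
}
hits = flat_knn([1.0, 0.0, 0.0], embeddings, k=2)
print([key for _, key in hits])  # ['PROTEIN:A', 'PROTEIN:B']
```

An HNSW index avoids this full scan by navigating a layered proximity graph, returning approximate neighbors after visiting only a small fraction of the vectors.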
Tested At Scale:
- ✅ 130M fraud transactions (licensed IRIS)
- ✅ 100K+ protein interactions (STRING DB)
- ✅ 768-dimensional embeddings (biomedical models)
Compute entity importance scores for knowledge graph ranking:
from iris_vector_graph_core import IRISGraphEngine
import iris
# Connect to IRIS
conn = iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")
engine = IRISGraphEngine(conn)
# Compute PPR scores from seed entity
scores = engine.kg_PERSONALIZED_PAGERANK(
seed_entities=["PROTEIN:TP53"], # Seed with cancer protein
damping_factor=0.85, # Standard PageRank parameter
top_k=20 # Return top 20 scored entities
)
# Results: {'PROTEIN:TP53': 0.152, 'PROTEIN:MDM2': 0.087, ...}
# Rank documents by PPR scores
docs = engine.kg_PPR_RANK_DOCUMENTS(
seed_entities=["PROTEIN:TP53"],
top_k=10
)
# Results: [{document_id, score, top_entities, entity_count}, ...]
Performance: <25ms for 1K entities, ~200ms for 10K entities (Python implementation)
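To show what the `seed_entities` and `damping_factor` parameters control, here is a minimal power-iteration sketch of Personalized PageRank on an in-memory edge list. It is not the engine's implementation: the function name `personalized_pagerank`, the fixed iteration count, the dangling-node handling, and the toy edges are all assumptions made for illustration.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power iteration where the random walk restarts only at the seed nodes."""
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Restart distribution: uniform over seeds, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                share = damping * scores[n] / len(out_links[n])
                for dst in out_links[n]:
                    nxt[dst] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in nodes:
                    nxt[s] += damping * scores[n] * restart[s]
        scores = nxt
    return scores


# Toy directed edges (illustrative only).
edges = [
    ("PROTEIN:TP53", "PROTEIN:MDM2"),
    ("PROTEIN:MDM2", "PROTEIN:TP53"),
    ("PROTEIN:TP53", "PROTEIN:EP300"),
    ("PROTEIN:EP300", "PROTEIN:TP53"),
]
ppr = personalized_pagerank(edges, seeds=["PROTEIN:TP53"])
top = sorted(ppr.items(), key=lambda item: -item[1])
print(top[0][0])  # PROTEIN:TP53
```

A higher damping factor lets the walk travel further from the seeds before restarting, so scores spread more widely across the graph; a lower one concentrates them near the seeds.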
sql/
  schema.sql                  # Core graph schema
  bitemporal/                 # Fraud detection schema
  fraud/                      # Transaction tables
examples/
  bitemporal/                 # Financial services (fraud, audit)
biomedical/                   # Life sciences (proteins, pathways)
iris_vector_graph_core/       # Python graph engine
src/iris_fraud_server/        # FastAPI fraud API
scripts/
  fraud/                      # 130M loader, benchmarks
  migrations/                 # NodePK migration
docker/
  Dockerfile.fraud-embedded   # Licensed IRIS + fraud API
  start-fraud-server.sh       # Embedded Python startup
MIT License - See LICENSE
We welcome contributions! This repo demonstrates IRIS versatility across:
- Financial Services: Fraud detection, bitemporal data, regulatory compliance
- Biomedical Research: Protein networks, drug discovery, literature mining
Feel free to add examples from other domains or improve existing implementations.
Production-Ready: Proven with 130M+ financial transactions and 100K+ biomedical interactions on InterSystems IRIS.