An enhanced Retrieval-Augmented Generation (RAG) system that combines temporal tracing, graph relationships, and agentic retrieval to provide intelligent, context-aware knowledge management.
- Temporal Tracing: Track the evolution of knowledge over time with full history
- Graph Relationships: Connect related concepts and states
- Intelligent Relationship Updates: Automatically updates memory connections when states evolve using semantic similarity and LLM analysis
- Agentic Retrieval: Intelligent, multi-step retrieval strategies
- Memory Promotion: Synthesize new knowledge states from historical data with conflict resolution and quality checks
- Time-Travel Queries: Query knowledge as it existed at any point in time
- Edge-Based Relevance: Contextual importance via edge weights, NOT filtering
- All states always accessible - nothing forgotten
- Edge strength represents contextual relevance (0.0-1.0)
- Low-strength edges: connections exist but ranked lower
- High-strength edges: prioritized during graph traversal
- Graph structure ensures even "distant" memories are discoverable
- Importance Learning: System learns what's important from access patterns
- Multi-Layer Caching: Redis-backed caching for embeddings, queries, and frequently accessed states
- Hierarchical Consolidation: Auto-summarizes at daily/weekly/monthly levels (like human sleep)
- Latest State Tracking: Instant O(1) lookup for "what's the current status?" queries (materialized view)
- Graph-Based Relevance: Even old/low-strength memories found via edges to latest states
- Storage Tier Support: Infrastructure for hot/warm/cold storage (working/active/archived)
- Instant Latest: <10ms for current state queries (PostgreSQL materialized views)
- Cached Queries: <10ms for frequently accessed data (Redis caching)
- Full Search: <100ms with vector search optimization and caching
- Scalable Storage: Supports millions of states with PostgreSQL + TimescaleDB partitioning
- Space Efficient: Diff-based versioning support for reduced storage overhead
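The edge-based relevance model described above can be sketched in a few lines. This is an illustrative example only (function and field names are hypothetical, not TracingRAG's actual API): neighbors are ranked by edge strength rather than filtered, so weak connections stay discoverable.

```python
# Sketch of edge-strength ranking (hypothetical names, not the actual API).
# Edges carry a strength in [0.0, 1.0]; every neighbor is returned,
# ordered by strength, so nothing is ever filtered out.

def rank_neighbors(edges: list[dict]) -> list[str]:
    """Return neighbor topics ordered by edge strength, strongest first."""
    return [e["target"] for e in sorted(edges, key=lambda e: e["strength"], reverse=True)]

edges = [
    {"target": "bug_report_42", "strength": 0.2},
    {"target": "project_alpha_v3", "strength": 0.9},
    {"target": "design_notes", "strength": 0.5},
]
print(rank_neighbors(edges))  # strongest edge first, weak edges still present
```

The key design point is that the weak `bug_report_42` edge appears last but is never dropped, matching the "nothing forgotten" principle.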
TracingRAG uses a multi-layer architecture:
- Storage Layer: Qdrant (vectors), Neo4j (graphs), PostgreSQL + TimescaleDB (documents)
- Core Services: Memory management, graph operations, embeddings, caching (Redis)
- Agentic Layer: LLM-based query planning, memory promotion, retrieval orchestration
- API Layer: FastAPI REST endpoints with async support
See docs/ for detailed documentation on architecture, usage, and deployment.
Required:
- Python 3.11+ with FastAPI and asyncio
- Qdrant for vector storage and semantic search
- Neo4j for graph database and relationship tracking
- PostgreSQL + TimescaleDB for document storage and temporal queries
- OpenRouter API for LLM access (structured output, query analysis, synthesis)
Embeddings (choose one):
- SentenceTransformers (default, free, runs locally)
  - `all-mpnet-base-v2` - English, 768 dim (default)
  - `paraphrase-multilingual-mpnet-base-v2` - 50+ languages, 768 dim
- OpenAI Embeddings (optional, best multilingual support)
  - `text-embedding-3-small` - 100+ languages, 1536 dim
  - `text-embedding-3-large` - 100+ languages, 3072 dim
- Automatic fallback if the local model fails
Optional:
- Redis for caching (embeddings, queries, working memory)
  - If unavailable, falls back to an in-memory LRU cache (1000 items max)
  - Recommended for production
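The in-memory fallback described above can be approximated with an ordered dict. This is a minimal sketch, not the actual fallback implementation; only the 1000-item capacity comes from the documentation, everything else is assumed:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch for the no-Redis fallback (illustrative)."""

    def __init__(self, max_items: int = 1000):
        self.max_items = max_items
        self._data: OrderedDict[str, object] = OrderedDict()

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def set(self, key: str, value: object) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(max_items=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # touch "a" so "b" becomes least recently used
cache.set("c", 3)    # evicts "b"
print(cache.get("b"))  # -> None
```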
Monitoring:
- Prometheus for metrics
- Structlog for structured JSON logging
- Python 3.11+ (including Python 3.14)
- Recommended: Use Python 3.11-3.13 for the smoothest installation (pre-built wheels available)
- Note for Python 3.14+: Some dependencies (like `greenlet`) need to compile from source since pre-built wheels aren't available yet. You'll need:
  - macOS: `xcode-select --install`
  - Ubuntu/Debian: `sudo apt install build-essential python3-dev`
  - Other systems: a C compiler and Python development headers
- Docker and Docker Compose
- Poetry (Python package manager)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd TracingRAG
  ```

- Install Poetry (if not already installed):

  ```bash
  curl -sSL https://install.python-poetry.org | python3 -
  ```

- (Optional) If you have multiple Python versions installed:

  ```bash
  # Poetry will automatically use the correct Python version,
  # but you can specify which one explicitly:
  poetry env use python3.11  # Use Python 3.11
  poetry env use python3.12  # Use Python 3.12
  poetry env use python3.13  # Use Python 3.13
  poetry env use python3.14  # Use Python 3.14 (requires build tools, see prerequisites)

  # Check which Python is being used:
  poetry env info
  ```

- Install dependencies:

  ```bash
  poetry install
  ```

- Copy environment variables and configure:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

Required configuration:
- `OPENROUTER_API_KEY` - Your OpenRouter API key for LLM access
Embedding configuration (choose one):
Option 1: Local embeddings (default, free)

```bash
# English only (default)
EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2

# OR for multilingual support (50+ languages)
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-mpnet-base-v2
```

Option 2: OpenAI embeddings (best multilingual, API costs)

```bash
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # 100+ languages
```

Optional configuration:
- Redis caching (recommended for production): Already configured in `docker-compose.yml`

- Start infrastructure services:

  ```bash
  docker-compose up -d
  ```

- Run database migrations:

  ```bash
  poetry run alembic upgrade head
  ```

- Start the API server:

  ```bash
  poetry run uvicorn tracingrag.api.main:app --reload
  ```

The API will be available at http://localhost:8000
API documentation: http://localhost:8000/docs
TracingRAG provides a comprehensive REST API for all operations:
System:
- `GET /` - API information
- `GET /health` - Health check
- `GET /metrics` - System metrics
Memory Management:
- `POST /api/v1/memories` - Create memory state
- `GET /api/v1/memories/{id}` - Get memory by ID
- `GET /api/v1/memories` - List memories (with pagination and filtering)
- `GET /api/v1/traces/{topic}` - Get version history for a topic
Query/RAG:
- `POST /api/v1/query` - Query the RAG system (supports both standard and agent-based retrieval)
Promotion:
- `POST /api/v1/promote` - Promote a memory state
- `GET /api/v1/promotion-candidates` - Get topics that are candidates for promotion
- Swagger UI: http://localhost:8000/docs - Interactive API explorer
- ReDoc: http://localhost:8000/redoc - API reference documentation
- OpenAPI JSON: http://localhost:8000/openapi.json - Machine-readable API spec
```bash
# Create a memory
curl -X POST http://localhost:8000/api/v1/memories \
  -H "Content-Type: application/json" \
  -d '{
    "topic": "project_alpha",
    "content": "Initial design for API authentication",
    "tags": ["design", "security"],
    "confidence": 0.95
  }'

# Query the RAG system
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the status of project alpha?",
    "use_agent": false
  }'

# Get promotion candidates
curl "http://localhost:8000/api/v1/promotion-candidates?limit=10&min_priority=7"
```

For complete API documentation, see docs/API_GUIDE.md.
```
TracingRAG/
├── tracingrag/              # Main package
│   ├── __init__.py
│   ├── core/                # Core domain models
│   │   └── models/          # Data models (memory, graph, rag, promotion)
│   ├── services/            # Business logic layer
│   │   ├── memory.py        # Memory state management
│   │   ├── rag.py           # RAG pipeline orchestration
│   │   ├── graph.py         # Graph relationship management
│   │   ├── retrieval.py     # Retrieval strategies
│   │   ├── promotion.py     # Memory promotion & synthesis
│   │   ├── embedding.py     # Embedding generation
│   │   ├── cache.py         # Caching layer (Redis)
│   │   └── ...              # Other services
│   ├── storage/             # Storage layer
│   │   ├── qdrant.py        # Qdrant vector database
│   │   ├── neo4j_client.py  # Neo4j graph database
│   │   ├── database.py      # PostgreSQL integration
│   │   ├── redis_client.py  # Redis caching
│   │   └── models.py        # SQLAlchemy models
│   ├── agents/              # Agentic layer
│   │   ├── query_planner.py # Query planning agent
│   │   ├── memory_manager.py# Memory management agent
│   │   ├── service.py       # Agent orchestration
│   │   └── tools.py         # Agent tools
│   ├── api/                 # API layer
│   │   ├── main.py          # FastAPI application
│   │   ├── schemas.py       # Pydantic schemas
│   │   └── security.py      # Authentication & authorization
│   └── utils/               # Utilities
├── tests/                   # Test suite
├── scripts/                 # Utility scripts
├── docs/                    # Additional documentation
├── examples/                # Usage examples
├── alembic/                 # Database migrations
├── k8s/                     # Kubernetes manifests
├── docker-compose.yml       # Local development infrastructure
├── Dockerfile               # Application container
├── pyproject.toml           # Poetry configuration
├── .env.example             # Environment variables template
└── README.md                # This file
```
```python
from tracingrag.client import TracingRAGClient

client = TracingRAGClient("http://localhost:8000")

# Create initial memory state
state = client.create_memory(
    topic="project_alpha",
    content="Starting development of feature X with approach Y",
    tags=["project", "development"]
)

# Query for relevant memories
results = client.query(
    query="What's the status of project alpha?",
    include_history=True,   # Include trace context
    include_related=True,   # Include graph connections
    depth=2                 # Graph traversal depth
)

for result in results:
    print(f"Topic: {result.topic}")
    print(f"Content: {result.content}")
    print(f"Version: {result.version}")
    print(f"Related: {[r.topic for r in result.related_states]}")

# Promote memory to new state with synthesis
new_state = client.promote_memory(
    topic="project_alpha",
    reason="Bug discovered and fixed, feature complete"
)
# The system will:
# 1. Analyze trace history
# 2. Find related states (e.g., bug reports)
# 3. Synthesize a new state with the LLM
# 4. Create appropriate graph edges

from datetime import datetime, timedelta

# What did we know about this topic 2 weeks ago?
past_state = client.query_at_time(
    topic="project_alpha",
    timestamp=datetime.now() - timedelta(weeks=2)
)
```

```bash
# Run tests
poetry run pytest

# Lint and format
poetry run ruff check .
poetry run ruff format .

# Type check
poetry run mypy tracingrag
```

Key environment variables (see .env.example):
- `OPENROUTER_API_KEY`: OpenRouter API key for LLM access
- `QDRANT_URL`: Qdrant server URL
- `QDRANT_API_KEY`: Qdrant API key (if using cloud)
- `NEO4J_URI`: Neo4j connection URI
- `NEO4J_USERNAME`: Neo4j username
- `NEO4J_PASSWORD`: Neo4j password
- `DATABASE_URL`: PostgreSQL connection URL
- `EMBEDDING_MODEL`: Model to use for embeddings (default: all-mpnet-base-v2)
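A minimal sketch of reading these variables in Python (illustrative only; the project presumably has its own settings module, and any defaults shown here other than the documented embedding model are assumptions):

```python
import os

# Illustrative only: read TracingRAG-style settings from the environment.
# Only the EMBEDDING_MODEL default below comes from the documentation;
# treating the API key as required is an assumption.
def load_settings() -> dict:
    return {
        "openrouter_api_key": os.getenv("OPENROUTER_API_KEY"),  # required, no default
        "embedding_model": os.getenv(
            "EMBEDDING_MODEL", "sentence-transformers/all-mpnet-base-v2"
        ),
    }

settings = load_settings()
print(settings["embedding_model"])
```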
- Phase 1: Foundation - Core data models, storage interfaces, and basic infrastructure
- Phase 2: Retrieval Services - Semantic search, graph-enhanced retrieval, temporal queries, hybrid retrieval
- Phase 3: Graph Layer - Edge management, relationship types, temporal validity, graph traversal
- Phase 4: Basic RAG - Query processing, context building, LLM integration, response generation
- Phase 5: Agentic Layer - Intelligent agents for query planning and memory management
- Phase 6: Memory Promotion - Synthesis capabilities and knowledge consolidation
- Phase 7: Advanced Features - Redis caching, hierarchical consolidation, performance optimization
- Phase 8: Production Ready - Security, monitoring, CI/CD, Kubernetes deployment
🎉 TracingRAG is production-ready, with:
- Security: JWT authentication, API key support, rate limiting, input validation
- Monitoring: Prometheus metrics (50+ metrics), structured logging, health checks
- CI/CD: Automated testing, linting, Docker builds, security scanning
- Kubernetes: Complete K8s manifests with autoscaling (HPA), ingress, TLS support
- Performance: Multi-stage Docker builds, caching layers, optimized resource allocation
See docs/DEPLOYMENT_GUIDE.md for complete deployment instructions.
Contributions are welcome! Please see docs/DEVELOPMENT.md for development setup and guidelines.
MIT License - see LICENSE for details.
Inspired by:
- Zep's Graphiti - Temporal knowledge graphs
- Microsoft GraphRAG - Graph-based RAG
- LangChain - LLM application patterns