High-performance LLM query cache with semantic search. Reduces API costs by 80% and latency from 8.5s to 1ms using Redis + a Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
redis embeddings openai cost-optimization rag fastapi vector-database qdrant semantic-cache llm-caching
Updated Dec 2, 2025 - Python
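
To make the architecture in the description concrete, here is a minimal sketch of a semantic cache built on Redis and Qdrant: embed the incoming query, look for a near-duplicate in the vector index, serve the cached answer from Redis on a hit, and otherwise call the provider and store both sides. This is an illustration under stated assumptions, not the repo's actual code: the `llm_cache` collection name, the 0.90 similarity cutoff, the embedding/chat model choices, and the 24h TTL are all hypothetical.

```python
import uuid

import redis
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

SIMILARITY_THRESHOLD = 0.90  # assumed cutoff; tune per workload
COLLECTION = "llm_cache"     # hypothetical collection name

oai = OpenAI()                           # reads OPENAI_API_KEY from the env
kv = redis.Redis(decode_responses=True)  # localhost:6379 by default
vecs = QdrantClient(":memory:")          # swap for a real Qdrant URL in prod

# text-embedding-3-small produces 1536-dim vectors; cosine similarity
# makes the threshold comparison below meaningful.
vecs.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)


def embed(text: str) -> list[float]:
    return oai.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding


def cached_completion(query: str) -> str:
    vector = embed(query)

    # Nearest cached query; a hit above the threshold short-circuits the LLM.
    hits = vecs.search(collection_name=COLLECTION, query_vector=vector, limit=1)
    if hits and hits[0].score >= SIMILARITY_THRESHOLD:
        answer = kv.get(hits[0].payload["redis_key"])
        if answer is not None:  # semantic cache hit
            return answer

    # Cache miss: call the provider, then index the query for future lookups.
    answer = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    ).choices[0].message.content

    # Deterministic ID so an identical query overwrites its own entry.
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, query))
    kv.set(point_id, answer, ex=86400)  # 24h TTL, an assumption
    vecs.upsert(
        collection_name=COLLECTION,
        points=[PointStruct(id=point_id, vector=vector,
                            payload={"redis_key": point_id})],
    )
    return answer
```

The similarity threshold is the key tuning knob: raising it trades hit rate for answer fidelity, and it is what makes the cache "semantic" (rephrased queries can hit) rather than an exact-match key lookup.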