β οΈ Educational Workshop: This repository contains demonstration code for AWS re:Invent 2025. Not intended for production deployment without proper security hardening and testing.
Duration: 60 minutes | Level: 400 (Expert)
Build production-grade hybrid search combining semantic vectors, full-text search, and fuzzy matching. Implement Model Context Protocol (MCP) for context-aware retrieval with persona-based securityβenabling AI agents to query structured data beyond traditional RAG.
What You'll Build:
- Hybrid search with fuzzy, semantic, and RRF methods
- MCP-based agent with intelligent database querying
- Context-aware filtering with Row-Level Security
βββ notebooks/
β βββ 01-dat409-hybrid-search-TODO.ipynb # Hands-on lab with TODO blocks
β βββ 02-dat409-hybrid-search-SOLUTIONS.ipynb # Reference implementation
βββ data/
β βββ amazon-products-sample.csv # 21,704 product dataset
βββ demo-app/
β βββ streamlit_app.py # Full-stack reference application
β βββ requirements.txt
β βββ .streamlit/config.toml
βββ scripts/
β βββ bootstrap-code-editor-unified.sh # Environment setup
β βββ setup/test_connection.py
βββ cfn/ # CloudFormation templates
βββ requirements.txt # Workshop dependencies
Complete 3 search methods (6 TODO sections total):
| Method | Technology | Use Case |
|---|---|---|
| Fuzzy | pg_trgm + GIN | Typo tolerance ("wireles hedphones") |
| Semantic | pgvector + HNSW + Cohere | Conceptual queries ("eco-friendly products") |
| Hybrid RRF | Reciprocal Rank Fusion | Multi-signal fusion without ML overhead |
Hands-On:
cd /notebooks
# Open 01-dat409-hybrid-search-TODO.ipynbKey Learning:
- When to use each search method
- HNSW vs IVFFlat index strategies
- RRF vs weighted fusion
- Cohere Rerank for ML-based optimization
Explore MCP-enabled context-aware search:
User Query β Claude Sonnet 4 β MCP Tools β Aurora PostgreSQL
β β β
Tool Selection SQL Query RLS-Filtered Results
Hands-On:
cd /demo-app
streamlit run streamlit_app.pyKey Learning:
- Dynamic retrieval strategy selection
- Persona-based RLS for multi-tenant agents
- Cohere Rerank vs RRF comparison
- Production deployment patterns
For AWS re:Invent Participants:
- Access Code Editor via provided CloudFront URL
- Navigate to
/notebooks/ - Open
01-dat409-hybrid-search-TODO.ipynb - Complete 3 TODO blocks (guided with hints)
- Launch demo app:
streamlit run demo-app/streamlit_app.py
Pre-Configured Environment:
- β Aurora PostgreSQL 17.5 with pgvector 0.8.0
- β 21,704 products with pre-generated Cohere embeddings
- β Python 3.13 + Jupyter + all dependencies
- β Amazon Bedrock access (Cohere Embed v3, Rerank v3.5)
- β MCP server (awslabs.postgres-mcp-server)
- β Strands Agent Framework + Claude Sonnet 4
Bedrock Pricing (us-west-2):
- Cohere Embed v3: $0.0001 per 1K tokens
- Workshop dataset: ~$2.17 for 21,704 products (one-time)
- Production: Pre-generate embeddings to avoid repeated costs
- Cohere Rerank v3.5: $0.002 per search
- ~$2 per 1,000 searches
- Use for user-facing search where accuracy is critical
Cost Optimization Strategies:
- β Pre-generate embeddings: One-time cost vs per-query cost
- β Cache rerank results: Redis with 1-hour TTL (reduces 80%+ of rerank calls)
- β Use RRF for internal tools: Zero cost, in-database fusion
- β Batch embedding generation: Process in batches of 96 texts (Cohere limit)
When to Use What:
- Cohere Rerank: Customer-facing search, high-value queries (~$0.002/search)
- RRF: Internal tools, high-volume, cost-sensitive (~$0/search)
- Hybrid without rerank: Balance of accuracy and cost
| Service | Purpose |
|---|---|
| Amazon Aurora PostgreSQL | Vector storage with pgvector 0.8.0 extension |
| Amazon Bedrock | Cohere Embed v3 (embeddings), Rerank v3.5 (ML reranking) |
| RDS Data API | Serverless, IAM-authenticated database access |
| Claude Sonnet 4 | Natural language β SQL translation via Bedrock |
MCP shifts from relevance-based retrieval (RAG) to structured, queryable, context-rich inputs.
| RAG | MCP |
|---|---|
| β Fixed retrieval patterns | β Dynamic tool selection |
| β No query-time filtering | β Context-aware filtering |
| β Static embeddings only | β Hybrid retrieval strategies |
| β Limited multi-step reasoning | β Direct structured data access |
User Query β Claude Sonnet 4 β MCP Tools β Aurora PostgreSQL
β β β
Analyzes Intent SQL Query RLS-Filtered Results
Selects Tools run_query WHERE persona = ANY(access)
Key Components:
- Strands Agent: Orchestration & tool calling
- Claude Sonnet 4: Natural language β SQL translation
- MCP Client: Standardized database tools (
awslabs.postgres-mcp-server) - Aurora Data API: Serverless, IAM-authenticated access
- RLS: Application-level security via system prompt
| Method | Best For | Avoid When |
|---|---|---|
| Semantic | Conceptual queries, cross-language, intent-based | Exact SKU lookup, low-latency (<10ms) |
| Keyword | Exact terms, Boolean queries, structured fields | Typos common, multi-language content |
| Fuzzy | Typo tolerance, auto-complete, unreliable input | Precision critical, large result sets |
| Hybrid | Production systems, mixed queries | Single-method suffices |
HNSW vs IVFFlat:
- HNSW: User-facing search, >100K vectors, read-heavy (10-50ms queries)
- IVFFlat: Rapid prototyping, frequent updates, write-heavy (50-200ms queries)
Cohere Rerank vs RRF:
- Cohere Rerank: User-facing search, accuracy critical (~50-200ms latency, cost per request)
- RRF: Internal tools, cost-sensitive, low-latency (in-database, zero cost)
MCP enables agents to dynamically select retrieval strategies (vector, keyword, SQL filters) based on query intentβenabling time-based, persona-based, and operational context filtering impossible with static embeddings alone.
Core Technologies:
- pgvector - Vector similarity search
- Model Context Protocol - Standardized AI tool protocol
- Aurora PostgreSQL - Managed database
- PostgreSQL RLS - Row-level security
AWS Services:
- Amazon Bedrock - Cohere Embed v3, Rerank v3.5
- RDS Data API - Serverless access
- Strands Agent Framework - MCP-compatible agents
Extend This Workshop:
- Add time-based filtering (
WHERE created_at > NOW() - INTERVAL '7 days') - Implement query caching (Redis/ElastiCache)
- Build custom MCP tools for your domain
Production Checklist:
- HNSW indexes on vector columns
- GIN indexes on tsvector/trigram columns
- Connection pooling (PgBouncer/RDS Proxy)
- RLS policies and IAM authentication
- Audit logging enabled
- Monitoring and observability (see below)
Monitoring & Observability:
For production deployments, monitor search performance and database health:
- Database Insights: Track query latency, top SQL statements, and database load in real-time
- CloudWatch Metrics: Monitor custom metrics for search method usage (semantic vs keyword vs fuzzy) and result quality
- Application Logging: Log search queries, response times, and result counts for analysis and optimization
π‘ Note: Advanced vector optimization techniques (Binary Quantization, Scalar Quantization) are covered in the companion session DAT406 - Build Agentic AI powered search with Amazon Aurora and Amazon RDS
β Star this repository | π΄ Fork for your use cases | π Report issues | π‘ Submit PRs
See CONTRIBUTING.md for guidelines.
MIT-0 License - See LICENSE
AWS re:Invent 2025 | DAT409 - 400 Level Expert Session
Hybrid Search with Aurora PostgreSQL for MCP Retrieval
Β© 2025 Shayon Sanyal