# How GenAI Solves sqlite-rembed's Open Issues

## Issue #2: Rate Limiting Options

### The Challenge
Every provider enforces its own rate limits, and coordinating them across several hand-written HTTP clients was complex. Some providers report rate-limit state in response headers (such as OpenAI's `x-ratelimit-*` headers), while others don't.

### How GenAI Helps

#### 1. Automatic Retry with Exponential Backoff
GenAI includes built-in retry logic that automatically handles rate limiting:
```rust
// genai retries transient failures internally with exponential backoff,
// so a single call is all the caller writes:
let response = client.embed(&model, text, None).await?;
```

This means:
- Transient 429 (Too Many Requests) errors are retried automatically
- Exponential backoff prevents hammering the API
- No manual retry logic is needed (the sketch below shows what the equivalent loop looks like)
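
For reference, here is roughly what such a retry loop does. This is a minimal sketch, not genai internals: `embed_once` is a placeholder for a single non-retrying request, and the example assumes a tokio runtime.

```rust
use std::time::Duration;

// Sketch of retry-with-exponential-backoff around a single embedding call.
// `embed_once` is a placeholder, not a genai API.
async fn embed_with_backoff(text: &str, max_retries: u32) -> Result<Vec<f32>, String> {
    let mut delay = Duration::from_millis(500);
    for attempt in 0..=max_retries {
        match embed_once(text).await {
            Ok(embedding) => return Ok(embedding),
            // Retry only rate-limit style failures, and only while attempts remain.
            Err(e) if e.contains("429") && attempt < max_retries => {
                tokio::time::sleep(delay).await;
                delay *= 2; // 0.5s, 1s, 2s, ...
            }
            Err(e) => return Err(e),
        }
    }
    unreachable!("the loop always returns")
}

// Stand-in for one non-retrying request.
async fn embed_once(_text: &str) -> Result<Vec<f32>, String> {
    Err("429 Too Many Requests".into())
}
```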

#### 2. Unified Error Handling
GenAI provides consistent error types across all providers:
```rust
match result {
    Ok(response) => {
        // Use the embeddings
    }
    // `is_rate_limit()` stands in for inspecting genai's unified error type
    Err(e) if e.is_rate_limit() => {
        // Handle rate limits uniformly across providers
    }
    Err(e) => {
        // Other errors
    }
}
```

#### 3. Rate Limit Headers Access
GenAI can expose response metadata, including rate-limit headers:
```rust
let response = client.embed(&model, text, None).await?;
// Future: Access response.metadata() for rate limit info
```

### Future Improvements
With genai in place, we could implement:
- Smart request throttling based on header information (see the sketch below)
- Provider-specific rate limit tracking
- Automatic backoff when approaching limits
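
A minimal sketch of the first item: a rolling-window requests-per-minute throttle. `Throttle` is a hypothetical type, not part of genai or sqlite-rembed, and the sleep assumes a tokio runtime; a fuller version would feed `max_rpm` from the provider's `x-ratelimit-*` headers.

```rust
use std::time::{Duration, Instant};

// Hypothetical requests-per-minute throttle (not a genai type):
// each request waits until a slot in the rolling one-minute window frees up.
struct Throttle {
    max_rpm: usize,
    window: Vec<Instant>, // send times of requests within the last minute
}

impl Throttle {
    fn new(max_rpm: usize) -> Self {
        Self { max_rpm, window: Vec::new() }
    }

    /// Waits until a request may be sent, then records it.
    async fn acquire(&mut self) {
        let minute = Duration::from_secs(60);
        loop {
            let now = Instant::now();
            // Forget requests older than one minute.
            self.window.retain(|t| now.duration_since(*t) < minute);
            if self.window.len() < self.max_rpm {
                self.window.push(now);
                return;
            }
            // Sleep until the oldest request ages out of the window.
            let wait = minute - now.duration_since(self.window[0]);
            tokio::time::sleep(wait).await;
        }
    }
}
```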

## Issue #3: Token/Request Usage Tracking

### The Challenge
Each provider reports token usage in its own format, which makes it difficult to track costs and usage consistently across APIs.

### How GenAI Helps

#### 1. Unified Usage Metrics
GenAI reports token usage in the same shape for every provider:
```rust
let response = client.embed_batch(&model, texts, None).await?;
// Access token usage
if let Some(usage) = response.usage() {
    let tokens_used = usage.total_tokens();
    // One batch call counts as one request, however many texts it carried
    let requests_made = 1;
}
```

#### 2. Batch Processing Reduces Tracking Complexity
With batch processing, tracking becomes simpler:
- 1 batch request = 1 API call (easy to count)
- Token usage is reported per batch
- The request count drops dramatically (1,000 texts at a batch size of 100 is 10 requests instead of 1,000), so there is far less to track

#### 3. Provider-Agnostic Metrics
GenAI normalizes metrics across providers:
```rust
pub struct Usage {
    // Every field is optional because not all providers report every count
    pub prompt_tokens: Option<u32>,
    pub completion_tokens: Option<u32>,
    pub total_tokens: Option<u32>,
}
```
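
Because every field is optional, a tracker has to tolerate providers that omit counts. A small sketch (`RunningTotals` is hypothetical, not a genai type) that treats missing counts as zero:

```rust
#[derive(Default)]
struct RunningTotals {
    requests: u64,
    tokens: u64,
}

impl RunningTotals {
    // Fold one response's usage into the totals; a provider that
    // reports no token counts (None) simply contributes zero.
    fn record(&mut self, total_tokens: Option<u32>) {
        self.requests += 1;
        self.tokens += u64::from(total_tokens.unwrap_or(0));
    }
}
```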

### Implementation Ideas

#### Per-Client Usage Tracking
```sql
-- Could add a usage tracking table
CREATE TABLE rembed_usage (
    client_name TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    requests INTEGER,
    tokens_used INTEGER,
    batch_size INTEGER
);

-- Track usage after each batch
INSERT INTO rembed_usage (client_name, requests, tokens_used, batch_size)
VALUES ('openai-fast', 1, 5000, 100);
```
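
From the Rust side, the extension could append a row after every batch. The sketch below uses a `rusqlite::Connection` purely for illustration; inside the extension the same `INSERT` would run through the host connection instead.

```rust
use rusqlite::{params, Connection, Result};

// Illustrative only: record one batch's usage in the table above.
fn record_usage(
    conn: &Connection,
    client_name: &str,
    tokens_used: u32,
    batch_size: usize,
) -> Result<()> {
    conn.execute(
        "INSERT INTO rembed_usage (client_name, requests, tokens_used, batch_size)
         VALUES (?1, 1, ?2, ?3)",
        params![client_name, tokens_used, batch_size as i64],
    )?;
    Ok(())
}
```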

#### Usage Statistics Function
```sql
-- Future: Add usage statistics function
SELECT rembed_usage_stats('openai-fast');
-- Returns: {"total_requests": 150, "total_tokens": 750000, "avg_batch_size": 50}
```
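
Internally, such a function could be a single aggregate query over the table above, serialized to JSON. A sketch, assuming `rusqlite` and `serde_json` for illustration; the function name and JSON shape come from the example above:

```rust
use rusqlite::Connection;
use serde_json::json;

// Sketch of what rembed_usage_stats() might compute internally.
fn usage_stats(conn: &Connection, client: &str) -> rusqlite::Result<String> {
    let (requests, tokens, avg_batch): (i64, i64, f64) = conn.query_row(
        "SELECT COALESCE(SUM(requests), 0),
                COALESCE(SUM(tokens_used), 0),
                COALESCE(AVG(batch_size), 0)
         FROM rembed_usage WHERE client_name = ?1",
        rusqlite::params![client],
        |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
    )?;
    Ok(json!({
        "total_requests": requests,
        "total_tokens": tokens,
        "avg_batch_size": avg_batch,
    })
    .to_string())
}
```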

## Combined Benefits

The migration to genai provides a foundation for solving both issues:

1. **Unified Interface**: One library handles all provider quirks
2. **Consistent Metadata**: Rate limits and usage data in a standard format
3. **Built-in Resilience**: Automatic retries reduce manual error handling
4. **Future-Proof**: New providers automatically get these benefits

## Code Example: Rate Limiting with Token Tracking

Here's how we could extend the current implementation:

```rust
// In genai_client.rs
use std::sync::{Arc, Mutex};
use std::time::Instant;

pub struct EmbeddingClientWithTracking {
    client: Arc<GenAiClient>,
    model: String,
    usage: Arc<Mutex<UsageStats>>,
}

pub struct UsageStats {
    total_requests: u64,
    total_tokens: u64,
    rate_limit_hits: u64,
    last_rate_limit_reset: Option<Instant>,
}

impl EmbeddingClientWithTracking {
    // Must be async: the body awaits the underlying batch call
    pub async fn embed_batch_with_tracking(&self, texts: Vec<&str>) -> Result<Vec<Vec<f32>>> {
        let response = self.client.embed_batch(&self.model, texts, None).await?;

        // Track usage
        if let Some(usage) = response.usage() {
            let mut stats = self.usage.lock().unwrap();
            stats.total_requests += 1;
            stats.total_tokens += usage.total_tokens().unwrap_or(0) as u64;
        }

        // Check rate limit headers (when genai exposes them)
        if let Some(headers) = response.headers() {
            if let Some(_remaining) = headers.get("x-ratelimit-remaining-requests") {
                // Implement smart throttling here
            }
        }

        Ok(response.embeddings)
    }
}
```
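
A hypothetical call site, just to show the intended shape (construction details elided):

```rust
// Hypothetical usage of the wrapper above.
let embeddings = tracking_client
    .embed_batch_with_tracking(vec!["first text", "second text"])
    .await?;
```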

## SQL Interface for Monitoring

```sql
-- Check current rate limit status
SELECT rembed_rate_limit_status('openai-fast');
-- Returns: {"remaining_requests": 4999, "reset_in": "12ms"}

-- Get usage statistics
SELECT rembed_usage_summary('openai-fast', 'today');
-- Returns: {"requests": 150, "tokens": 750000, "cost_estimate": "$0.15"}

-- Set rate limit configuration
INSERT INTO temp.rembed_rate_limits(client, max_rpm, max_tpm) VALUES
  ('openai-fast', 5000, 5000000);
```

## Conclusion

The genai migration provides:
1. **Immediate benefits**: Automatic retries partially address rate limiting
2. **Foundation for the future**: A standardized interface for implementing full solutions
3. **Simplified implementation**: One place to add rate limiting and tracking logic
4. **Provider flexibility**: Works uniformly across all 10+ providers

While the full solutions for #2 and #3 aren't implemented yet, genai has turned them from complex multi-provider challenges into straightforward feature additions.