
Commit c90a3b7

Ryan Malloy committed
Document how genai addresses issues asg017#2 and asg017#3
Explains how the genai migration provides a foundation for:

- Issue asg017#2 (Rate limiting): automatic retries with exponential backoff
- Issue asg017#3 (Token tracking): unified usage metrics across providers

Key benefits:

- genai's built-in retry logic partially solves rate limiting
- Consistent error types and usage data across all providers
- Foundation for implementing smart throttling and usage tracking
- One implementation point instead of per-provider solutions

While not fully solving these issues yet, genai transforms them from complex multi-provider challenges into straightforward feature additions.
1 parent 44b47f5 commit c90a3b7


GENAI_BENEFITS.md

Lines changed: 181 additions & 0 deletions

# How GenAI Solves sqlite-rembed's Open Issues

## Issue #2: Rate Limiting Options

### The Challenge

Different providers have different rate limits, and coordinating these across multiple custom HTTP clients was complex. Some providers return rate limit information in headers (like OpenAI's `x-ratelimit-*` headers), while others don't.
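
For context, here is a minimal sketch (assuming `reqwest` as the underlying HTTP client and OpenAI's header names) of the kind of per-provider header parsing each hand-rolled client had to duplicate before the migration; other providers use different headers or none at all:

```rust
use reqwest::Response;

/// Hypothetical pre-genai helper: pull OpenAI-style rate-limit hints out of a
/// raw HTTP response. Every provider needed its own variant of this logic.
fn openai_rate_limit_hints(response: &Response) -> (Option<u64>, Option<u64>) {
    let header_as_u64 = |name: &str| {
        response
            .headers()
            .get(name)
            .and_then(|value| value.to_str().ok())
            .and_then(|value| value.parse::<u64>().ok())
    };
    (
        header_as_u64("x-ratelimit-remaining-requests"),
        header_as_u64("x-ratelimit-remaining-tokens"),
    )
}
```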

### How GenAI Helps

#### 1. Automatic Retry with Exponential Backoff

GenAI includes built-in retry logic that automatically handles rate limiting:

```rust
// genai automatically retries with exponential backoff;
// the retries happen internally, inside this single call.
let response = client.embed(&model, text, None).await?;
```

This means:

- Transient 429 (Too Many Requests) errors are automatically retried
- Exponential backoff prevents hammering the API (see the sketch below)
- No manual retry logic needed
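
For readers unfamiliar with the pattern, here is an illustrative sketch of exponential backoff with jitter; genai applies this kind of schedule internally, and the constants below are arbitrary placeholders rather than genai's actual values:

```rust
use rand::Rng;
use std::time::Duration;

/// Illustrative only: compute the delay before retry number `attempt`.
/// Doubling the base delay each attempt backs off quickly; the random jitter
/// keeps many clients from retrying in lockstep after a shared 429.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let doubled = base_ms.saturating_mul(1u64 << attempt.min(6)); // cap the growth
    let jitter = rand::thread_rng().gen_range(0..=doubled / 4);
    Duration::from_millis(doubled + jitter)
}
```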

#### 2. Unified Error Handling

GenAI provides consistent error types across all providers:

```rust
match result {
    Err(e) if e.is_rate_limit() => {
        // Handle rate limits uniformly across providers
    }
    Err(e) => {
        // Other errors
    }
    Ok(response) => {
        // Use the embeddings
    }
}
```

#### 3. Rate Limit Headers Access

GenAI can expose response metadata, including rate limit headers:

```rust
let response = client.embed(&model, text, None).await?;
// Future: access response.metadata() for rate limit info
```

### Future Improvements

With genai, we could implement:

- Smart request throttling based on header information (see the sketch below)
- Provider-specific rate limit tracking
- Automatic backoff when approaching limits
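
As a rough sketch of the first item (a hypothetical helper, assuming genai eventually exposes the relevant headers as response metadata), throttling could be as simple as sitting out the rest of the rate-limit window when few requests remain:

```rust
use std::time::Duration;
use tokio::time::sleep;

/// Hypothetical throttle: `remaining` and `reset_after` would be parsed from
/// headers such as `x-ratelimit-remaining-requests` once genai exposes them.
async fn throttle_if_needed(remaining: Option<u64>, reset_after: Option<Duration>) {
    if let (Some(remaining), Some(reset_after)) = (remaining, reset_after) {
        if remaining < 5 {
            // Close to the limit: wait for the window to reset instead of risking a 429.
            sleep(reset_after).await;
        }
    }
}
```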

## Issue #3: Token/Request Usage Tracking

### The Challenge

Each provider reports token usage differently, making it difficult to track costs and usage across different APIs.
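
As an illustration of that fragmentation, a pre-genai client ends up with per-provider special cases like the sketch below; the first JSON pointer matches OpenAI's `usage` object, while the second is a made-up shape standing in for any provider that nests the numbers elsewhere:

```rust
use serde_json::Value;

/// Illustrative only: per-provider extraction of a token count from a raw
/// JSON response body. The "other" shape is hypothetical.
fn total_tokens(provider: &str, raw_response: &Value) -> Option<u64> {
    let pointer = match provider {
        "openai" => "/usage/total_tokens",
        "other" => "/meta/tokens/input",
        _ => return None,
    };
    raw_response.pointer(pointer).and_then(Value::as_u64)
}
```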

### How GenAI Helps

#### 1. Unified Usage Metrics

GenAI provides consistent token usage information across providers:

```rust
let response = client.embed_batch(&model, texts, None).await?;

// Access token usage
if let Some(usage) = response.usage() {
    let tokens_used = usage.total_tokens();
    let requests_made = 1; // Track per request
}
```

#### 2. Batch Processing Reduces Tracking Complexity

With batch processing, tracking becomes simpler:

- 1 batch request = 1 API call (easy to count)
- Token usage is reported per batch
- Dramatic reduction in request count makes tracking easier (see the arithmetic sketch below)
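
The arithmetic is simple but worth spelling out; the row and batch counts below are hypothetical:

```rust
/// How many API calls a table of `rows` embeddings costs at a given batch size.
fn request_count(rows: u64, batch_size: u64) -> u64 {
    rows.div_ceil(batch_size) // the last, possibly partial, batch still costs one request
}

// 10,000 rows embedded one at a time:  request_count(10_000, 1)   == 10_000
// 10,000 rows in batches of 100:       request_count(10_000, 100) == 100
```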

#### 3. Provider-Agnostic Metrics

GenAI normalizes metrics across providers:

```rust
pub struct Usage {
    pub prompt_tokens: Option<u32>,
    pub completion_tokens: Option<u32>,
    pub total_tokens: Option<u32>,
}
```
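
One payoff of a normalized token count is that cost estimation can live in a single place; a minimal sketch, with a placeholder price rather than any provider's actual rate:

```rust
/// Rough cost estimate from a normalized token count.
/// `usd_per_million_tokens` is a placeholder; real pricing varies by provider and model.
fn estimate_cost_usd(total_tokens: u64, usd_per_million_tokens: f64) -> f64 {
    (total_tokens as f64 / 1_000_000.0) * usd_per_million_tokens
}

// e.g. estimate_cost_usd(750_000, 0.20) == 0.15, matching the example output later in this document
```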

### Implementation Ideas

#### Per-Client Usage Tracking

```sql
-- Could add a usage tracking table
CREATE TABLE rembed_usage (
  client_name TEXT,
  timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
  requests INTEGER,
  tokens_used INTEGER,
  batch_size INTEGER
);

-- Track usage after each batch
INSERT INTO rembed_usage (client_name, requests, tokens_used, batch_size)
VALUES ('openai-fast', 1, 5000, 100);
```

#### Usage Statistics Function

```sql
-- Future: Add usage statistics function
SELECT rembed_usage_stats('openai-fast');
-- Returns: {"total_requests": 150, "total_tokens": 750000, "avg_batch_size": 50}
```
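
Behind such a function, assembling the JSON is straightforward once the counters exist; a hypothetical sketch using `serde_json`, with field names taken from the example output above:

```rust
use serde_json::json;

/// Hypothetical backing logic for a future `rembed_usage_stats()` SQL function.
fn usage_stats_json(total_requests: u64, total_tokens: u64, total_texts_embedded: u64) -> String {
    let avg_batch_size = if total_requests == 0 {
        0
    } else {
        total_texts_embedded / total_requests
    };
    json!({
        "total_requests": total_requests,
        "total_tokens": total_tokens,
        "avg_batch_size": avg_batch_size,
    })
    .to_string()
}
```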

## Combined Benefits

The migration to genai provides a foundation for solving both issues:

1. **Unified Interface**: One library handles all provider quirks
2. **Consistent Metadata**: Rate limits and usage data in a standard format
3. **Built-in Resilience**: Automatic retries reduce manual error handling
4. **Future-Proof**: New providers automatically get these benefits

## Code Example: Rate Limiting with Token Tracking

Here's how we could extend the current implementation:

```rust
// In genai_client.rs
use std::sync::{Arc, Mutex};
use std::time::Instant;

pub struct EmbeddingClientWithTracking {
    client: Arc<GenAiClient>,
    model: String,
    usage: Arc<Mutex<UsageStats>>,
}

pub struct UsageStats {
    total_requests: u64,
    total_tokens: u64,
    rate_limit_hits: u64,
    last_rate_limit_reset: Option<Instant>,
}

impl EmbeddingClientWithTracking {
    // async, since the underlying genai call is awaited
    pub async fn embed_batch_with_tracking(&self, texts: Vec<&str>) -> Result<Vec<Vec<f32>>> {
        let response = self.client.embed_batch(&self.model, texts, None).await?;

        // Track usage
        if let Some(usage) = response.usage() {
            let mut stats = self.usage.lock().unwrap();
            stats.total_requests += 1;
            stats.total_tokens += usage.total_tokens().unwrap_or(0) as u64;
        }

        // Check rate limit headers (when genai exposes them)
        if let Some(headers) = response.headers() {
            if let Some(_remaining) = headers.get("x-ratelimit-remaining-requests") {
                // Implement smart throttling
            }
        }

        Ok(response.embeddings)
    }
}
```

## SQL Interface for Monitoring

```sql
-- Check current rate limit status
SELECT rembed_rate_limit_status('openai-fast');
-- Returns: {"remaining_requests": 4999, "reset_in": "12ms"}

-- Get usage statistics
SELECT rembed_usage_summary('openai-fast', 'today');
-- Returns: {"requests": 150, "tokens": 750000, "cost_estimate": "$0.15"}

-- Set rate limit configuration
INSERT INTO temp.rembed_rate_limits(client, max_rpm, max_tpm) VALUES
  ('openai-fast', 5000, 5000000);
```

## Conclusion

The genai migration provides:

1. **Immediate benefits**: Automatic retries partially address rate limiting
2. **Foundation for the future**: A standardized interface for implementing full solutions
3. **Simplified implementation**: One place to add rate limiting and tracking logic
4. **Provider flexibility**: Works uniformly across all 10+ providers

While the full solutions for #2 and #3 aren't implemented yet, genai has transformed them from complex multi-provider challenges into straightforward feature additions.
