Commit c95980e
feat(inference): implement prompt caching middleware for OpenAI API
This PR implements Phase 1 of the prompt caching feature: automatic
caching of prompt prefixes in OpenAI-compatible chat completion requests.
**Key Features:**
- Automatic caching of prompts ≥1024 tokens (configurable)
- SHA-256 cache key computation (FIPS-compliant)
- Multi-tenant isolation (tenant_id + user_id in cache keys)
- Circuit breaker pattern for graceful degradation
- Streaming request bypass (configurable)
- Token counting integration (PR2)
- Cache store abstraction integration (PR1)
- OpenAI response schema updates (PR3)
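
A minimal sketch of how the cache-key and eligibility rules listed above might fit together. This is not the code from `prompt_caching.py`: the function names `compute_cache_key` and `should_cache`, the `model` field, and the parameter names are hypothetical and used only for illustration. It assumes the key is a SHA-256 digest over a canonical JSON payload that includes `tenant_id` and `user_id`, and that streaming requests bypass the cache unless explicitly enabled.

```python
import hashlib
import json


def compute_cache_key(tenant_id: str, user_id: str, model: str, messages: list[dict]) -> str:
    """Hypothetical helper: derive a SHA-256 key scoped to one tenant and user."""
    # Canonical JSON (sorted keys, fixed separators) so equal inputs hash equally
    # and field boundaries cannot be confused by plain string concatenation.
    payload = json.dumps(
        {"tenant_id": tenant_id, "user_id": user_id, "model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_cache(
    token_count: int,
    stream: bool,
    min_tokens: int = 1024,        # assumed default, taken from the feature list
    cache_streaming: bool = False,  # assumed name for the streaming bypass switch
) -> bool:
    """Hypothetical eligibility check: token threshold plus streaming bypass."""
    if stream and not cache_streaming:
        return False
    return token_count >= min_tokens
```

Because the tenant and user identifiers are part of the hashed payload, two tenants sending an identical prompt would get different keys under this scheme, which is one way to achieve the isolation described in the feature list.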
**Implementation:**
- src/llama_stack/core/server/prompt_caching.py
- tests/unit/server/test_prompt_caching.py
- 25 comprehensive unit tests (100% passing)
- >95% code coverage
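
The circuit breaker named under Key Features can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the middleware's actual class: the name `CircuitBreaker`, its methods, and the default thresholds are hypothetical. The idea is that after repeated cache-store failures the middleware stops consulting the cache for a cool-down period and forwards requests uncached.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: skip the cache after repeated failures."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable clock, handy for tests
        self._failures = 0
        self._opened_at = None       # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if the cache may be used for this request."""
        if self._opened_at is None:
            return True
        # After the cool-down, allow calls again; a new failure re-opens the breaker.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self) -> None:
        # A successful cache operation closes the breaker and clears the count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open) the breaker and restart the cool-down.
            self._opened_at = self._clock()
```

A caller would check `allow()` before touching the cache store and fall back to the uncached path when it returns `False`, so a failing cache degrades latency savings rather than request handling.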
**Dependencies:**
- Requires PR1 (cache-store-abstraction)
- Requires PR2 (tokenization-utilities)
- Requires PR3 (openai-response-schema)
**Test Results:**
- 25/25 unit tests passing
- All pre-commit checks passing (mypy, ruff, ruff-format)
Part of prompt caching implementation - Phase 1 of llamastack#4166
Signed-off-by: William Caban <william.caban@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 840d76e commit c95980e
File tree: 3 files changed, +39 -34 lines changed
- src/llama_stack/core/server
- tests/unit/server