🚀 Describe the new functionality needed
While working on #3944 and diving into the current generator, I realized how non-Pythonic the current approach is.
We can certainly push things further toward "code-first" and get an OpenAPI spec generated directly from code via FastAPI, with the help of Pydantic.
Overview
Migrate from custom @webmethod decorators to standard FastAPI routers, splitting each API into three files:
- `service.py` - Protocol definitions (no decorators, just type hints)
- `models.py` - Pydantic request/response models with Field descriptions
- `routes.py` - FastAPI router with route decorators
Benefits
- Simpler OpenAPI generation: Direct FastAPI schema generation, no introspection needed (~800 lines vs 1232)
- Standard FastAPI patterns: Use router decorators for metadata (summary, description, deprecation)
- Better type safety: Pydantic Field descriptions instead of docstring parsing (see all the `:param` stuff)
- Cleaner code: Separation of concerns (protocol, models, routes)
Current State
- ~20 APIs using `@webmethod` decorators in protocol files
- Route discovery via `get_all_api_routes()` introspecting webmethod decorators
- OpenAPI generator introspects function signatures and docstrings - very cumbersome
- The YAML generation is not helping either (special handling for multi-line docstrings, need for the scalar operator during generation), but YAML is a different story
- Server uses `create_dynamic_typed_route()` to wrap protocol methods
- We maintain our own "routers" implementation in `src/llama_stack/schema_utils.py` - most of it is already offered by FastAPI routers
Proposed Structure
Example: Batches API
batches_service.py - Protocol only (the contract), no decorators:
from typing import Literal, Protocol, runtime_checkable

@runtime_checkable
class BatchService(Protocol):
    """The Batches API enables efficient processing of multiple requests in a single operation,
    particularly useful for processing large datasets, batch evaluation workflows, and
    cost-effective inference at scale.

    The API is designed to allow use of openai client libraries for seamless integration.

    This API provides the following extensions:
    - idempotent batch creation

    Note: This API is currently under active development and may undergo changes.
    """

    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: Literal["24h"],
        metadata: dict[str, str] | None = None,
        idempotency_key: str | None = None,
    ) -> BatchObject:
        """Create a new batch for processing multiple API requests."""
        ...

models.py - Pydantic models with Field descriptions - I'm keeping `@json_schema_type` intentionally for now:
from pydantic import BaseModel, Field

from llama_stack.schema_utils import json_schema_type

@json_schema_type
class CreateBatchRequest(BaseModel):
    """Request model for creating a batch."""

    input_file_id: str = Field(..., description="The ID of an uploaded file containing requests for the batch.")
    endpoint: str = Field(..., description="The endpoint to be used for all requests in the batch.")
    completion_window: Literal["24h"] = Field(
        ..., description="The time window within which the batch should be processed."
    )
    metadata: dict[str, str] | None = Field(default=None, description="Optional metadata for the batch.")
    idempotency_key: str | None = Field(
        default=None, description="Optional idempotency key. When provided, enables idempotent behavior."
    )

routes.py - FastAPI router with decorators:
from fastapi import APIRouter, Body, Depends

router = APIRouter(
    prefix=f"/{LLAMA_STACK_API_V1}",
    tags=["Batches"],
    responses=standard_responses,
)

@router.post(
    "/batches",
    response_model=BatchObject,
    summary="Create a new batch for processing multiple API requests.",
    description="Create a new batch for processing multiple API requests.",
)
async def create_batch(
    request: CreateBatchRequest = Body(...),
    svc: BatchService = Depends(get_batch_service),
) -> BatchObject:
    """Create a new batch."""
    return await svc.create_batch(
        input_file_id=request.input_file_id,
        endpoint=request.endpoint,
        completion_window=request.completion_window,
        metadata=request.metadata,
        idempotency_key=request.idempotency_key,
    )

Infrastructure Needed
- Router utilities (`core/server/router_utils.py`):
  - `standard_responses` dict for common error responses
  - Helper functions for dependency injection
- Router registration (`core/server/routers.py`):
  - `register_router(Api, router_factory)` function
  - Router discovery mechanism: `get_all_routers()` to replace `get_all_api_routes()`
- Server integration:
  - Update `server.py` to include routers instead of webmethod routes
  - Dependency injection for service implementations
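The dependency-injection piece could be sketched roughly as follows. This is a minimal, hypothetical illustration (the names `set_implementation` and `make_dependency` are not from the proposal), assuming provider implementations are resolved once at server startup and looked up per request:

```python
# Hypothetical registry mapping a service protocol to its resolved implementation,
# populated once at server startup.
_impls: dict[type, object] = {}


def set_implementation(protocol: type, impl: object) -> None:
    """Register the concrete implementation serving a protocol."""
    _impls[protocol] = impl


def make_dependency(protocol: type):
    """Build a zero-argument callable usable as `Depends(make_dependency(BatchService))`."""

    def _resolve() -> object:
        try:
            return _impls[protocol]
        except KeyError:
            raise RuntimeError(f"No implementation registered for {protocol.__name__}") from None

    return _resolve
```

A route would then declare `svc: BatchService = Depends(make_dependency(BatchService))`, keeping the protocol as the only coupling between routes and providers.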
- Schema utilities:
  - Keep the `@json_schema_type` decorator from `llama_stack.schema_utils` if beneficial for OpenAPI schema generation
  - Evaluate whether it adds value over standard Pydantic model generation (it controls top-level component registration)
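The registration mechanism above could look something like this sketch. The names `register_router` and `get_all_routers` follow the proposal; the `Api` enum is a stand-in for the existing one, and the factory indirection is an assumption to keep router construction lazy:

```python
from collections.abc import Callable
from enum import Enum


class Api(Enum):  # stand-in for the real Api enum
    batches = "batches"
    models = "models"


# Maps each API to a factory returning its fastapi.APIRouter; factories keep
# router construction lazy so importing the registry has no FastAPI side effects.
_router_factories: dict[Api, Callable[[], object]] = {}


def register_router(api: Api, router_factory: Callable[[], object]) -> None:
    """register_router(Api, router_factory) as described in the proposal."""
    if api in _router_factories:
        raise ValueError(f"Router already registered for {api}")
    _router_factories[api] = router_factory


def get_all_routers() -> dict[Api, object]:
    """Replacement for get_all_api_routes(): build every registered router."""
    return {api: factory() for api, factory in _router_factories.items()}
```

`server.py` would then simply loop over `get_all_routers()` and call `app.include_router(router)` for each, instead of wrapping protocol methods dynamically.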
I already have a complete implementation ready for review at https://github.com/leseb/llama-stack/tree/router. The server starts, and I was able to validate a good chunk of the integration tests.
Migration Tasks
Phase 1: Infrastructure
- Create `router_utils.py` with standard responses
- Create router registration system
- Update `fastapi_generator.py` to read FastAPI routers directly
- Simplify generator (remove webmethod introspection code)
Phase 2: API Migration
For each API (batches, models, inference, etc.):
- Extract Protocol to `service.py` (remove `@webmethod` decorators)
- Create `models.py` with Pydantic request/response models
- Create `routes.py` with FastAPI router
- Register router in router registry
- Update provider implementations to match new structure
Phase 3: Cleanup
- Remove the `@webmethod` decorator system
- Remove `get_all_api_routes()` webmethod discovery
- Remove the `create_dynamic_typed_route()` wrapper
- Update all references to use routers
Each phase can be its own PR, or everything can land at once. Hopefully we can iterate quickly on the PRs if we like this approach; otherwise, any new API change will require a rebase :)
💡 Why is this needed? What if we don't build it?
Nothing breaks; we just keep a half-baked OpenAPI generator that still relies on docstring parsing to populate most of the fields.
Other thoughts
No response