FastAPI Router Migration Proposal #4084

🚀 Describe the new functionality needed

While working on #3944 and diving into the current generator, I realized how non-Pythonic the current approach is.
We can push things further toward "code-first" and have the OpenAPI spec generated directly from code via FastAPI, with the help of Pydantic.

Overview

Migrate from custom @webmethod decorators to standard FastAPI routers, splitting each API into three files:

  • service.py - Protocol definitions (no decorators, just type hints)
  • models.py - Pydantic request/response models with Field descriptions
  • routes.py - FastAPI router with route decorators

Benefits

  • Simpler OpenAPI generation: Direct FastAPI schema generation, no introspection needed (~800 lines vs 1232)
  • Standard FastAPI patterns: Use router decorators for metadata (summary, description, deprecation)
  • Better type safety: Pydantic Field descriptions instead of docstring parsing (all the :param entries scattered through docstrings today)
  • Cleaner code: Separation of concerns (protocol, models, routes)

Current State

  • ~20 APIs using @webmethod decorators in protocol files
  • Route discovery via get_all_api_routes() introspecting webmethod decorators
  • OpenAPI generator introspects function signatures and docstrings - very cumbersome
  • The YAML generation does not help either (special handling for multi-line docstrings, the need for a scalar operator during generation), but YAML is a different story
  • Server uses create_dynamic_typed_route() to wrap protocol methods
  • We maintain our own "routers" implementation in src/llama_stack/schema_utils.py - most of which FastAPI routers already provide (a rough sketch of today's style follows this list)
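
For contrast, here is roughly what the current @webmethod style looks like; the decorator's exact parameters are illustrative here, not authoritative:

# Rough sketch of today's @webmethod style, for contrast with the proposal below.
# The decorator's exact parameters may differ; treat this as illustrative.
from typing import Literal, Protocol, runtime_checkable

from llama_stack.schema_utils import webmethod


@runtime_checkable
class Batches(Protocol):
    @webmethod(route="/batches", method="POST")
    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: Literal["24h"],
        metadata: dict[str, str] | None = None,
        idempotency_key: str | None = None,
    ) -> "BatchObject":  # existing response model; import omitted here
        """Create a new batch for processing multiple API requests.

        :param input_file_id: The ID of an uploaded file containing requests for the batch.
        :param endpoint: The endpoint to be used for all requests in the batch.
        :param completion_window: The time window within which the batch should be processed.
        :param metadata: Optional metadata for the batch.
        :param idempotency_key: Optional idempotency key enabling idempotent behavior.
        """
        ...

Both the route metadata and the field documentation live in the decorator and the docstring, which is exactly what the generator has to introspect today.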

Proposed Structure

Example: Batches API

batches_service.py - Protocol only (the contract), no decorators:

from typing import Literal, Protocol, runtime_checkable

# BatchObject is the existing batch response model; its import is unchanged and omitted here.


@runtime_checkable
class BatchService(Protocol):
    """The Batches API enables efficient processing of multiple requests in a single operation,
    particularly useful for processing large datasets, batch evaluation workflows, and
    cost-effective inference at scale.
    The API is designed to allow use of openai client libraries for seamless integration.
    This API provides the following extensions:
     - idempotent batch creation
    Note: This API is currently under active development and may undergo changes.
    """

    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: Literal["24h"],
        metadata: dict[str, str] | None = None,
        idempotency_key: str | None = None,
    ) -> BatchObject:
        """Create a new batch for processing multiple API requests."""
        ...

models.py - Pydantic models with Field descriptions - I'm keeping @json_schema_type intentionally for now:

from typing import Literal

from pydantic import BaseModel, Field

from llama_stack.schema_utils import json_schema_type


@json_schema_type
class CreateBatchRequest(BaseModel):
    """Request model for creating a batch."""

    input_file_id: str = Field(..., description="The ID of an uploaded file containing requests for the batch.")
    endpoint: str = Field(..., description="The endpoint to be used for all requests in the batch.")
    completion_window: Literal["24h"] = Field(
        ..., description="The time window within which the batch should be processed."
    )
    metadata: dict[str, str] | None = Field(default=None, description="Optional metadata for the batch.")
    idempotency_key: str | None = Field(
        default=None, description="Optional idempotency key. When provided, enables idempotent behavior."
    )

routes.py - FastAPI router with decorators:

from fastapi import APIRouter, Body, Depends

# LLAMA_STACK_API_V1, standard_responses, get_batch_service, BatchService,
# CreateBatchRequest and BatchObject come from the version/router_utils/service/models
# modules shown or referenced above; exact import paths are omitted here.

router = APIRouter(
    prefix=f"/{LLAMA_STACK_API_V1}",
    tags=["Batches"],
    responses=standard_responses,
)

@router.post(
    "/batches",
    response_model=BatchObject,
    summary="Create a new batch for processing multiple API requests.",
    description="Create a new batch for processing multiple API requests.",
)
async def create_batch(
    request: CreateBatchRequest = Body(...),
    svc: BatchService = Depends(get_batch_service),
) -> BatchObject:
    """Create a new batch."""
    return await svc.create_batch(
        input_file_id=request.input_file_id,
        endpoint=request.endpoint,
        completion_window=request.completion_window,
        metadata=request.metadata,
        idempotency_key=request.idempotency_key,
    )
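
To make the Depends(get_batch_service) part concrete, here is a minimal wiring sketch; set_batch_service and get_batch_service are hypothetical helpers for illustration, not part of the existing codebase:

# Minimal wiring sketch (hypothetical helpers): the resolved provider implementation
# is stored once at startup and handed to routes via FastAPI dependency injection.
from fastapi import FastAPI

_batch_service: BatchService | None = None


def set_batch_service(impl: BatchService) -> None:
    """Called by the stack at startup with the resolved provider implementation."""
    global _batch_service
    _batch_service = impl


def get_batch_service() -> BatchService:
    """FastAPI dependency used by the routes above."""
    if _batch_service is None:
        raise RuntimeError("Batches provider not initialized")
    return _batch_service


app = FastAPI()
app.include_router(router)  # the Batches router defined above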

Infrastructure Needed

  1. Router utilities (core/server/router_utils.py):

    • standard_responses dict for common error responses (sketched below, after this list)
    • Helper functions for dependency injection
  2. Router registration (core/server/routers.py):

    • register_router(Api, router_factory) function
    • Router discovery mechanism
    • get_all_routers() to replace get_all_api_routes()
  3. Server integration:

    • Update server.py to include routers instead of webmethod routes
    • Dependency injection for service implementations
  4. Schema utilities:

    • Keep the @json_schema_type decorator from llama_stack.schema_utils if it remains beneficial for OpenAPI schema generation
    • Evaluate whether it adds value over standard Pydantic schema generation (it controls top-level component registration)
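
A rough sketch of what items 1 and 2 could look like; the error payload shapes and registry internals are placeholders, not a final design:

# core/server/router_utils.py (sketch)
standard_responses = {
    400: {"description": "Bad request"},
    401: {"description": "Unauthorized"},
    429: {"description": "Too many requests"},
    500: {"description": "Internal server error"},
}

# core/server/routers.py (sketch); Api is the existing enum of stack APIs
from collections.abc import Callable

from fastapi import APIRouter

_ROUTER_FACTORIES: dict["Api", Callable[[], APIRouter]] = {}


def register_router(api: "Api", router_factory: Callable[[], APIRouter]) -> None:
    """Each API package registers its router factory at import time."""
    _ROUTER_FACTORIES[api] = router_factory


def get_all_routers() -> dict["Api", APIRouter]:
    """Replacement for get_all_api_routes(): build every registered router."""
    return {api: factory() for api, factory in _ROUTER_FACTORIES.items()}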

I already have a complete implementation ready for review at https://github.com/leseb/llama-stack/tree/router. With it, the server starts and I was able to validate a lot of the integration tests.

Migration Tasks

Phase 1: Infrastructure

  • Create router_utils.py with standard responses
  • Create router registration system
  • Update fastapi_generator.py to read FastAPI routers directly (see the sketch after this list)
  • Simplify generator (remove webmethod introspection code)
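
As a rough illustration of how much the generator can shrink, assuming the get_all_routers() registry sketched earlier: mount every router on a throwaway FastAPI app and let FastAPI/Pydantic emit the document.

# Sketch of a simplified generator built on FastAPI's own OpenAPI machinery.
import json

from fastapi import FastAPI


def generate_openapi_spec() -> dict:
    app = FastAPI(title="Llama Stack API", version="v1")
    for router in get_all_routers().values():  # registry sketched in the Infrastructure section
        app.include_router(router)
    # FastAPI builds the schema from the routers' type hints and Field descriptions.
    return app.openapi()


if __name__ == "__main__":
    print(json.dumps(generate_openapi_spec(), indent=2))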

Phase 2: API Migration

For each API (batches, models, inference, etc.):

  1. Extract Protocol to service.py (remove @webmethod decorators)
  2. Create models.py with Pydantic request/response models
  3. Create routes.py with FastAPI router
  4. Register router in router registry
  5. Update provider implementations to match new structure

Phase 3: Cleanup

  • Remove @webmethod decorator system
  • Remove get_all_api_routes() webmethod discovery
  • Remove create_dynamic_typed_route() wrapper
  • Update all references to use router

Each phase can be its own PR, or everything can land at once. Hopefully we can iterate fast on the PRs if we like this approach; otherwise any new API change will require a rebase :)

💡 Why is this needed? What if we don't build it?

Nothing breaks; we just keep a half-baked OpenAPI generator that still relies on docstrings to populate most of the fields.

Other thoughts

No response
