🚀 Describe the new functionality needed
While working on #3944 and diving into the current generator, I realized how non-Pythonic the current approach is.
We can certainly push things further toward "code-first" and get an OpenAPI spec generated directly from code via FastAPI, with the help of Pydantic.
Overview
Migrate from custom @webmethod decorators to standard FastAPI routers, splitting each API into three files:
- `service.py` - Protocol definitions (no decorators, just type hints)
- `models.py` - Pydantic request/response models with Field descriptions
- `routes.py` - FastAPI router with route decorators
Benefits
- Simpler OpenAPI generation: Direct FastAPI schema generation, no introspection needed (~800 lines vs 1232)
- Standard FastAPI patterns: Use router decorators for metadata (summary, description, deprecation)
- Better type safety: Pydantic Field descriptions instead of docstring parsing (see all the `:param` stuff)
- Cleaner code: Separation of concerns (protocol, models, routes)
Current State
- ~20 APIs using `@webmethod` decorators in protocol files
- Route discovery via `get_all_api_routes()` introspecting webmethod decorators
- OpenAPI generator introspects function signatures and docstrings - very cumbersome
- The YAML generation is not helping either (special handling for multi-line docstrings, need for the scalar operator during generation), but YAML is a different story
- Server uses `create_dynamic_typed_route()` to wrap protocol methods
- We maintain our own "routers" implementation in `src/llama_stack/schema_utils.py` - most of it is already offered by FastAPI routers
Proposed Structure
Example: Batches API
batches_service.py - Protocol only (the contract), no decorators:
from typing import Literal, Protocol, runtime_checkable

@runtime_checkable
class BatchService(Protocol):
    """The Batches API enables efficient processing of multiple requests in a single operation,
    particularly useful for processing large datasets, batch evaluation workflows, and
    cost-effective inference at scale.

    The API is designed to allow use of openai client libraries for seamless integration.

    This API provides the following extensions:
    - idempotent batch creation

    Note: This API is currently under active development and may undergo changes.
    """

    async def create_batch(
        self,
        input_file_id: str,
        endpoint: str,
        completion_window: Literal["24h"],
        metadata: dict[str, str] | None = None,
        idempotency_key: str | None = None,
    ) -> BatchObject:
        """Create a new batch for processing multiple API requests."""
        ...

models.py - Pydantic models with Field descriptions - I'm keeping `@json_schema_type` intentionally for now:
from pydantic import BaseModel, Field

from llama_stack.schema_utils import json_schema_type

@json_schema_type
class CreateBatchRequest(BaseModel):
    """Request model for creating a batch."""

    input_file_id: str = Field(..., description="The ID of an uploaded file containing requests for the batch.")
    endpoint: str = Field(..., description="The endpoint to be used for all requests in the batch.")
    completion_window: Literal["24h"] = Field(
        ..., description="The time window within which the batch should be processed."
    )
    metadata: dict[str, str] | None = Field(default=None, description="Optional metadata for the batch.")
    idempotency_key: str | None = Field(
        default=None, description="Optional idempotency key. When provided, enables idempotent behavior."
    )

routes.py - FastAPI router with decorators:
from fastapi import APIRouter, Body, Depends

router = APIRouter(
    prefix=f"/{LLAMA_STACK_API_V1}",
    tags=["Batches"],
    responses=standard_responses,
)

@router.post(
    "/batches",
    response_model=BatchObject,
    summary="Create a new batch for processing multiple API requests.",
    description="Create a new batch for processing multiple API requests.",
)
async def create_batch(
    request: CreateBatchRequest = Body(...),
    svc: BatchService = Depends(get_batch_service),
) -> BatchObject:
    """Create a new batch."""
    return await svc.create_batch(
        input_file_id=request.input_file_id,
        endpoint=request.endpoint,
        completion_window=request.completion_window,
        metadata=request.metadata,
        idempotency_key=request.idempotency_key,
    )

Infrastructure Needed
- Router utilities (`core/server/router_utils.py`):
  - `standard_responses` dict for common error responses
  - Helper functions for dependency injection
- Router registration (`core/server/routers.py`):
  - `register_router(Api, router_factory)` function
  - Router discovery mechanism: `get_all_routers()` to replace `get_all_api_routes()`
- Server integration:
  - Update `server.py` to include routers instead of webmethod routes
  - Dependency injection for service implementations
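The dependency-injection piece could be sketched roughly as follows. This is a minimal, hypothetical illustration (the names `set_implementation` and `make_dependency` are not from the proposal), assuming provider implementations are resolved once at server startup and looked up per request:

```python
# Hypothetical registry mapping a service protocol to its resolved implementation,
# populated once at server startup.
_impls: dict[type, object] = {}


def set_implementation(protocol: type, impl: object) -> None:
    """Register the concrete implementation serving a protocol."""
    _impls[protocol] = impl


def make_dependency(protocol: type):
    """Build a zero-argument callable usable as `Depends(make_dependency(BatchService))`."""

    def _resolve() -> object:
        try:
            return _impls[protocol]
        except KeyError:
            raise RuntimeError(f"No implementation registered for {protocol.__name__}") from None

    return _resolve
```

A route would then declare `svc: BatchService = Depends(make_dependency(BatchService))`, keeping the protocol as the only coupling between routes and providers.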
- Schema utilities:
  - Keep the `@json_schema_type` decorator from `llama_stack.schema_utils` if beneficial for OpenAPI schema generation
  - Evaluate whether it adds value over standard Pydantic model generation (it controls top-level component registration)
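The registration mechanism above could look something like this sketch. The names `register_router` and `get_all_routers` follow the proposal; the `Api` enum is a stand-in for the existing one, and the factory indirection is an assumption to keep router construction lazy:

```python
from collections.abc import Callable
from enum import Enum


class Api(Enum):  # stand-in for the real Api enum
    batches = "batches"
    models = "models"


# Maps each API to a factory returning its fastapi.APIRouter; factories keep
# router construction lazy so importing the registry has no FastAPI side effects.
_router_factories: dict[Api, Callable[[], object]] = {}


def register_router(api: Api, router_factory: Callable[[], object]) -> None:
    """register_router(Api, router_factory) as described in the proposal."""
    if api in _router_factories:
        raise ValueError(f"Router already registered for {api}")
    _router_factories[api] = router_factory


def get_all_routers() -> dict[Api, object]:
    """Replacement for get_all_api_routes(): build every registered router."""
    return {api: factory() for api, factory in _router_factories.items()}
```

`server.py` would then simply loop over `get_all_routers()` and call `app.include_router(router)` for each, instead of wrapping protocol methods dynamically.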
I already have a complete implementation ready for review at https://github.com/leseb/llama-stack/tree/router. The server starts, and I was able to validate a good chunk of the integration tests.
Migration Tasks
Phase 1: Infrastructure
- Create `router_utils.py` with standard responses
- Create router registration system
- Update `fastapi_generator.py` to read FastAPI routers directly
- Simplify generator (remove webmethod introspection code)
Phase 2: API Migration
For each API (batches, models, inference, etc.):
- Extract Protocol to `service.py` (remove `@webmethod` decorators)
- Create `models.py` with Pydantic request/response models
- Create `routes.py` with FastAPI router
- Register router in router registry
- Update provider implementations to match new structure
Phase 3: Cleanup
- Remove the `@webmethod` decorator system
- Remove `get_all_api_routes()` webmethod discovery
- Remove the `create_dynamic_typed_route()` wrapper
- Update all references to use routers
Each phase can be its own PR, or everything can land at once. Hopefully we can iterate quickly on the PRs if we like this approach; otherwise, any new API change will require a rebase :)
💡 Why is this needed? What if we don't build it?
Nothing breaks; we just keep a half-baked OpenAPI generator that still relies on docstring parsing to populate most of the fields.
Other thoughts
No response