
@williamcaban commented Nov 16, 2025

MLflow Prompt Registry Provider

🛑 DO NOT MERGE 🛑
This PR illustrates the intended use of PR #4166 and PR #4168, on which it depends.

Summary

This PR adds a new remote MLflow provider for the Prompts API, enabling centralized prompt management and versioning using MLflow's Prompt Registry (MLflow 3.4+).

What's New

Remote Provider: remote::mlflow

A production-ready provider that integrates Llama Stack's Prompts API with MLflow's centralized prompt registry, supporting:

  • Version Control: Immutable prompt versioning with full history
  • Default Version Management: Easy version switching via aliases
  • Auto Variable Extraction: Automatic detection of {{ variable }} placeholders
  • Centralized Storage: Team collaboration via shared MLflow server
  • Metadata Preservation: Llama Stack metadata stored as MLflow tags

Key Features

1. Full CRUD Operations

# Create prompt
prompt = client.prompts.create(
    prompt="Summarize {{ text }} in {{ num_sentences }} sentences"
)

# Retrieve by ID (gets default version)
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)

# Update (creates new version)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True
)

# List all prompts
prompts = client.prompts.list()

# List all versions of a prompt
versions = client.prompts.list_versions(prompt_id=prompt.prompt_id)

# Set default version
client.prompts.set_default_version(prompt_id=prompt.prompt_id, version=2)

2. Deterministic ID Mapping

  • Llama Stack format: pmpt_<48-hex-chars>
  • MLflow format: llama_prompt_<48-hex-chars>
  • Bidirectional conversion preserves IDs
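
For illustration, a minimal sketch of what the bidirectional mapping could look like (helper names are hypothetical; the actual implementation lives in mapping.py):

# Hypothetical sketch of the ID mapping; names and checks are illustrative.
import re

LLAMA_PREFIX = "pmpt_"
MLFLOW_PREFIX = "llama_prompt_"
_HEX_48 = re.compile(r"[0-9a-f]{48}")

def llama_to_mlflow_name(prompt_id: str) -> str:
    # pmpt_<48-hex> -> llama_prompt_<48-hex>
    suffix = prompt_id.removeprefix(LLAMA_PREFIX)
    if not _HEX_48.fullmatch(suffix):
        raise ValueError(f"Invalid Llama Stack prompt ID: {prompt_id}")
    return MLFLOW_PREFIX + suffix

def mlflow_to_llama_id(mlflow_name: str) -> str:
    # llama_prompt_<48-hex> -> pmpt_<48-hex>
    suffix = mlflow_name.removeprefix(MLFLOW_PREFIX)
    if not _HEX_48.fullmatch(suffix):
        raise ValueError(f"Invalid MLflow prompt name: {mlflow_name}")
    return LLAMA_PREFIX + suffix

# Round trip preserves the ID:
# mlflow_to_llama_id(llama_to_mlflow_name("pmpt_" + "a" * 48)) == "pmpt_" + "a" * 48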

3. Automatic Variable Extraction

# No need to specify variables - auto-extracted from template
prompt = client.prompts.create(
    prompt="You are a {{ role }} assistant. {{ instruction }}"
)
# Automatically extracts: ["role", "instruction"]
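
Under the hood, extraction can be as simple as scanning the template for {{ ... }} placeholders; the sketch below is illustrative, not the provider's exact code:

# Illustrative placeholder detection (the provider's actual logic may differ).
import re

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def extract_variables(template: str) -> list[str]:
    # Return placeholder names in order of first appearance, without duplicates.
    seen: list[str] = []
    for name in _PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen

# extract_variables("You are a {{ role }} assistant. {{ instruction }}")
# -> ["role", "instruction"]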

4. Configuration

prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts
      timeout_seconds: 30
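
The config schema in config.py roughly mirrors these keys; the sketch below is an approximation, and the actual field names and defaults may differ:

# Approximate shape of the provider config; field names/defaults are assumptions.
from pydantic import BaseModel, Field

class MLflowPromptsConfig(BaseModel):
    mlflow_tracking_uri: str = Field(description="MLflow tracking server URI")
    experiment_name: str = Field(
        default="llama-stack-prompts",
        description="MLflow experiment used to group prompts",
    )
    timeout_seconds: int = Field(default=30, ge=1, description="Timeout for MLflow calls")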

Implementation Details

Files Added

  • Provider Implementation:

    • src/llama_stack/providers/registry/prompts.py - Provider registry
    • src/llama_stack/providers/remote/prompts/mlflow/config.py - Configuration schema
    • src/llama_stack/providers/remote/prompts/mlflow/mapping.py - ID mapping utilities
    • src/llama_stack/providers/remote/prompts/mlflow/mlflow.py - Main adapter (520 lines)
  • Documentation:

    • docs/docs/providers/prompts/remote_mlflow.mdx - Comprehensive user guide
    • tests/integration/providers/remote/prompts/mlflow/README.md - Testing guide
  • Tests:

    • Unit tests: 41 tests (config + mapping)
    • Integration tests: 14 end-to-end scenarios
    • Manual test script: Interactive validation

Testing Summary

All Tests Passing

Test Suite           Count   Status
-------------------  ------  ------
Unit Tests           41/41   PASS
Integration Tests    14/14   PASS
Manual Test Script   11/12   PASS*

*One non-critical test skipped (cache stats not applicable to this provider)

Integration Test Coverage:

  • Create and retrieve prompts
  • Update prompts (version management)
  • List prompts (default versions only)
  • List all versions of a prompt
  • Set default version
  • Variable auto-extraction
  • Variable validation
  • Error handling (not found, wrong version)
  • Complex templates with multiple variables
  • Edge cases (empty templates, no variables)

Breaking Changes

None. This is a new provider addition.

Dependencies

  • Required: mlflow>=3.4.0 (added to provider pip_packages)
  • Python: 3.12+ (existing requirement)

Usage Example

1. Start MLflow Server

mlflow server --host 127.0.0.1 --port 5555

2. Configure Llama Stack

prompts:
  - provider_id: mlflow-prompts
    provider_type: remote::mlflow
    config:
      mlflow_tracking_uri: http://localhost:5555
      experiment_name: llama-stack-prompts

3. Use Prompts API

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create versioned prompt
prompt = client.prompts.create(
    prompt="Translate {{ text }} to {{ language }}"
)
print(f"Created: {prompt.prompt_id} v{prompt.version}")

# Update creates new version
v2 = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Translate the following text to {{ language }}:\n\n{{ text }}",
    version=1,
    set_as_default=True
)
print(f"Updated to v{v2.version}")

Review Notes

Architecture Decisions

  1. Deterministic ID Mapping: Uses hash-based mapping to ensure consistent IDs across Llama Stack and MLflow
  2. Immutable Versioning: All versions are preserved; updates create new versions
  3. Default Alias: Uses MLflow's alias feature for default version tracking (see the sketch after this list)
  4. Metadata Tags: Stores Llama Stack metadata as MLflow tags for discoverability
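
As a rough sketch of decisions 2-4: the adapter registers a new immutable version on every update, moves a default alias to the version that should be served, and attaches Llama Stack metadata as tags. The mlflow.genai calls below are assumptions and should be checked against the MLflow 3.4 prompt-registry docs:

# Assumed MLflow prompt-registry calls; verify names/signatures against MLflow 3.4.
import mlflow

name = "llama_prompt_" + "a" * 48

# Registering the same name again creates a new immutable version.
v1 = mlflow.genai.register_prompt(name=name, template="Summarize {{ text }}")
v2 = mlflow.genai.register_prompt(
    name=name,
    template="Summarize {{ text }} briefly",
    tags={"llama_stack_prompt_id": "pmpt_" + "a" * 48},  # metadata stored as tags
)

# Point the "default" alias at the version Llama Stack should serve.
mlflow.genai.set_prompt_alias(name=name, alias="default", version=v2.version)

# Later, the default version is resolved through the alias URI.
default = mlflow.genai.load_prompt(f"prompts:/{name}@default")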

Known Limitations

  1. No Deletion: MLflow does not support deleting prompts, so delete_prompt() raises NotImplementedError (see the sketch after this list)
  2. Sequential Versions: Versions are sequential integers (1, 2, 3, ...) and cannot be set manually
  3. Experiment Required: All prompts must be within an MLflow experiment (auto-created if missing)
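
For limitation 1, the adapter can only surface the constraint explicitly; a minimal sketch (the method signature is assumed, not copied from mlflow.py):

class MLflowPromptsAdapterSketch:
    async def delete_prompt(self, prompt_id: str) -> None:
        # MLflow's prompt registry has no delete operation, so the provider
        # raises rather than silently ignoring the request.
        raise NotImplementedError(
            "MLflow does not support deleting prompts; register a new version "
            "and move the default alias instead."
        )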

Future Enhancements

  • Search/filter prompts by metadata
  • Batch operations for multiple prompts
  • Prompt templates with inheritance
  • Integration with MLflow's prompt evaluation features

Checklist

  • Implementation complete
  • Unit tests passing (41/41)
  • Integration tests passing (14/14)
  • Documentation added
  • Manual testing performed
  • No breaking changes
  • Follows existing provider patterns
  • Error handling implemented
  • Logging added
  • Configuration validated

How to Test

Quick Test

# 1. Start MLflow server
mlflow server --host 127.0.0.1 --port 5555

# 2. Run manual test script
uv run python scripts/test_mlflow_prompts_manual.py

# 3. View prompts in MLflow UI
open http://localhost:5555

Full Test Suite

# Run unit tests
uv run --group unit pytest -sv tests/unit/providers/remote/prompts/mlflow/

# Run integration tests (requires MLflow server)
MLFLOW_TRACKING_URI=http://localhost:5555 \
  uv run --group test pytest -sv tests/integration/providers/remote/prompts/mlflow/

Related Issues

This implements the Prompts API provider for MLflow Prompt Registry, enabling:

  • Centralized prompt management across teams
  • Version control for prompt templates
  • Production-ready prompt deployment workflows
  • Integration with existing MLflow infrastructure

This commit adds a new remote provider for the Prompts API that integrates
with MLflow's Prompt Registry (MLflow 3.4+), enabling centralized prompt
management and versioning.

Key features:
- Full CRUD operations (create, read, update, list)
- Immutable version control with default version management
- Automatic variable extraction from {{ variable }} placeholders
- Deterministic ID mapping between Llama Stack and MLflow
- Comprehensive test coverage (41 unit + 14 integration tests)

Provider implementation:
- src/llama_stack/providers/remote/prompts/mlflow/
  - config.py: Configuration schema with validation
  - mapping.py: ID mapping and metadata utilities
  - mlflow.py: Main adapter implementation (520 lines)
- src/llama_stack/providers/registry/prompts.py: Provider registration

Testing:
- tests/unit/providers/remote/prompts/mlflow/
  - test_config.py: 14 configuration tests
  - test_mapping.py: 27 ID mapping tests
- tests/integration/providers/remote/prompts/mlflow/
  - test_end_to_end.py: 14 end-to-end scenarios
  - conftest.py: Test fixtures and server availability checks
  - README.md: Testing documentation
- scripts/test_mlflow_prompts_manual.py: Manual validation script

Documentation:
- docs/docs/providers/prompts/remote_mlflow.mdx: User guide
- docs/docs/providers/prompts/index.mdx: Provider category index

Dependencies:
- Requires mlflow>=3.4.0 (added to provider pip_packages)

Limitations:
- No deletion support (MLflow constraint)
- Sequential version numbers only (1, 2, 3...)

Signed-off-by: William Caban <william.caban@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

@mattf (Collaborator) left a comment

@williamcaban this is the right direction

few things jump out for me -

  • the existing impl needs to be moved to be an inline::reference impl (let the llm know it's in src/llama_stack/core/prompts/prompts.py)
  • mlflow credential handling needs to be added, should follow the pattern in inference (provider-data backstopped by config)
  • tests don't need to be standalone
