feat: Clean up lora endpoints #4715

Aphoh · 2025-12-03T05:24:31Z

Overview:

Adds onto #4617, cleaning up some logic that was put in for doing custom vllm lora routes. It's substantially cleaner with the engine-wildcard routes. @biswapanda

Summary by CodeRabbit

Refactor
- Restructured LoRA (Low-Rank Adaptation) management to use internal engine routes exclusively, removing previous public API endpoint exposure.
- Streamlined system status server by eliminating LoRA management features, including public endpoint declarations, request/response data structures, and associated route handlers.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

Signed-off-by: William Arnold <warnold@nvidia.com>

Aphoh · 2025-12-03T18:43:57Z

@coderabbitai review

coderabbitai · 2025-12-03T18:44:02Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2025-12-03T18:49:40Z

Walkthrough

The PR consolidates LoRA management within the vLLM engine by transitioning LoRA endpoints from public system status server routes to internal engine routes, while removing corresponding public data structures and handlers from the system status server.

Changes

Cohort / File(s)	Summary
Engine route registration for LoRA management `components/src/dynamo/vllm/main.py`	Removed public endpoint declarations for load_lora, unload_lora, and list_loras from decode worker initialization. Added internal engine-route registrations under /engine/load_lora, /engine/unload_lora, and /engine/list_loras with async handler wrappers forwarding to existing handler methods.
LoRA removal from system status server `lib/runtime/src/system_status_server.rs`	Removed entire LoRA management feature set: public request/response structures (LoadLoraRequest, LoraSource, LoraResponse), all route registrations (GET/POST for /v1/loras and DELETE for /v1/loras/{*lora_name}), in-process LoRA endpoint call logic, and JSON parsing helpers. Simplified router composition by eliminating conditional LoRA route setup.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Route registration logic: Verify that the internal engine-route handlers correctly forward to existing LoRA methods and maintain backward compatibility with calling patterns
Struct removal impacts: Confirm that removing LoraRequest, LoraSource, and LoraResponse does not break any external callers or downstream dependencies
Error handling: Ensure the new async handlers in main.py provide appropriate error responses matching the old response format

Poem

🐰 The routes hop inward, neat and tidy,
LoRA management moves to engine's heart,
From public squares to private chambers,
A cleaner API, sleek and light—
Thump thump goes the refactored code! 🌟

Pre-merge checks

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description provides an overview and links to a related issue (#4617), but omits the 'Details' and 'Where should the reviewer start?' sections specified in the template.	Add 'Details' section describing the specific changes (LoRA endpoint removal, route refactoring) and 'Where should the reviewer start?' section highlighting key files.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: Clean up lora endpoints' directly reflects the main change—removing and refactoring LoRA management endpoints from multiple files.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39f8b4 and 8f725cf.

📒 Files selected for processing (2)

components/src/dynamo/vllm/main.py (1 hunks)
lib/runtime/src/system_status_server.rs (3 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3095
File: lib/llm/src/kv_router/indexer.rs:0-0
Timestamp: 2025-09-17T20:55:06.333Z
Learning: When PeaBrane encounters a complex implementation issue that would significantly expand PR scope (like the remove_worker_sender method in lib/llm/src/kv_router/indexer.rs that required thread-safe map updates and proper shard targeting), they prefer to remove the problematic implementation entirely rather than rush a partial fix, deferring the proper solution to a future PR.

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 4644
File: lib/llm/src/lora/source.rs:313-323
Timestamp: 2025-11-26T22:06:18.789Z
Learning: In the Dynamo LoRA caching system (lib/llm/src/lora/source.rs), the local LoRA cache directory is considered ephemeral with S3 being the authoritative upstream source. If the local cache is removed or corrupted, it will be automatically rehydrated from S3, making complex three-step atomic swap patterns unnecessary for cache updates.

📚 Learning: 2025-10-07T20:32:27.578Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3266
File: lib/llm/src/http/service/metrics.rs:833-854
Timestamp: 2025-10-07T20:32:27.578Z
Learning: In Axum 0.6+, routers with different state types can be merged successfully using .merge(). Each router maintains its own state internally, so a Router<Arc<MetricsHandlerState>> can be merged with a Router<Arc<service_v2::State>> without compilation issues.

Applied to files:

lib/runtime/src/system_status_server.rs

🔇 Additional comments (3)

components/src/dynamo/vllm/main.py (1)

529-553: LoRA engine routes cleanly wrapped around decode worker handler

The three engine route handlers correctly forward the JSON body into the corresponding DecodeWorkerHandler async methods and return the first yielded result, with a simple structured fallback when nothing is produced. Registering "load_lora", "unload_lora", and "list_loras" as engine routes here keeps the LoRA control plane nicely scoped to the decode worker and aligns with the /engine/* wildcard routing on the system status server.

lib/runtime/src/system_status_server.rs (2)

15-19: New axum imports match engine wildcard routing usage

The additions of extract::Path and routing::{any, get} are consistent with the /engine/{*path} wildcard route and engine_route_handler usage below; no issues spotted.

121-157: */engine/{path} wildcard routing correctly centralizes engine routes

The router directs all /engine/{*path} requests through any({ move |path, body| engine_route_handler(state, path, body) }), which cleanly reuses the existing engine_route_handler for all HTTP methods. This simplifies composition and ensures engine routes like "load_lora", "unload_lora", and "list_loras" are reached via their tail segment.

All register_engine_route callers in components/src/dynamo/vllm/main.py and components/src/dynamo/sglang/main.py correctly register keys matching the {*path} tail (e.g., "load_lora" rather than "/engine/load_lora"). The engine_route_handler correctly infers path as Path<String> and body as Bytes for the any handler, and Axum 0.8.4 properly supports the {*path} syntax with braces.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Aphoh · 2025-12-03T18:58:04Z

@coderabbitai what happens to the DYN_* environment variables?

coderabbitai · 2025-12-03T18:59:09Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for DYN_LORA_ENABLED usage in the codebase
echo "=== Searching for DYN_LORA_ENABLED usage ==="
rg -n "DYN_LORA_ENABLED" -C 3

echo -e "\n=== Searching for other DYN_* environment variable patterns ==="
rg -n "DYN_[A-Z_]+" -C 2

echo -e "\n=== Checking the files changed in this PR ==="
git diff --name-only origin/warnold/sglang-profile...HEAD

echo -e "\n=== Checking the diff for lib/runtime/src/system_status_server.rs ==="
git diff origin/warnold/sglang-profile...HEAD -- lib/runtime/src/system_status_server.rs | head -200

Length of output: 263003

🏁 Script executed:

#!/bin/bash
# Check the current state of DYN_LORA_ENABLED usage in Python
echo "=== Checking DYN_LORA_ENABLED usage in Python handlers ==="
cat components/src/dynamo/vllm/handlers.py | grep -A 10 -B 5 "DYN_LORA_ENABLED"

echo -e "\n=== Checking DYN_LORA_PATH usage ==="
rg -n "DYN_LORA_PATH" components/src/dynamo/

echo -e "\n=== Checking if env var definitions still exist ==="
rg -n "DYN_LORA" lib/runtime/src/config/environment_names.rs

echo -e "\n=== Checking system_status_server.rs for any DYN_LORA references ==="
rg -n "DYN_LORA" lib/runtime/src/system_status_server.rs || echo "No DYN_LORA references found in system_status_server.rs"

echo -e "\n=== Checking system_status_server.rs for LoRA-related code ==="
rg -n -i "lora" lib/runtime/src/system_status_server.rs || echo "No LoRA references found in system_status_server.rs"

Length of output: 2419

@Aphoh The DYN_* environment variables (specifically DYN_LORA_ENABLED and DYN_LORA_PATH) remain fully functional and unchanged.

What Changed

This PR moved LoRA management from Rust routes (in system_status_server.rs) to Python engine-wildcard routes (in components/src/dynamo/vllm/main.py), but the environment variables themselves are unaffected:

DYN_LORA_ENABLED: Still enables/disables LoRA support (set to true, 1, or yes to enable)
DYN_LORA_PATH: Still configures the cache directory for downloaded LoRA adapters

How They Work Now

When DYN_LORA_ENABLED=true is set, the Python handler initializes a LoRAManager singleton
The LoRA routes are now registered as internal engine routes:
- /engine/load_lora
- /engine/unload_lora
- /engine/list_loras
These routes use the same LoRAManager that checks the environment variables

User Impact

No changes needed for users who already set these environment variables. They continue to work exactly as before, just with a cleaner implementation under the hood.

🧠 Learnings used

Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

Signed-off-by: William Arnold <warnold@nvidia.com>

Aphoh · 2025-12-05T18:31:58Z

Base got merged, leaving for later

Aphoh and others added 7 commits December 2, 2025 21:15

Add engine routes, impl sglang's start/stop profile

501bda1

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

cargo fmt

8266fc6

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

Fix CI

3a3eb27

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

Fix doctest

9b5d251

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

Better async when calling engine route

ebb00dc

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

rerun CI

b39f8b4

Signed-off-by: William Arnold <warnold@nvidia.com>

Clean up lora endpoints

8f725cf

Signed-off-by: William Arnold <warnold@nvidia.com>

Aphoh requested review from a team as code owners December 3, 2025 05:24

pull-request-size bot added the size/L label Dec 3, 2025

Aphoh changed the title ~~Clean up lora endpoints~~ feat: Clean up lora endpoints Dec 3, 2025

github-actions bot added the feat label Dec 3, 2025

ishandhanani approved these changes Dec 3, 2025

View reviewed changes

Fix operator lora paths + docs

7f1220b

Signed-off-by: William Arnold <warnold@nvidia.com>

Aphoh requested a review from a team as a code owner December 5, 2025 18:04

copy-pr-bot bot temporarily deployed to GITLAB December 5, 2025 18:04 Inactive

copy-pr-bot bot temporarily deployed to GITLAB December 5, 2025 18:05 Inactive

Fix CI lint

8a69454

Signed-off-by: William Arnold <warnold@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB December 5, 2025 18:09 Inactive

copy-pr-bot bot temporarily deployed to GITLAB December 5, 2025 18:15 Inactive

Fix CI again

a369a9d

Signed-off-by: William Arnold <warnold@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB December 5, 2025 18:25 Inactive

Base automatically changed from warnold/sglang-profile to main December 5, 2025 18:28

pull-request-size bot removed the size/L label Dec 5, 2025

pull-request-size bot added the size/XXL label Dec 5, 2025

copy-pr-bot bot temporarily deployed to GITLAB December 5, 2025 18:29 Inactive

Aphoh closed this Dec 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Clean up lora endpoints #4715

feat: Clean up lora endpoints #4715

Uh oh!

Aphoh commented Dec 3, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

Aphoh commented Dec 3, 2025

Uh oh!

coderabbitai bot commented Dec 3, 2025

Uh oh!

coderabbitai bot commented Dec 3, 2025

Uh oh!

Aphoh commented Dec 3, 2025

Uh oh!

coderabbitai bot commented Dec 3, 2025

Uh oh!

Aphoh commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: Clean up lora endpoints #4715

feat: Clean up lora endpoints #4715

Uh oh!

Conversation

Aphoh commented Dec 3, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Summary by CodeRabbit

Uh oh!

Aphoh commented Dec 3, 2025

Uh oh!

coderabbitai bot commented Dec 3, 2025

Uh oh!

coderabbitai bot commented Dec 3, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

Aphoh commented Dec 3, 2025

Uh oh!

coderabbitai bot commented Dec 3, 2025

What Changed

How They Work Now

User Impact

Uh oh!

Aphoh commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Aphoh commented Dec 3, 2025 •

edited by coderabbitai bot

Loading