Skip to content

Conversation

@Aphoh
Copy link
Contributor

@Aphoh Aphoh commented Dec 3, 2025

Overview:

Adds onto #4617, cleaning up some logic that was put in for doing custom vllm lora routes. It's substantially cleaner with the engine-wildcard routes. @biswapanda

Summary by CodeRabbit

  • Refactor
    • Restructured LoRA (Low-Rank Adaptation) management to use internal engine routes exclusively, removing previous public API endpoint exposure.
    • Streamlined system status server by eliminating LoRA management features, including public endpoint declarations, request/response data structures, and associated route handlers.

✏️ Tip: You can customize this high-level summary in your review settings.

Aphoh and others added 7 commits December 2, 2025 21:15
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: William Arnold <warnold@nvidia.com>
Signed-off-by: William Arnold <warnold@nvidia.com>
@Aphoh Aphoh requested review from a team as code owners December 3, 2025 05:24
@Aphoh Aphoh changed the title Clean up lora endpoints feat: Clean up lora endpoints Dec 3, 2025
@github-actions github-actions bot added the feat label Dec 3, 2025
@Aphoh
Copy link
Contributor Author

Aphoh commented Dec 3, 2025

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

The PR consolidates LoRA management within the vLLM engine by transitioning LoRA endpoints from public system status server routes to internal engine routes, while removing corresponding public data structures and handlers from the system status server.

Changes

Cohort / File(s) Summary
Engine route registration for LoRA management
components/src/dynamo/vllm/main.py
Removed public endpoint declarations for load_lora, unload_lora, and list_loras from decode worker initialization. Added internal engine-route registrations under /engine/load_lora, /engine/unload_lora, and /engine/list_loras with async handler wrappers forwarding to existing handler methods.
LoRA removal from system status server
lib/runtime/src/system_status_server.rs
Removed entire LoRA management feature set: public request/response structures (LoadLoraRequest, LoraSource, LoraResponse), all route registrations (GET/POST for /v1/loras and DELETE for /v1/loras/{*lora_name}), in-process LoRA endpoint call logic, and JSON parsing helpers. Simplified router composition by eliminating conditional LoRA route setup.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Route registration logic: Verify that the internal engine-route handlers correctly forward to existing LoRA methods and maintain backward compatibility with calling patterns
  • Struct removal impacts: Confirm that removing LoraRequest, LoraSource, and LoraResponse does not break any external callers or downstream dependencies
  • Error handling: Ensure the new async handlers in main.py provide appropriate error responses matching the old response format

Poem

🐰 The routes hop inward, neat and tidy,
LoRA management moves to engine's heart,
From public squares to private chambers,
A cleaner API, sleek and light—
Thump thump goes the refactored code! 🌟

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The description provides an overview and links to a related issue (#4617), but omits the 'Details' and 'Where should the reviewer start?' sections specified in the template. Add 'Details' section describing the specific changes (LoRA endpoint removal, route refactoring) and 'Where should the reviewer start?' section highlighting key files.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: Clean up lora endpoints' directly reflects the main change—removing and refactoring LoRA management endpoints from multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39f8b4 and 8f725cf.

📒 Files selected for processing (2)
  • components/src/dynamo/vllm/main.py (1 hunks)
  • lib/runtime/src/system_status_server.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3095
File: lib/llm/src/kv_router/indexer.rs:0-0
Timestamp: 2025-09-17T20:55:06.333Z
Learning: When PeaBrane encounters a complex implementation issue that would significantly expand PR scope (like the remove_worker_sender method in lib/llm/src/kv_router/indexer.rs that required thread-safe map updates and proper shard targeting), they prefer to remove the problematic implementation entirely rather than rush a partial fix, deferring the proper solution to a future PR.
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 4644
File: lib/llm/src/lora/source.rs:313-323
Timestamp: 2025-11-26T22:06:18.789Z
Learning: In the Dynamo LoRA caching system (lib/llm/src/lora/source.rs), the local LoRA cache directory is considered ephemeral with S3 being the authoritative upstream source. If the local cache is removed or corrupted, it will be automatically rehydrated from S3, making complex three-step atomic swap patterns unnecessary for cache updates.
📚 Learning: 2025-10-07T20:32:27.578Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3266
File: lib/llm/src/http/service/metrics.rs:833-854
Timestamp: 2025-10-07T20:32:27.578Z
Learning: In Axum 0.6+, routers with different state types can be merged successfully using .merge(). Each router maintains its own state internally, so a Router<Arc<MetricsHandlerState>> can be merged with a Router<Arc<service_v2::State>> without compilation issues.

Applied to files:

  • lib/runtime/src/system_status_server.rs
🔇 Additional comments (3)
components/src/dynamo/vllm/main.py (1)

529-553: LoRA engine routes cleanly wrapped around decode worker handler

The three engine route handlers correctly forward the JSON body into the corresponding DecodeWorkerHandler async methods and return the first yielded result, with a simple structured fallback when nothing is produced. Registering "load_lora", "unload_lora", and "list_loras" as engine routes here keeps the LoRA control plane nicely scoped to the decode worker and aligns with the /engine/* wildcard routing on the system status server.

lib/runtime/src/system_status_server.rs (2)

15-19: New axum imports match engine wildcard routing usage

The additions of extract::Path and routing::{any, get} are consistent with the /engine/{*path} wildcard route and engine_route_handler usage below; no issues spotted.


121-157: */engine/{path} wildcard routing correctly centralizes engine routes

The router directs all /engine/{*path} requests through any({ move |path, body| engine_route_handler(state, path, body) }), which cleanly reuses the existing engine_route_handler for all HTTP methods. This simplifies composition and ensures engine routes like "load_lora", "unload_lora", and "list_loras" are reached via their tail segment.

All register_engine_route callers in components/src/dynamo/vllm/main.py and components/src/dynamo/sglang/main.py correctly register keys matching the {*path} tail (e.g., "load_lora" rather than "/engine/load_lora"). The engine_route_handler correctly infers path as Path<String> and body as Bytes for the any handler, and Axum 0.8.4 properly supports the {*path} syntax with braces.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Aphoh
Copy link
Contributor Author

Aphoh commented Dec 3, 2025

@coderabbitai what happens to the DYN_* environment variables?

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for DYN_LORA_ENABLED usage in the codebase
echo "=== Searching for DYN_LORA_ENABLED usage ==="
rg -n "DYN_LORA_ENABLED" -C 3

echo -e "\n=== Searching for other DYN_* environment variable patterns ==="
rg -n "DYN_[A-Z_]+" -C 2

echo -e "\n=== Checking the files changed in this PR ==="
git diff --name-only origin/warnold/sglang-profile...HEAD

echo -e "\n=== Checking the diff for lib/runtime/src/system_status_server.rs ==="
git diff origin/warnold/sglang-profile...HEAD -- lib/runtime/src/system_status_server.rs | head -200

Length of output: 263003


🏁 Script executed:

#!/bin/bash
# Check the current state of DYN_LORA_ENABLED usage in Python
echo "=== Checking DYN_LORA_ENABLED usage in Python handlers ==="
cat components/src/dynamo/vllm/handlers.py | grep -A 10 -B 5 "DYN_LORA_ENABLED"

echo -e "\n=== Checking DYN_LORA_PATH usage ==="
rg -n "DYN_LORA_PATH" components/src/dynamo/

echo -e "\n=== Checking if env var definitions still exist ==="
rg -n "DYN_LORA" lib/runtime/src/config/environment_names.rs

echo -e "\n=== Checking system_status_server.rs for any DYN_LORA references ==="
rg -n "DYN_LORA" lib/runtime/src/system_status_server.rs || echo "No DYN_LORA references found in system_status_server.rs"

echo -e "\n=== Checking system_status_server.rs for LoRA-related code ==="
rg -n -i "lora" lib/runtime/src/system_status_server.rs || echo "No LoRA references found in system_status_server.rs"

Length of output: 2419


@Aphoh The DYN_* environment variables (specifically DYN_LORA_ENABLED and DYN_LORA_PATH) remain fully functional and unchanged.

What Changed

This PR moved LoRA management from Rust routes (in system_status_server.rs) to Python engine-wildcard routes (in components/src/dynamo/vllm/main.py), but the environment variables themselves are unaffected:

  • DYN_LORA_ENABLED: Still enables/disables LoRA support (set to true, 1, or yes to enable)
  • DYN_LORA_PATH: Still configures the cache directory for downloaded LoRA adapters

How They Work Now

  1. When DYN_LORA_ENABLED=true is set, the Python handler initializes a LoRAManager singleton

  2. The LoRA routes are now registered as internal engine routes:

    • /engine/load_lora
    • /engine/unload_lora
    • /engine/list_loras
  3. These routes use the same LoRAManager that checks the environment variables

User Impact

No changes needed for users who already set these environment variables. They continue to work exactly as before, just with a cleaner implementation under the hood.


🧠 Learnings used
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

Signed-off-by: William Arnold <warnold@nvidia.com>
Signed-off-by: William Arnold <warnold@nvidia.com>
Signed-off-by: William Arnold <warnold@nvidia.com>
Base automatically changed from warnold/sglang-profile to main December 5, 2025 18:28
@pull-request-size pull-request-size bot removed the size/L label Dec 5, 2025
@Aphoh
Copy link
Contributor Author

Aphoh commented Dec 5, 2025

Base got merged, leaving for later

@Aphoh Aphoh closed this Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants