feat: add lora k8s deployment example #4714

biswapanda · 2025-12-03T04:49:25Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added deployment examples for running vLLM with LoRA (Low-Rank Adaptation) support
- Includes configurations for GPU resource management, model deployment, and LoRA parameter tuning
- Added utility to synchronize LoRA models to object storage endpoints

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-03T04:53:29Z

Walkthrough

Three new Kubernetes manifest files added for LoRA-enabled VLLM deployment configuration. Files define a DynamoGraphDeployment with compute services, a DynamoModel resource referencing a LoRA base model, and a Job for synchronizing LoRA models to MinIO storage.

Changes

Cohort / File(s)	Summary
LoRA VLLM Deployment Configuration `examples/backends/vllm/deploy/lora/agg_lora.yaml`, `examples/backends/vllm/deploy/lora/lora-model.yaml`, `examples/backends/vllm/deploy/lora/sync-lora-job.yaml`	Introduces deployment manifest defining Frontend and VllmDecodeWorker services with GPU resource constraints; model resource linking Qwen base model to LoRA adapter; and Job for downloading Qwen model and syncing to MinIO with HuggingFace token authentication

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Note: Familiarity with Kubernetes custom resources (DynamoGraphDeployment, DynamoModel) and MinIO configuration patterns will facilitate review
Verify S3 endpoint URIs and credential handling (aws-credentials, hf-token-secret) match deployment environment
Confirm GPU resource limits and container environment variables (MAX_RANK, EAGER_ENFORCE) align with intended LoRA configuration

Poem

🐰 Three manifests hopping through the VLLM gates,
LoRA adapters tuned to decode fates,
MinIO buckets fill with model grace,
Qwen whispers through the GPU space! 🚀✨

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The pull request description contains only template placeholders with no actual content filled in for any required sections (Overview, Details, Where should the reviewer start, Related Issues).	Fill in all required sections with meaningful content: provide an overview of the LoRA deployment example, describe the three YAML manifests added, guide reviewers to key files, and reference the actual GitHub issue being closed.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: adding Kubernetes deployment examples for LoRA with VLLM, clearly summarizing the primary objective of the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

examples/backends/vllm/deploy/lora/sync-lora-job.yaml (1)
29-31: Consider parameterizing MODEL_NAME or documenting manual override pattern.

Line 31 shows a commented alternative MODEL_NAME value, suggesting users may need to swap this value. If this is intended as a template, consider adding a comment at the top of the file documenting how users should customize the manifest, or explore using Kustomize overlays or Helm values for parameterization.

Add a comment near the top of the file to guide users:
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: sync-hf-lora-to-minio
+# NOTE: Before applying, update the MODEL_NAME value below to match your LoRA model.
 spec:

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c655585 and 7ae79f6.

📒 Files selected for processing (3)

examples/backends/vllm/deploy/lora/agg_lora.yaml (1 hunks)
examples/backends/vllm/deploy/lora/lora-model.yaml (1 hunks)
examples/backends/vllm/deploy/lora/sync-lora-job.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

📚 Learning: 2025-10-24T04:21:08.751Z

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

examples/backends/vllm/deploy/lora/lora-model.yaml

📚 Learning: 2025-09-04T19:03:06.643Z

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

examples/backends/vllm/deploy/lora/agg_lora.yaml

🪛 Checkov (3.2.334)

examples/backends/vllm/deploy/lora/sync-lora-job.yaml

[medium] 1-33: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)

[medium] 1-33: Minimize the admission of root containers

(CKV_K8S_23)

🪛 GitHub Actions: Copyright Checks

examples/backends/vllm/deploy/lora/lora-model.yaml

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/lora-model.yaml'.

examples/backends/vllm/deploy/lora/sync-lora-job.yaml

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/sync-lora-job.yaml'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: operator (amd64)
GitHub Check: trtllm (amd64)
GitHub Check: operator (arm64)
GitHub Check: sglang (arm64)
GitHub Check: vllm (amd64)
GitHub Check: vllm (arm64)
GitHub Check: Build and Test - dynamo

coderabbitai · 2025-12-03T04:53:33Z

examples/backends/vllm/deploy/lora/agg_lora.yaml

+            - name: AWS_ACCESS_KEY_ID
+              value: "minioadmin"
+            - name: AWS_SECRET_ACCESS_KEY
+              value: "minioadmin"


⚠️ Potential issue | 🟠 Major

Move hardcoded AWS credentials to Kubernetes secrets.

Lines 42–45 expose AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as hardcoded plaintext values in the manifest. This creates a security risk and makes credentials visible in version control. Since the manifest already uses envFromSecret (line 17) for HF tokens, apply the same pattern for AWS credentials.

Apply this diff:

envFromSecret: hf-token-secret + envFromSecretList: + - name: hf-token-secret + - name: minio-credentials

Then remove the hardcoded credentials:

env: - name: DYN_LORA_ENABLED value: "true" - name: DYN_LORA_PATH value: "/tmp/dynamo_loras_minio" - name: DYN_SYSTEM_ENABLED value: "true" - name: DYN_SYSTEM_PORT value: "9090" - name: AWS_ENDPOINT value: "http://minio:9000" - - name: AWS_ACCESS_KEY_ID - value: "minioadmin" - - name: AWS_SECRET_ACCESS_KEY - value: "minioadmin" - name: AWS_REGION value: "us-east-1" - name: AWS_ALLOW_HTTP value: "true" - name: BUCKET_NAME value: "my-loras"

The minio-credentials secret (with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) will be injected via the envFromSecretList. Coordinate this with the fix in sync-lora-job.yaml.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In examples/backends/vllm/deploy/lora/agg_lora.yaml around lines 42 to 45, the AWS credentials are hardcoded as plaintext environment variables; remove these two entries and instead reference them via the existing secret injection pattern by adding the secret name (minio-credentials) to the envFromSecretList so AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are supplied from that Kubernetes Secret; after this change delete the hardcoded lines and ensure the corresponding sync-lora-job.yaml is updated to create/expect the same minio-credentials secret so both manifests use the same secret-based injection.

coderabbitai · 2025-12-03T04:53:33Z

examples/backends/vllm/deploy/lora/lora-model.yaml

+apiVersion: nvidia.com/v1alpha1
+kind: DynamoModel
+metadata:
+  name: codelion-recovery-lora
+spec:
+  modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora
+  baseModelName: Qwen/Qwen3-0.6B
+  modelType: lora
+  source:
+    uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora


⚠️ Potential issue | 🔴 Critical

Add missing copyright header to fix pipeline failure.

The file is missing the SPDX copyright header required by the copyright check. Compare with agg_lora.yaml (lines 1–2) for the correct format.

Apply this diff to add the copyright header:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 apiVersion: nvidia.com/v1alpha1 kind: DynamoModel

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

apiVersion: nvidia.com/v1alpha1

kind: DynamoModel

metadata:

name: codelion-recovery-lora

spec:

modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora

baseModelName: Qwen/Qwen3-0.6B

modelType: lora

source:

uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

# SPDX-License-Identifier: Apache-2.0

apiVersion: nvidia.com/v1alpha1

kind: DynamoModel

metadata:

name: codelion-recovery-lora

spec:

modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora

baseModelName: Qwen/Qwen3-0.6B

modelType: lora

source:

uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora

🧰 Tools

🪛 GitHub Actions: Copyright Checks

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/lora-model.yaml'.

🤖 Prompt for AI Agents

In examples/backends/vllm/deploy/lora/lora-model.yaml around lines 1 to 10, the file is missing the required SPDX copyright header; prepend the same two-line header used in agg_lora.yaml (lines 1–2) to the very top of this file, preserving exact text, spacing and a trailing newline so the repository copyright check passes.

coderabbitai · 2025-12-03T04:53:33Z

examples/backends/vllm/deploy/lora/sync-lora-job.yaml

@@ -0,0 +1,33 @@
+apiVersion: batch/v1


⚠️ Potential issue | 🔴 Critical

Add missing copyright header to fix pipeline failure.

The file is missing the SPDX copyright header required by the copyright check.

Apply this diff to add the copyright header:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 apiVersion: batch/v1 kind: Job

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

apiVersion: batch/v1

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

# SPDX-License-Identifier: Apache-2.0

apiVersion: batch/v1

kind: Job

🧰 Tools

🪛 Checkov (3.2.334)

[medium] 1-33: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)

[medium] 1-33: Minimize the admission of root containers

(CKV_K8S_23)

🪛 GitHub Actions: Copyright Checks

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/sync-lora-job.yaml'.

🤖 Prompt for AI Agents

In examples/backends/vllm/deploy/lora/sync-lora-job.yaml around line 1, the file is missing the required SPDX copyright header; add a copyright header as the first lines of the file using the project standard template (replace YEAR and OWNER accordingly), for example include an SPDX-FileCopyrightText line and an SPDX-License-Identifier line (e.g. "SPDX-FileCopyrightText: 2025 YourOrganization" and "SPDX-License-Identifier: Apache-2.0") so the copyright check passes.

coderabbitai · 2025-12-03T04:53:33Z

examples/backends/vllm/deploy/lora/sync-lora-job.yaml

+      containers:
+      - name: uploader
+        image: python:3.10-slim
+        command:
+        - /bin/sh
+        - -c
+        - |
+          set -eux
+          pip install --no-cache-dir huggingface-hub awscli
+          hf download  $MODEL_NAME --local-dir /tmp/lora
+          rm -rf /tmp/lora/.cache
+          aws --endpoint-url=http://minio:9000 s3 mb s3://my-loras || true
+          aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://my-loras/$MODEL_NAME
+        envFrom:
+        - secretRef:
+            name: hf-token-secret
+        env:
+        - name: AWS_ACCESS_KEY_ID
+          value: minioadmin
+        - name: AWS_SECRET_ACCESS_KEY
+          value: minioadmin
+        - name: MODEL_NAME
+          value: codelion/Qwen3-0.6B-accuracy-recovery-lora
+          # value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA
+      restartPolicy: Never


⚠️ Potential issue | 🟠 Major

Enforce security constraints and move hardcoded credentials to secrets.

Two security concerns:

Container runs as root without privilege escalation constraints. The python:3.10-slim image runs as root by default, and allowPrivilegeEscalation is not explicitly disabled. This violates the CKV_K8S_20 and CKV_K8S_23 checks.

AWS credentials hardcoded in plain text. Lines 25–28 expose AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as hardcoded values in the manifest. This is a security and compliance risk. The pattern already uses envFrom.secretRef for HF tokens (lines 21–23), so credentials should follow the same approach.

Apply this diff:

containers: - name: uploader image: python:3.10-slim + securityContext: + runAsNonRoot: true + runAsUser: 1000 + allowPrivilegeEscalation: false + readOnlyRootFilesystem: true command: - /bin/sh - -c - | set -eux pip install --no-cache-dir huggingface-hub awscli hf download $MODEL_NAME --local-dir /tmp/lora rm -rf /tmp/lora/.cache aws --endpoint-url=http://minio:9000 s3 mb s3://my-loras || true aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://my-loras/$MODEL_NAME + volumeMounts: + - name: tmp + mountPath: /tmp envFrom: - secretRef: name: hf-token-secret + - secretRef: + name: minio-credentials # Create this secret with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env: - name: MODEL_NAME value: codelion/Qwen3-0.6B-accuracy-recovery-lora # value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA - - name: AWS_ACCESS_KEY_ID - value: minioadmin - - name: AWS_SECRET_ACCESS_KEY - value: minioadmin - name: AWS_REGION value: "us-east-1" - name: AWS_ALLOW_HTTP value: "true" - name: BUCKET_NAME value: "my-loras" + volumes: + - name: tmp + emptyDir: {} restartPolicy: Never

Create a Kubernetes secret with the MinIO credentials:

kubectl create secret generic minio-credentials \ --from-literal=AWS_ACCESS_KEY_ID=minioadmin \ --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin

🤖 Prompt for AI Agents

In examples/backends/vllm/deploy/lora/sync-lora-job.yaml around lines 8 to 32, the container runs as root without privilege escalation constraints and MinIO AWS credentials are hardcoded; update the pod spec to set a securityContext for the container (runAsNonRoot: true, allowPrivilegeEscalation: false, and set a non-root runAsUser if needed) and remove the hardcoded AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env entries, replacing them with an envFrom.secretRef that references a new Kubernetes secret (e.g., minio-credentials) created via the provided kubectl create secret generic command so credentials are not in plain text.

feat: add lora deployment example

7ae79f6

biswapanda requested review from a team as code owners December 3, 2025 04:49

pull-request-size bot added the size/L label Dec 3, 2025

github-actions bot added the feat label Dec 3, 2025

coderabbitai bot reviewed Dec 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add lora k8s deployment example #4714

feat: add lora k8s deployment example #4714

biswapanda commented Dec 3, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 3, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 3, 2025

Uh oh!

coderabbitai bot Dec 3, 2025

Uh oh!

coderabbitai bot Dec 3, 2025

Uh oh!

coderabbitai bot Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-apiVersion: batch/v1
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: batch/v1
+kind: Job

feat: add lora k8s deployment example #4714

Are you sure you want to change the base?

feat: add lora k8s deployment example #4714

Conversation

biswapanda commented Dec 3, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 3, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

biswapanda commented Dec 3, 2025 •

edited by coderabbitai bot

Loading