Conversation

@biswapanda
Contributor

biswapanda commented Dec 3, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added deployment examples for running vLLM with LoRA (Low-Rank Adaptation) support
    • Includes configurations for GPU resource management, model deployment, and LoRA parameter tuning
    • Added utility to synchronize LoRA models to object storage endpoints


biswapanda requested review from a team as code owners — December 3, 2025 04:49
github-actions bot added the feat label — Dec 3, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

Three new Kubernetes manifest files are added for a LoRA-enabled vLLM deployment. The files define a DynamoGraphDeployment with compute services, a DynamoModel resource referencing a LoRA base model, and a Job that synchronizes LoRA models to MinIO storage.
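
For orientation, a minimal sketch of how the three manifests might be applied, assuming a cluster that already runs the Dynamo operator, an in-cluster MinIO service named minio, and an hf-token-secret Secret; the apply order shown (sync Job first, then model, then deployment) is an assumption, not something the PR specifies:

# Upload the LoRA adapter to MinIO first, so the DynamoModel's s3:// URI resolves
kubectl apply -f examples/backends/vllm/deploy/lora/sync-lora-job.yaml
kubectl wait --for=condition=complete job/sync-hf-lora-to-minio --timeout=600s

# Register the LoRA adapter, then bring up the deployment that serves it
kubectl apply -f examples/backends/vllm/deploy/lora/lora-model.yaml
kubectl apply -f examples/backends/vllm/deploy/lora/agg_lora.yaml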

Changes

Cohort / File(s): LoRA VLLM Deployment Configuration
examples/backends/vllm/deploy/lora/agg_lora.yaml, examples/backends/vllm/deploy/lora/lora-model.yaml, examples/backends/vllm/deploy/lora/sync-lora-job.yaml

Summary: Introduces a deployment manifest defining Frontend and VllmDecodeWorker services with GPU resource constraints; a model resource linking the Qwen base model to a LoRA adapter; and a Job that downloads the Qwen LoRA model and syncs it to MinIO using HuggingFace token authentication.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Note: Familiarity with Kubernetes custom resources (DynamoGraphDeployment, DynamoModel) and MinIO configuration patterns will facilitate review
  • Verify S3 endpoint URIs and credential handling (aws-credentials, hf-token-secret) match deployment environment
  • Confirm GPU resource limits and container environment variables (MAX_RANK, EAGER_ENFORCE) align with intended LoRA configuration

Poem

🐰 Three manifests hopping through the VLLM gates,
LoRA adapters tuned to decode fates,
MinIO buckets fill with model grace,
Qwen whispers through the GPU space! 🚀✨

Pre-merge checks

❌ Failed checks (1 warning)
Description check — ⚠️ Warning
Explanation: The pull request description contains only template placeholders, with no actual content filled in for any required section (Overview, Details, Where should the reviewer start, Related Issues).
Resolution: Fill in all required sections with meaningful content: provide an overview of the LoRA deployment example, describe the three YAML manifests added, guide reviewers to key files, and reference the actual GitHub issue being closed.
✅ Passed checks (2 passed)
Title check — ✅ Passed: The title accurately describes the main change, adding Kubernetes deployment examples for LoRA with vLLM, and clearly summarizes the primary objective of the changeset.
Docstring Coverage — ✅ Passed: No functions were found in the changed files, so the docstring coverage check was skipped.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
examples/backends/vllm/deploy/lora/sync-lora-job.yaml (1)

29-31: Consider parameterizing MODEL_NAME or documenting manual override pattern.

Line 31 shows a commented alternative MODEL_NAME value, suggesting users may need to swap this value. If this is intended as a template, consider adding a comment at the top of the file documenting how users should customize the manifest, or explore using Kustomize overlays or Helm values for parameterization.

Add a comment near the top of the file to guide users:

 apiVersion: batch/v1
 kind: Job
 metadata:
   name: sync-hf-lora-to-minio
+# NOTE: Before applying, update the MODEL_NAME value below to match your LoRA model.
 spec:
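
If this manifest is meant to be reused across models, a Kustomize overlay is one way to parameterize it without editing the file in place. A minimal sketch, assuming sync-lora-job.yaml sits beside the kustomization.yaml and that MODEL_NAME remains the third env entry (index 2), as in the current manifest; both the layout and the replacement model name are illustrative:

# kustomization.yaml (hypothetical overlay)
resources:
  - sync-lora-job.yaml
patches:
  - target:
      kind: Job
      name: sync-hf-lora-to-minio
    patch: |-
      # JSON6902 replace of the MODEL_NAME env value (env index 2 in the current manifest)
      - op: replace
        path: /spec/template/spec/containers/0/env/2/value
        value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA

Running kubectl apply -k . would then render the Job with the overridden model name.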
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c655585 and 7ae79f6.

📒 Files selected for processing (3)
  • examples/backends/vllm/deploy/lora/agg_lora.yaml (1 hunks)
  • examples/backends/vllm/deploy/lora/lora-model.yaml (1 hunks)
  • examples/backends/vllm/deploy/lora/sync-lora-job.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

  • examples/backends/vllm/deploy/lora/lora-model.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

  • examples/backends/vllm/deploy/lora/agg_lora.yaml
🪛 Checkov (3.2.334)
examples/backends/vllm/deploy/lora/sync-lora-job.yaml

[medium] 1-33: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-33: Minimize the admission of root containers

(CKV_K8S_23)

🪛 GitHub Actions: Copyright Checks
examples/backends/vllm/deploy/lora/lora-model.yaml

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/lora-model.yaml'.

examples/backends/vllm/deploy/lora/sync-lora-job.yaml

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/sync-lora-job.yaml'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: operator (amd64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo

Comment on lines +42 to +45
            - name: AWS_ACCESS_KEY_ID
              value: "minioadmin"
            - name: AWS_SECRET_ACCESS_KEY
              value: "minioadmin"

⚠️ Potential issue | 🟠 Major

Move hardcoded AWS credentials to Kubernetes secrets.

Lines 42–45 expose AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as hardcoded plaintext values in the manifest. This creates a security risk and makes credentials visible in version control. Since the manifest already uses envFromSecret (line 17) for HF tokens, apply the same pattern for AWS credentials.

Apply this diff:

       envFromSecret: hf-token-secret
+      envFromSecretList:
+        - name: hf-token-secret
+        - name: minio-credentials

Then remove the hardcoded credentials:

           env:
             - name: DYN_LORA_ENABLED
               value: "true"
             - name: DYN_LORA_PATH
               value: "/tmp/dynamo_loras_minio"
             - name: DYN_SYSTEM_ENABLED
               value: "true"
             - name: DYN_SYSTEM_PORT
               value: "9090"
             - name: AWS_ENDPOINT
               value: "http://minio:9000"
-            - name: AWS_ACCESS_KEY_ID
-              value: "minioadmin"
-            - name: AWS_SECRET_ACCESS_KEY
-              value: "minioadmin"
             - name: AWS_REGION
               value: "us-east-1"
             - name: AWS_ALLOW_HTTP
               value: "true"
             - name: BUCKET_NAME
               value: "my-loras"

The minio-credentials secret (with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) will be injected via the envFromSecretList. Coordinate this with the fix in sync-lora-job.yaml.
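
For reference, a sketch of the minio-credentials Secret the diff assumes; the minioadmin values mirror the MinIO defaults already in the example, and in a real deployment they would come from your MinIO configuration:

apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
type: Opaque
stringData:
  # stringData accepts plain values; Kubernetes base64-encodes them on write
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin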

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/agg_lora.yaml around lines 42 to 45, the
AWS credentials are hardcoded as plaintext environment variables; remove these
two entries and instead reference them via the existing secret injection pattern
by adding the secret name (minio-credentials) to the envFromSecretList so
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are supplied from that Kubernetes
Secret; after this change delete the hardcoded lines and ensure the
corresponding sync-lora-job.yaml is updated to create/expect the same
minio-credentials secret so both manifests use the same secret-based injection.

Comment on lines +1 to +10
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: codelion-recovery-lora
spec:
  modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora
(no newline at end of file)

⚠️ Potential issue | 🔴 Critical

Add missing copyright header to fix pipeline failure.

The file is missing the SPDX copyright header required by the copyright check. Compare with agg_lora.yaml (lines 1–2) for the correct format.

Apply this diff to add the copyright header:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 apiVersion: nvidia.com/v1alpha1
 kind: DynamoModel
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

(original)
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: codelion-recovery-lora
spec:
  modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora

(suggested)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: codelion-recovery-lora
spec:
  modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora
🧰 Tools
🪛 GitHub Actions: Copyright Checks

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/lora-model.yaml'.

🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/lora-model.yaml around lines 1 to 10, the
file is missing the required SPDX copyright header; prepend the same two-line
header used in agg_lora.yaml (lines 1–2) to the very top of this file,
preserving exact text, spacing and a trailing newline so the repository
copyright check passes.

@@ -0,0 +1,33 @@
apiVersion: batch/v1

⚠️ Potential issue | 🔴 Critical

Add missing copyright header to fix pipeline failure.

The file is missing the SPDX copyright header required by the copyright check.

Apply this diff to add the copyright header:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 apiVersion: batch/v1
 kind: Job
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

(original)
apiVersion: batch/v1

(suggested)
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: batch/v1
kind: Job
🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-33: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-33: Minimize the admission of root containers

(CKV_K8S_23)

🪛 GitHub Actions: Copyright Checks

[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/sync-lora-job.yaml'.

🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/sync-lora-job.yaml around line 1, the file
is missing the required SPDX copyright header; add a copyright header as the
first lines of the file using the project standard template (replace YEAR and
OWNER accordingly), for example include an SPDX-FileCopyrightText line and an
SPDX-License-Identifier line (e.g. "SPDX-FileCopyrightText: 2025
YourOrganization" and "SPDX-License-Identifier: Apache-2.0") so the copyright
check passes.

Comment on lines +8 to +32
      containers:
      - name: uploader
        image: python:3.10-slim
        command:
        - /bin/sh
        - -c
        - |
          set -eux
          pip install --no-cache-dir huggingface-hub awscli
          hf download $MODEL_NAME --local-dir /tmp/lora
          rm -rf /tmp/lora/.cache
          aws --endpoint-url=http://minio:9000 s3 mb s3://my-loras || true
          aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://my-loras/$MODEL_NAME
        envFrom:
        - secretRef:
            name: hf-token-secret
        env:
        - name: AWS_ACCESS_KEY_ID
          value: minioadmin
        - name: AWS_SECRET_ACCESS_KEY
          value: minioadmin
        - name: MODEL_NAME
          value: codelion/Qwen3-0.6B-accuracy-recovery-lora
          # value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA
      restartPolicy: Never

⚠️ Potential issue | 🟠 Major

Enforce security constraints and move hardcoded credentials to secrets.

Two security concerns:

  1. Container runs as root without privilege escalation constraints. The python:3.10-slim image runs as root by default, and allowPrivilegeEscalation is not explicitly disabled. This violates the CKV_K8S_20 and CKV_K8S_23 checks.

  2. AWS credentials hardcoded in plain text. Lines 25–28 expose AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as hardcoded values in the manifest. This is a security and compliance risk. The pattern already uses envFrom.secretRef for HF tokens (lines 21–23), so credentials should follow the same approach.

Apply this diff:

       containers:
       - name: uploader
         image: python:3.10-slim
+        securityContext:
+          runAsNonRoot: true
+          runAsUser: 1000
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
         command:
         - /bin/sh
         - -c
         - |
           set -eux
           pip install --no-cache-dir huggingface-hub awscli
          hf download $MODEL_NAME --local-dir /tmp/lora
           rm -rf /tmp/lora/.cache
           aws --endpoint-url=http://minio:9000 s3 mb s3://my-loras || true
           aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://my-loras/$MODEL_NAME
+        volumeMounts:
+        - name: tmp
+          mountPath: /tmp
         envFrom:
         - secretRef:
             name: hf-token-secret
+        - secretRef:
+            name: minio-credentials  # Create this secret with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
         env:
         - name: MODEL_NAME
           value: codelion/Qwen3-0.6B-accuracy-recovery-lora
           # value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA
-        - name: AWS_ACCESS_KEY_ID
-          value: minioadmin
-        - name: AWS_SECRET_ACCESS_KEY
-          value: minioadmin
         - name: AWS_REGION
           value: "us-east-1"
         - name: AWS_ALLOW_HTTP
           value: "true"
         - name: BUCKET_NAME
           value: "my-loras"
+      volumes:
+      - name: tmp
+        emptyDir: {}
       restartPolicy: Never

Create a Kubernetes secret with the MinIO credentials:

kubectl create secret generic minio-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=minioadmin \
  --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin
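
After creating the secret and applying the updated manifest, a quick way to verify the Job end-to-end (the Job name comes from the manifest; the wait timeout is an arbitrary choice):

kubectl apply -f examples/backends/vllm/deploy/lora/sync-lora-job.yaml
kubectl wait --for=condition=complete job/sync-hf-lora-to-minio --timeout=600s
kubectl logs job/sync-hf-lora-to-minio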
🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/sync-lora-job.yaml around lines 8 to 32,
the container runs as root without privilege escalation constraints and MinIO
AWS credentials are hardcoded; update the pod spec to set a securityContext for
the container (runAsNonRoot: true, allowPrivilegeEscalation: false, and set a
non-root runAsUser if needed) and remove the hardcoded
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env entries, replacing them with an
envFrom.secretRef that references a new Kubernetes secret (e.g.,
minio-credentials) created via the provided kubectl create secret generic
command so credentials are not in plain text.
