-
Notifications
You must be signed in to change notification settings - Fork 720
feat: add lora k8s deployment example #4714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThree new Kubernetes manifest files added for LoRA-enabled VLLM deployment configuration. Files define a DynamoGraphDeployment with compute services, a DynamoModel resource referencing a LoRA base model, and a Job for synchronizing LoRA models to MinIO storage. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (1)
examples/backends/vllm/deploy/lora/sync-lora-job.yaml (1)
29-31: Consider parameterizing MODEL_NAME or documenting manual override pattern.Line 31 shows a commented alternative
MODEL_NAMEvalue, suggesting users may need to swap this value. If this is intended as a template, consider adding a comment at the top of the file documenting how users should customize the manifest, or explore using Kustomize overlays or Helm values for parameterization.Add a comment near the top of the file to guide users:
apiVersion: batch/v1 kind: Job metadata: name: sync-hf-lora-to-minio +# NOTE: Before applying, update the MODEL_NAME value below to match your LoRA model. spec:
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
examples/backends/vllm/deploy/lora/agg_lora.yaml(1 hunks)examples/backends/vllm/deploy/lora/lora-model.yaml(1 hunks)examples/backends/vllm/deploy/lora/sync-lora-job.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
examples/backends/vllm/deploy/lora/lora-model.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
examples/backends/vllm/deploy/lora/agg_lora.yaml
🪛 Checkov (3.2.334)
examples/backends/vllm/deploy/lora/sync-lora-job.yaml
[medium] 1-33: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-33: Minimize the admission of root containers
(CKV_K8S_23)
🪛 GitHub Actions: Copyright Checks
examples/backends/vllm/deploy/lora/lora-model.yaml
[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/lora-model.yaml'.
examples/backends/vllm/deploy/lora/sync-lora-job.yaml
[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/sync-lora-job.yaml'.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: operator (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
| - name: AWS_ACCESS_KEY_ID | ||
| value: "minioadmin" | ||
| - name: AWS_SECRET_ACCESS_KEY | ||
| value: "minioadmin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move hardcoded AWS credentials to Kubernetes secrets.
Lines 42–45 expose AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as hardcoded plaintext values in the manifest. This creates a security risk and makes credentials visible in version control. Since the manifest already uses envFromSecret (line 17) for HF tokens, apply the same pattern for AWS credentials.
Apply this diff:
envFromSecret: hf-token-secret
+ envFromSecretList:
+ - name: hf-token-secret
+ - name: minio-credentialsThen remove the hardcoded credentials:
env:
- name: DYN_LORA_ENABLED
value: "true"
- name: DYN_LORA_PATH
value: "/tmp/dynamo_loras_minio"
- name: DYN_SYSTEM_ENABLED
value: "true"
- name: DYN_SYSTEM_PORT
value: "9090"
- name: AWS_ENDPOINT
value: "http://minio:9000"
- - name: AWS_ACCESS_KEY_ID
- value: "minioadmin"
- - name: AWS_SECRET_ACCESS_KEY
- value: "minioadmin"
- name: AWS_REGION
value: "us-east-1"
- name: AWS_ALLOW_HTTP
value: "true"
- name: BUCKET_NAME
value: "my-loras"The minio-credentials secret (with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) will be injected via the envFromSecretList. Coordinate this with the fix in sync-lora-job.yaml.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/agg_lora.yaml around lines 42 to 45, the
AWS credentials are hardcoded as plaintext environment variables; remove these
two entries and instead reference them via the existing secret injection pattern
by adding the secret name (minio-credentials) to the envFromSecretList so
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are supplied from that Kubernetes
Secret; after this change delete the hardcoded lines and ensure the
corresponding sync-lora-job.yaml is updated to create/expect the same
minio-credentials secret so both manifests use the same secret-based injection.
| apiVersion: nvidia.com/v1alpha1 | ||
| kind: DynamoModel | ||
| metadata: | ||
| name: codelion-recovery-lora | ||
| spec: | ||
| modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora | ||
| baseModelName: Qwen/Qwen3-0.6B | ||
| modelType: lora | ||
| source: | ||
| uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header to fix pipeline failure.
The file is missing the SPDX copyright header required by the copyright check. Compare with agg_lora.yaml (lines 1–2) for the correct format.
Apply this diff to add the copyright header:
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| apiVersion: nvidia.com/v1alpha1 | |
| kind: DynamoModel | |
| metadata: | |
| name: codelion-recovery-lora | |
| spec: | |
| modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora | |
| baseModelName: Qwen/Qwen3-0.6B | |
| modelType: lora | |
| source: | |
| uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora | |
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| # SPDX-License-Identifier: Apache-2.0 | |
| apiVersion: nvidia.com/v1alpha1 | |
| kind: DynamoModel | |
| metadata: | |
| name: codelion-recovery-lora | |
| spec: | |
| modelName: codelion/Qwen3-0.6B-accuracy-recovery-lora | |
| baseModelName: Qwen/Qwen3-0.6B | |
| modelType: lora | |
| source: | |
| uri: s3://my-loras/codelion/Qwen3-0.6B-accuracy-recovery-lora |
🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/lora-model.yaml'.
🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/lora-model.yaml around lines 1 to 10, the
file is missing the required SPDX copyright header; prepend the same two-line
header used in agg_lora.yaml (lines 1–2) to the very top of this file,
preserving exact text, spacing and a trailing newline so the repository
copyright check passes.
| @@ -0,0 +1,33 @@ | |||
| apiVersion: batch/v1 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header to fix pipeline failure.
The file is missing the SPDX copyright header required by the copyright check.
Apply this diff to add the copyright header:
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
apiVersion: batch/v1
kind: Job📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| apiVersion: batch/v1 | |
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| # SPDX-License-Identifier: Apache-2.0 | |
| apiVersion: batch/v1 | |
| kind: Job |
🧰 Tools
🪛 Checkov (3.2.334)
[medium] 1-33: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-33: Minimize the admission of root containers
(CKV_K8S_23)
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Copyright check failed: invalid/missing header in 'examples/backends/vllm/deploy/lora/sync-lora-job.yaml'.
🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/sync-lora-job.yaml around line 1, the file
is missing the required SPDX copyright header; add a copyright header as the
first lines of the file using the project standard template (replace YEAR and
OWNER accordingly), for example include an SPDX-FileCopyrightText line and an
SPDX-License-Identifier line (e.g. "SPDX-FileCopyrightText: 2025
YourOrganization" and "SPDX-License-Identifier: Apache-2.0") so the copyright
check passes.
| containers: | ||
| - name: uploader | ||
| image: python:3.10-slim | ||
| command: | ||
| - /bin/sh | ||
| - -c | ||
| - | | ||
| set -eux | ||
| pip install --no-cache-dir huggingface-hub awscli | ||
| hf download $MODEL_NAME --local-dir /tmp/lora | ||
| rm -rf /tmp/lora/.cache | ||
| aws --endpoint-url=http://minio:9000 s3 mb s3://my-loras || true | ||
| aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://my-loras/$MODEL_NAME | ||
| envFrom: | ||
| - secretRef: | ||
| name: hf-token-secret | ||
| env: | ||
| - name: AWS_ACCESS_KEY_ID | ||
| value: minioadmin | ||
| - name: AWS_SECRET_ACCESS_KEY | ||
| value: minioadmin | ||
| - name: MODEL_NAME | ||
| value: codelion/Qwen3-0.6B-accuracy-recovery-lora | ||
| # value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA | ||
| restartPolicy: Never |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforce security constraints and move hardcoded credentials to secrets.
Two security concerns:
-
Container runs as root without privilege escalation constraints. The python:3.10-slim image runs as root by default, and
allowPrivilegeEscalationis not explicitly disabled. This violates the CKV_K8S_20 and CKV_K8S_23 checks. -
AWS credentials hardcoded in plain text. Lines 25–28 expose
AWS_ACCESS_KEY_IDandAWS_SECRET_ACCESS_KEYas hardcoded values in the manifest. This is a security and compliance risk. The pattern already usesenvFrom.secretReffor HF tokens (lines 21–23), so credentials should follow the same approach.
Apply this diff:
containers:
- name: uploader
image: python:3.10-slim
+ securityContext:
+ runAsNonRoot: true
+ runAsUser: 1000
+ allowPrivilegeEscalation: false
+ readOnlyRootFilesystem: true
command:
- /bin/sh
- -c
- |
set -eux
pip install --no-cache-dir huggingface-hub awscli
hf download $MODEL_NAME --local-dir /tmp/lora
rm -rf /tmp/lora/.cache
aws --endpoint-url=http://minio:9000 s3 mb s3://my-loras || true
aws --endpoint-url=http://minio:9000 s3 sync /tmp/lora s3://my-loras/$MODEL_NAME
+ volumeMounts:
+ - name: tmp
+ mountPath: /tmp
envFrom:
- secretRef:
name: hf-token-secret
+ - secretRef:
+ name: minio-credentials # Create this secret with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
env:
- name: MODEL_NAME
value: codelion/Qwen3-0.6B-accuracy-recovery-lora
# value: Neural-Hacker/Qwen3-Math-Reasoning-LoRA
- - name: AWS_ACCESS_KEY_ID
- value: minioadmin
- - name: AWS_SECRET_ACCESS_KEY
- value: minioadmin
- name: AWS_REGION
value: "us-east-1"
- name: AWS_ALLOW_HTTP
value: "true"
- name: BUCKET_NAME
value: "my-loras"
+ volumes:
+ - name: tmp
+ emptyDir: {}
restartPolicy: NeverCreate a Kubernetes secret with the MinIO credentials:
kubectl create secret generic minio-credentials \
--from-literal=AWS_ACCESS_KEY_ID=minioadmin \
--from-literal=AWS_SECRET_ACCESS_KEY=minioadmin🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/lora/sync-lora-job.yaml around lines 8 to 32,
the container runs as root without privilege escalation constraints and MinIO
AWS credentials are hardcoded; update the pod spec to set a securityContext for
the container (runAsNonRoot: true, allowPrivilegeEscalation: false, and set a
non-root runAsUser if needed) and remove the hardcoded
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env entries, replacing them with an
envFrom.secretRef that references a new Kubernetes secret (e.g.,
minio-credentials) created via the provided kubectl create secret generic
command so credentials are not in plain text.
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.