Migrate vLLM Ray Serve Container #5463
Merged
Commits (63, all authored by junpuf):
- 8ea932b try build
- 50e9793 fix action
- 47e7bf6 using long commit ref
- f3e7416 install/update uv only if not already installed
- 96d976b update
- e21334c update
- 082f67c fix actionlint
- a82924d try inline cache
- d82b4a1 fix
- c7d65bc use buildx
- 09bfc63 per day cache refresh
- 8a21087 update
- 031a0e8 fix
- df2d590 test
- 2d59406 fix
- 75a8f1a try artifact
- 65975f7 update docker command
- ff4725e fix command
- 3dd1a99 fix command
- 872029d fix entrypoint
- fadf714 update test
- 557e649 fix command
- 58aa567 checkout vllm
- b071a75 update workflow
- e362483 update
- 369551b fix
- aeebfe8 try test
- 18f2b64 fix typo
- 9c3bc51 run basic terst
- 4e43405 test
- 94e16b6 use older version
- d5d1ff3 check path
- b137dea partial clone
- 0d8b5a5 update
- f75fa37 update
- 1ad77b4 update
- a98f01c update
- 13a065d refactor
- b75b924 add dataset path
- ff6bba4 try smart cleanup
- 4357043 cleanup
- 85cffdf update
- 12e2dc1 fix
- ccb5a73 update script
- 43c2232 enable Entrypoints Integration Test (LLM)
- 15f0c89 update
- e9fa11c update
- 60fd04f update test
- 56d85c1 add cleanup
- 432917d fix
- e5ad9e6 update
- 8e7a408 update
- f5e61e3 update
- 16d5f1e update
- c0a8c85 update workflow
- af227b4 enable more test
- 6ba7e45 update tests
- c3cc99c parallel tests
- dcb9302 remove encoder decoder test
- c7b284b add hf token
- 9807927 update
- 11ead3b remove push on main
- 92f77d9 revert
The first modified file is a shell script, updated so that uv is installed only when it is not already present:

```diff
@@ -1,6 +1,8 @@
 #!/bin/bash
 set -e

-curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR="/usr/local/bin" sh
-uv self update
+if ! command -v uv &> /dev/null; then
+    curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR="/usr/local/bin" sh
+    uv self update
+fi
 docker --version
```
This file was deleted.
The main addition is a new GitHub Actions workflow, "PR - vLLM RayServe" (322 added lines):

```yaml
name: PR - vLLM RayServe

on:
  pull_request:
    branches:
      - main
    paths:
      - "docker/**"

permissions:
  contents: read

concurrency:
  group: pr-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  check-changes:
    runs-on: ubuntu-latest
    outputs:
      vllm-rayserve-ec2: ${{ steps.changes.outputs.vllm-rayserve-ec2 }}
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-python@v6
        with:
          python-version: "3.12"
      - uses: pre-commit/action@v3.0.1
        with:
          extra_args: --all-files
      - name: Detect file changes
        id: changes
        uses: dorny/paths-filter@v3
        with:
          filters: |
            vllm-rayserve-ec2:
              - "docker/vllm/Dockerfile.rayserve"

  build-image:
    needs: [check-changes]
    if: needs.check-changes.outputs.vllm-rayserve-ec2 == 'true'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
      - fleet:x86-build-runner
    steps:
      - uses: actions/checkout@v5
      - run: .github/scripts/runner_setup.sh
      - run: .github/scripts/buildkitd.sh
      - name: ECR login
        run: |
          aws ecr get-login-password --region ${{ secrets.AWS_REGION }} | docker login --username AWS --password-stdin ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com

      - name: Resolve image URI for build
        run: |
          IMAGE_URI=${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/ci:vllm-0.10.2-gpu-py312-cu128-ubuntu22.04-rayserve-ec2-pr-${{ github.event.pull_request.number }}
          echo "Image URI to build: $IMAGE_URI"
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

      - name: Build image
        run: |
          docker buildx build --progress plain \
            --build-arg CACHE_REFRESH="$(date +"%Y-%m-%d")" \
            --cache-to=type=inline \
            --cache-from=type=registry,ref=$IMAGE_URI \
            --tag $IMAGE_URI \
            --target vllm-rayserve-ec2 \
            -f docker/vllm/Dockerfile.rayserve .

      - name: Docker push and save image URI artifact
        run: |
          docker push $IMAGE_URI
          docker rmi $IMAGE_URI
          echo $IMAGE_URI > image_uri.txt

      - name: Upload image URI artifact
        uses: actions/upload-artifact@v4
        with:
          name: vllm-rayserve-ec2-image-uri
          path: image_uri.txt

  regression-test:
    needs: [build-image]
    if: needs.build-image.result == 'success'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
      - fleet:x86-g6xl-runner
    steps:
      - name: Checkout DLC source
        uses: actions/checkout@v5

      - name: ECR login
        run: |
          aws ecr get-login-password --region ${{ secrets.AWS_REGION }} | docker login --username AWS --password-stdin ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com

      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: vllm-rayserve-ec2-image-uri

      - name: Resolve image URI for test
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "Resolved image URI: $IMAGE_URI"
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

      - name: Pull image
        run: |
          docker pull $IMAGE_URI

      - name: Checkout vLLM tests
        uses: actions/checkout@v5
        with:
          repository: vllm-project/vllm
          ref: v0.10.2
          path: vllm_source

      - name: Start container
        run: |
          CONTAINER_ID=$(docker run -d -it --rm --gpus=all --entrypoint /bin/bash \
            -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
            -v ${HOME}/.cache/vllm:/root/.cache/vllm \
            -v ./vllm_source:/workdir --workdir /workdir \
            -e HUGGING_FACE_HUB_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }} \
            ${IMAGE_URI})
          echo "CONTAINER_ID=$CONTAINER_ID" >> $GITHUB_ENV

      - name: Setup for vLLM test
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            uv pip install --system -r requirements/common.txt -r requirements/dev.txt --torch-backend=auto
            uv pip install --system pytest pytest-asyncio
            uv pip install --system -e tests/vllm_test_utils
            uv pip install --system hf_transfer
            mkdir src
            mv vllm src/vllm
          '

      - name: Run vLLM tests
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            nvidia-smi

            # Regression Test # 7min
            cd /workdir/tests
            uv pip install --system modelscope
            pytest -v -s test_regression.py
          '

      - name: Cleanup container and images
        if: always()
        run: |
          docker rm -f ${CONTAINER_ID} || true
          docker image prune -a --force --filter "until=24h"
          docker system df

  cuda-test:
    needs: [build-image]
    if: needs.build-image.result == 'success'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
      - fleet:x86-g6xl-runner
    steps:
      - name: Checkout DLC source
        uses: actions/checkout@v5

      - name: ECR login
        run: |
          aws ecr get-login-password --region ${{ secrets.AWS_REGION }} | docker login --username AWS --password-stdin ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com

      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: vllm-rayserve-ec2-image-uri

      - name: Resolve image URI for test
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "Resolved image URI: $IMAGE_URI"
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

      - name: Pull image
        run: |
          docker pull $IMAGE_URI

      - name: Checkout vLLM tests
        uses: actions/checkout@v5
        with:
          repository: vllm-project/vllm
          ref: v0.10.2
          path: vllm_source

      - name: Start container
        run: |
          CONTAINER_ID=$(docker run -d -it --rm --gpus=all --entrypoint /bin/bash \
            -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
            -v ${HOME}/.cache/vllm:/root/.cache/vllm \
            -v ./vllm_source:/workdir --workdir /workdir \
            -e HUGGING_FACE_HUB_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }} \
            ${IMAGE_URI})
          echo "CONTAINER_ID=$CONTAINER_ID" >> $GITHUB_ENV

      - name: Setup for vLLM test
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            uv pip install --system -r requirements/common.txt -r requirements/dev.txt --torch-backend=auto
            uv pip install --system pytest pytest-asyncio
            uv pip install --system -e tests/vllm_test_utils
            uv pip install --system hf_transfer
            mkdir src
            mv vllm src/vllm
          '

      - name: Run vLLM tests
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            nvidia-smi

            # Platform Tests (CUDA) # 4min
            cd /workdir/tests
            pytest -v -s cuda/test_cuda_context.py
          '

      - name: Cleanup container and images
        if: always()
        run: |
          docker rm -f ${CONTAINER_ID} || true
          docker image prune -a --force --filter "until=24h"
          docker system df

  example-test:
    needs: [build-image]
    if: needs.build-image.result == 'success'
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
      - fleet:x86-g6xl-runner
    steps:
      - name: Checkout DLC source
        uses: actions/checkout@v5

      - name: ECR login
        run: |
          aws ecr get-login-password --region ${{ secrets.AWS_REGION }} | docker login --username AWS --password-stdin ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com

      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: vllm-rayserve-ec2-image-uri

      - name: Resolve image URI for test
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "Resolved image URI: $IMAGE_URI"
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

      - name: Pull image
        run: |
          docker pull $IMAGE_URI

      - name: Checkout vLLM tests
        uses: actions/checkout@v5
        with:
          repository: vllm-project/vllm
          ref: v0.10.2
          path: vllm_source

      - name: Start container
        run: |
          CONTAINER_ID=$(docker run -d -it --rm --gpus=all --entrypoint /bin/bash \
            -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
            -v ${HOME}/.cache/vllm:/root/.cache/vllm \
            -v ./vllm_source:/workdir --workdir /workdir \
            -e HUGGING_FACE_HUB_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }} \
            ${IMAGE_URI})
          echo "CONTAINER_ID=$CONTAINER_ID" >> $GITHUB_ENV

      - name: Setup for vLLM test
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            uv pip install --system -r requirements/common.txt -r requirements/dev.txt --torch-backend=auto
            uv pip install --system pytest pytest-asyncio
            uv pip install --system -e tests/vllm_test_utils
            uv pip install --system hf_transfer
            mkdir src
            mv vllm src/vllm
          '

      - name: Run vLLM tests
        run: |
          docker exec ${CONTAINER_ID} sh -c '
            set -eux
            nvidia-smi

            # Examples Test # 30min
            cd /workdir/examples
            pip install tensorizer # for tensorizer test
            python3 offline_inference/basic/generate.py --model facebook/opt-125m
            # python3 offline_inference/basic/generate.py --model meta-llama/Llama-2-13b-chat-hf --cpu-offload-gb 10
            python3 offline_inference/basic/chat.py
            python3 offline_inference/prefix_caching.py
            python3 offline_inference/llm_engine_example.py
            python3 offline_inference/audio_language.py --seed 0
            python3 offline_inference/vision_language.py --seed 0
            python3 offline_inference/vision_language_pooling.py --seed 0
            python3 offline_inference/vision_language_multi_image.py --seed 0
            VLLM_USE_V1=0 python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
            python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
            python3 offline_inference/basic/classify.py
            python3 offline_inference/basic/embed.py
            python3 offline_inference/basic/score.py
            VLLM_USE_V1=0 python3 offline_inference/profiling.py --model facebook/opt-125m run_num_steps --num-steps 2
          '

      - name: Cleanup container and images
        if: always()
        run: |
          docker rm -f ${CONTAINER_ID} || true
          docker image prune -a --force --filter "until=24h"
          docker system df
```
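One detail worth calling out: the build step passes `--build-arg CACHE_REFRESH="$(date +"%Y-%m-%d")"` while pushing an inline cache to the same ECR tag, so layers are reused across runs, but anything downstream of the `CACHE_REFRESH` argument in the Dockerfile is rebuilt at most once per day. The actual `docker/vllm/Dockerfile.rayserve` is not part of this diff, so the sketch below only illustrates the pattern; the base image and packages are made up:

```dockerfile
# Illustrative sketch only -- not the real docker/vllm/Dockerfile.rayserve.
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04 AS vllm-rayserve-ec2

# Layers above the ARG never see CACHE_REFRESH, so their cache entries
# survive the daily value change and can come from the registry cache.
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates

# Every RUN after this ARG implicitly depends on its value, so passing a
# new date once per day invalidates the layers below at most daily.
ARG CACHE_REFRESH
RUN echo "refreshed ${CACHE_REFRESH}" && curl -LsSf https://astral.sh/uv/install.sh | sh
```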
Review comment: "I wonder if there's a way to DRY these steps. These are going to be used repeatedly across multiple stages."
Reply: "Later we can refactor common patterns into callable workflows or other things."
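As a rough sketch of that refactor: the three test jobs are identical except for the commands they run inside the container, so the shared steps could move into a callable workflow. The file name and input name below are hypothetical, not part of this PR, and the dependency-setup step is omitted for brevity:

```yaml
# Hypothetical .github/workflows/vllm-container-test.yml (sketch, not in this PR)
name: vLLM container test
on:
  workflow_call:
    inputs:
      test-command:
        description: Shell snippet to run inside the started container
        required: true
        type: string

jobs:
  test:
    runs-on:
      - codebuild-runner-${{ github.run_id }}-${{ github.run_attempt }}
      - fleet:x86-g6xl-runner
    steps:
      - name: ECR login
        run: |
          aws ecr get-login-password --region ${{ secrets.AWS_REGION }} | docker login --username AWS --password-stdin ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com
      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: vllm-rayserve-ec2-image-uri
      - name: Resolve and pull image
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV
          docker pull $IMAGE_URI
      - name: Checkout vLLM tests
        uses: actions/checkout@v5
        with:
          repository: vllm-project/vllm
          ref: v0.10.2
          path: vllm_source
      - name: Start container
        run: |
          CONTAINER_ID=$(docker run -d -it --rm --gpus=all --entrypoint /bin/bash \
            -v ./vllm_source:/workdir --workdir /workdir \
            -e HUGGING_FACE_HUB_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }} \
            ${IMAGE_URI})
          echo "CONTAINER_ID=$CONTAINER_ID" >> $GITHUB_ENV
      - name: Run tests
        run: |
          docker exec ${CONTAINER_ID} sh -c "set -eux; ${{ inputs.test-command }}"
      - name: Cleanup
        if: always()
        run: docker rm -f ${CONTAINER_ID} || true
```

Each stage in the PR workflow would then collapse to a short caller job, with `secrets: inherit` making the repository secrets available to the called workflow:

```yaml
  regression-test:
    needs: [build-image]
    uses: ./.github/workflows/vllm-container-test.yml
    with:
      test-command: "cd /workdir/tests && uv pip install --system modelscope && pytest -v -s test_regression.py"
    secrets: inherit
```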