[SGLang][SageMaker][GPU] SGLang 0.5.5 Release #5450
Merged
Changes from 40 of 44 commits
a8b06f4 inital commit (sirutBuasai)
2ea4c54 update sglang container and entrypoint (sirutBuasai)
2b624d3 add buildspec.yaml (sirutBuasai)
d81a343 tmp test qwen (sirutBuasai)
110edb6 Merge branch 'sgl' of https://github.com/sirutBuasai/deep-learning-co… (sirutBuasai)
dae1cec Merge branch 'master' into sgl (sirutBuasai)
5ea6132 revert vllm (sirutBuasai)
91cf705 fix sm path (sirutBuasai)
ef528b4 fix sglang entrpoint (sirutBuasai)
3d95345 Merge branch 'master' into sgl (sirutBuasai)
62eaf27 finalize dockerfile (sirutBuasai)
2031b39 add toml file (sirutBuasai)
1352c62 add get job type func (sirutBuasai)
f803c15 use dict job type (sirutBuasai)
b6716a2 add sglang (sirutBuasai)
ca48eb4 fix target name (sirutBuasai)
3234774 Merge branch 'master' into sgl (sirutBuasai)
c6927ad add tests to buildspec (sirutBuasai)
dd97fc1 fix test runner and get framework func (sirutBuasai)
e24c955 add job type (sirutBuasai)
b4444a9 fix sanity and security tests (sirutBuasai)
d9bf7c1 revert run new tests (sirutBuasai)
71b1182 formatting (sirutBuasai)
2f86d52 fix jobtype func and add sglang general integration sagemaker dir (sirutBuasai)
456bdc6 add sglang and vllm to frameworks (sirutBuasai)
7309d67 add skip general types (sirutBuasai)
2ed025f fix cuda compat and entrypoint (sirutBuasai)
49c31fa Merge branch 'sgl' of https://github.com/sirutBuasai/deep-learning-co… (sirutBuasai)
5637095 fix dlc container type (sirutBuasai)
cce1e87 install boto3 (sirutBuasai)
1927956 add sglang to types (sirutBuasai)
8aa5c9c sgl fix bug (sirutBuasai)
a95e10c add pytest (sirutBuasai)
ad5e24d add print debug (sirutBuasai)
c89a8f5 add conftest (sirutBuasai)
eb524f7 fix conftest (sirutBuasai)
1c13adb fix fixtures (sirutBuasai)
cd8a500 printing responses (sirutBuasai)
d7e0f05 fix endpoint name (sirutBuasai)
481fa34 remove sm local (sirutBuasai)
f2a1eb0 revert sglang (sirutBuasai)
4b60ba1 Merge branch 'master' into sgl (sirutBuasai)
3dfcb32 revert new test structure (sirutBuasai)
5d33f6e fix syntax (sirutBuasai)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| #!/bin/bash | ||
| # Check if telemetry file exists before executing | ||
| # Execute telemetry script if it exists, suppress errors | ||
| bash /usr/local/bin/bash_telemetry.sh >/dev/null 2>&1 || true | ||
|
|
||
| if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then | ||
| bash /usr/local/bin/start_cuda_compat.sh | ||
| fi | ||
|
|
||
| echo "Starting server" | ||
|
|
||
| PREFIX="SM_SGLANG_" | ||
| ARG_PREFIX="--" | ||
|
|
||
| ARGS=() | ||
|
|
||
| while IFS='=' read -r key value; do | ||
| arg_name=$(echo "${key#"${PREFIX}"}" | tr '[:upper:]' '[:lower:]' | tr '_' '-') | ||
|
|
||
| ARGS+=("${ARG_PREFIX}${arg_name}") | ||
| if [ -n "$value" ]; then | ||
| ARGS+=("$value") | ||
| fi | ||
| done < <(env | grep "^${PREFIX}") | ||
|
|
||
| # Add default port only if not already set | ||
| if ! [[ " ${ARGS[@]} " =~ " --port " ]]; then | ||
| ARGS+=(--port "${SM_SGLANG_PORT:-8080}") | ||
| fi | ||
|
|
||
| # Add default host only if not already set | ||
| if ! [[ " ${ARGS[@]} " =~ " --host " ]]; then | ||
| ARGS+=(--host "${SM_SGLANG_HOST:-0.0.0.0}") | ||
| fi | ||
|
|
||
| # Add default model-path only if not already set | ||
| if ! [[ " ${ARGS[@]} " =~ " --model-path " ]]; then | ||
| ARGS+=(--model-path "${SM_SGLANG_MODEL_PATH:-/opt/ml/model}") | ||
| fi | ||
|
|
||
| echo "Running command: exec python3 -m sglang.launch_server ${ARGS[@]}" | ||
| exec python3 -m sglang.launch_server "${ARGS[@]}" |
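
The loop in the entrypoint above turns every `SM_SGLANG_*` environment variable into a command-line flag. A minimal sketch of that mapping follows, assuming the variables arrive as `KEY=VALUE` lines on stdin instead of from `env`. The function name `build_args` and the three sample variables are illustrative only and are not part of the PR.

```shell
#!/bin/bash
# Hypothetical helper mirroring the env-to-flag loop in the entrypoint diff.
PREFIX="SM_SGLANG_"

build_args() {
  # Reads KEY=VALUE lines on stdin and fills the global ARGS array.
  ARGS=()
  local key value arg_name
  while IFS='=' read -r key value; do
    # Strip the prefix, lower-case the rest, and turn underscores into dashes.
    arg_name=$(printf '%s' "${key#"$PREFIX"}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')
    ARGS+=("--${arg_name}")
    # An empty value yields a bare flag with no argument.
    if [ -n "$value" ]; then
      ARGS+=("$value")
    fi
  done
}

# Sample input (illustrative variable names, chosen for this sketch).
build_args <<'EOF'
SM_SGLANG_MODEL_PATH=/opt/ml/model
SM_SGLANG_TP_SIZE=2
SM_SGLANG_TRUST_REMOTE_CODE=
EOF

printf '%s\n' "${ARGS[@]}"
```

With this input the array holds `--model-path /opt/ml/model --tp-size 2 --trust-remote-code`. Because `read` assigns the remainder of the line to the last variable, a value that itself contains `=` stays intact.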
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| #!/bin/bash | ||
|
|
||
| verlte() { | ||
| [ "$1" = "$2" ] && return 1 || [ "$1" = "$(echo -e "$1\n$2" | sort -V | head -n1)" ] | ||
| } | ||
|
|
||
| COMPAT_FILE=/usr/local/cuda/compat/libcuda.so.1 | ||
| if [ -f $COMPAT_FILE ]; then | ||
| CUDA_COMPAT_MAX_DRIVER_VERSION=$(readlink $COMPAT_FILE | cut -d'.' -f 3-) | ||
| echo "CUDA compat package should be installed for NVIDIA driver smaller than ${CUDA_COMPAT_MAX_DRIVER_VERSION}" | ||
| NVIDIA_DRIVER_VERSION=$(sed -n 's/^NVRM.*Kernel Module *\([0-9.]*\).*$/\1/p' /proc/driver/nvidia/version 2>/dev/null || true) | ||
| if [ -z "$NVIDIA_DRIVER_VERSION" ]; then | ||
| NVIDIA_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2>/dev/null || true) | ||
| fi | ||
| echo "Current installed NVIDIA driver version is ${NVIDIA_DRIVER_VERSION}" | ||
| if verlte $NVIDIA_DRIVER_VERSION $CUDA_COMPAT_MAX_DRIVER_VERSION; then | ||
| echo "Adding CUDA compat to LD_LIBRARY_PATH" | ||
| export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH | ||
| echo $LD_LIBRARY_PATH | ||
| else | ||
| echo "Skipping CUDA compat setup as newer NVIDIA driver is installed" | ||
| fi | ||
| else | ||
| echo "Skipping CUDA compat setup as package not found" | ||
| fi |
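
The `verlte` helper above returns failure when both versions are equal, so in practice it behaves as a strict "less than" check built on `sort -V`. The sketch below isolates that comparison under the hypothetical name `version_lt`. The driver versions are made-up examples, not values from the PR, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/bash
# Same comparison as verlte in the diff, renamed here to reflect that it is strict.
version_lt() {
  # Equal versions are not "less than".
  [ "$1" = "$2" ] && return 1
  # sort -V orders version strings; the smaller of the two comes out first.
  [ "$1" = "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" ]
}

# Made-up driver versions, each compared against a made-up compat maximum.
for pair in "535.54.03 575.57.08" "575.57.08 575.57.08" "580.65.06 575.57.08"; do
  set -- $pair
  if version_lt "$1" "$2"; then
    echo "$1 < $2: would add /usr/local/cuda/compat to LD_LIBRARY_PATH"
  else
    echo "$1 >= $2: would skip CUDA compat setup"
  fi
done

# An export made inside a child bash process does not reach the caller.
unset DEMO_VAR
bash -c 'export DEMO_VAR=from_child'
echo "caller sees DEMO_VAR=${DEMO_VAR:-<unset>}"
```

The last three lines illustrate a general shell property worth keeping in mind when reading the two scripts together: a variable exported by a script started with `bash script.sh` is visible only inside that child process, whereas `source script.sh` would set it in the calling shell.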
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment> | ||
| prod_account_id: &PROD_ACCOUNT_ID 763104351884 | ||
| region: &REGION <set-$REGION-in-environment> | ||
| framework: &FRAMEWORK sglang | ||
| version: &VERSION "0.5.5" | ||
| short_version: &SHORT_VERSION "0.5" | ||
| arch_type: &ARCH_TYPE x86_64 | ||
| autopatch_build: "False" | ||
|
|
||
| repository_info: | ||
| build_repository: &BUILD_REPOSITORY | ||
| image_type: &IMAGE_TYPE gpu | ||
| root: . | ||
| repository_name: &REPOSITORY_NAME !join [ pr, "-", *FRAMEWORK ] | ||
| repository: &REPOSITORY !join [ *ACCOUNT_ID, .dkr.ecr., *REGION, .amazonaws.com/, *REPOSITORY_NAME ] | ||
| release_repository_name: &RELEASE_REPOSITORY_NAME !join [ *FRAMEWORK ] | ||
| release_repository: &RELEASE_REPOSITORY !join [ *PROD_ACCOUNT_ID, .dkr.ecr., *REGION, .amazonaws.com/, *RELEASE_REPOSITORY_NAME ] | ||
|
|
||
| context: | ||
| build_context: &BUILD_CONTEXT | ||
| deep_learning_container: | ||
| source: src/deep_learning_container.py | ||
| target: deep_learning_container.py | ||
| install_efa: | ||
| source: scripts/install_efa.sh | ||
| target: install_efa.sh | ||
| start_cuda_compat: | ||
| source: sglang/build_artifacts/start_cuda_compat.sh | ||
| target: start_cuda_compat.sh | ||
| sagemaker_entrypoint: | ||
| source: sglang/build_artifacts/sagemaker_entrypoint.sh | ||
| target: sagemaker_entrypoint.sh | ||
|
|
||
| images: | ||
| sglang_sm: | ||
| <<: *BUILD_REPOSITORY | ||
| context: | ||
| <<: *BUILD_CONTEXT | ||
| image_size_baseline: 26000 | ||
| device_type: &DEVICE_TYPE gpu | ||
| cuda_version: &CUDA_VERSION cu129 | ||
| python_version: &DOCKER_PYTHON_VERSION py3 | ||
| tag_python_version: &TAG_PYTHON_VERSION py312 | ||
| os_version: &OS_VERSION ubuntu22.04 | ||
| tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-sagemaker" ] | ||
| latest_release_tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-sagemaker" ] | ||
| docker_file: !join [ *FRAMEWORK, /, *ARCH_TYPE, /, *DEVICE_TYPE, /Dockerfile ] | ||
| target: sglang-sagemaker | ||
| build: true | ||
| enable_common_stage_build: false | ||
| test_configs: | ||
| test_platforms: | ||
| - sanity | ||
| - security | ||
| - sagemaker |
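
The `!join` entries in the buildspec above read as plain concatenation of their list items. Assuming that is how the custom tag behaves, the sketch below reproduces the resulting strings in shell. `yaml_join` is a hypothetical stand-in written for this example, not a function from the repository.

```shell
#!/bin/bash
# Values copied from the buildspec diff.
FRAMEWORK="sglang"
VERSION="0.5.5"
ARCH_TYPE="x86_64"
DEVICE_TYPE="gpu"
TAG_PYTHON_VERSION="py312"
CUDA_VERSION="cu129"
OS_VERSION="ubuntu22.04"

# Hypothetical stand-in for !join: concatenates its arguments with no separator.
yaml_join() {
  local IFS=''
  printf '%s' "$*"
}

TAG=$(yaml_join "$VERSION" "-" "$DEVICE_TYPE" "-" "$TAG_PYTHON_VERSION" "-" \
  "$CUDA_VERSION" "-" "$OS_VERSION" "-sagemaker")
DOCKER_FILE=$(yaml_join "$FRAMEWORK" "/" "$ARCH_TYPE" "/" "$DEVICE_TYPE" "/Dockerfile")
REPOSITORY_NAME=$(yaml_join "pr" "-" "$FRAMEWORK")

echo "tag:             $TAG"
echo "docker_file:     $DOCKER_FILE"
echo "repository_name: $REPOSITORY_NAME"
```

Under that assumption the image tag comes out as `0.5.5-gpu-py312-cu129-ubuntu22.04-sagemaker`, the Dockerfile path as `sglang/x86_64/gpu/Dockerfile`, and the PR repository name as `pr-sglang`.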
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| buildspec_pointer: buildspec-sm.yml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| FROM lmsysorg/sglang:v0.5.5-cu129-amd64 AS base | ||
|
|
||
| # ==================================================== | ||
| # ====================== common ====================== | ||
| # ==================================================== | ||
|
|
||
| ARG PYTHON="python3" | ||
| ARG EFA_VERSION="1.43.3" | ||
|
|
||
| LABEL maintainer="Amazon AI" | ||
| LABEL dlc_major_version="1" | ||
|
|
||
| ENV DEBIAN_FRONTEND=noninteractive \ | ||
| LANG=C.UTF-8 \ | ||
| LC_ALL=C.UTF-8 \ | ||
| DLC_CONTAINER_TYPE=general \ | ||
| # Python won’t try to write .pyc or .pyo files on the import of source modules | ||
| # Force stdin, stdout and stderr to be totally unbuffered. Good for logging | ||
| PYTHONDONTWRITEBYTECODE=1 \ | ||
| PYTHONUNBUFFERED=1 \ | ||
| PYTHONIOENCODING=UTF-8 \ | ||
| LD_LIBRARY_PATH="/usr/local/lib:/opt/amazon/ofi-nccl/lib/x86_64-linux-gnu:/opt/amazon/openmpi/lib:/opt/amazon/efa/lib:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}" \ | ||
| PATH="/opt/amazon/openmpi/bin:/opt/amazon/efa/bin:/usr/local/cuda/bin:${PATH}" | ||
|
|
||
| WORKDIR / | ||
|
|
||
| # Copy artifacts | ||
| # =============== | ||
| COPY deep_learning_container.py /usr/local/bin/deep_learning_container.py | ||
| COPY bash_telemetry.sh /usr/local/bin/bash_telemetry.sh | ||
| COPY install_efa.sh install_efa.sh | ||
| COPY start_cuda_compat.sh /usr/local/bin/start_cuda_compat.sh | ||
|
|
||
| RUN chmod +x /usr/local/bin/deep_learning_container.py \ | ||
| && chmod +x /usr/local/bin/bash_telemetry.sh \ | ||
| && chmod +x /usr/local/bin/start_cuda_compat.sh | ||
|
|
||
| # Install cuda compat | ||
| # ==================== | ||
| # RUN apt-get update \ | ||
| # && apt-get -y upgrade --only-upgrade systemd \ | ||
| # && apt-get install -y --allow-change-held-packages --no-install-recommends \ | ||
| # cuda-compat-12-9 \ | ||
| # && rm -rf /var/lib/apt/lists/* \ | ||
| # && apt-get clean | ||
|
|
||
| # Install EFA and remove vulnerable nvjpeg | ||
| # ========================================= | ||
| RUN bash install_efa.sh ${EFA_VERSION} \ | ||
| && rm install_efa.sh \ | ||
| && mkdir -p /tmp/nvjpeg \ | ||
| && cd /tmp/nvjpeg \ | ||
| # latest cu12 libnvjpeg available is cu124 | ||
| && wget https://developer.download.nvidia.com/compute/cuda/redist/libnvjpeg/linux-x86_64/libnvjpeg-linux-x86_64-12.4.0.76-archive.tar.xz \ | ||
| && tar -xvf libnvjpeg-linux-x86_64-12.4.0.76-archive.tar.xz \ | ||
| && rm -rf /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg* \ | ||
| && rm -rf /usr/local/cuda/targets/x86_64-linux/include/nvjpeg.h \ | ||
| && cp libnvjpeg-linux-x86_64-12.4.0.76-archive/lib/libnvjpeg* /usr/local/cuda/targets/x86_64-linux/lib/ \ | ||
| && cp libnvjpeg-linux-x86_64-12.4.0.76-archive/include/* /usr/local/cuda/targets/x86_64-linux/include/ \ | ||
| && rm -rf /tmp/nvjpeg \ | ||
| # create symlink for python | ||
| && rm -rf /usr/bin/python \ | ||
| && ln -s /usr/bin/python3 /usr/bin/python \ | ||
| # remove cuobjdump and nvdisasm | ||
| && rm -rf /usr/local/cuda/bin/cuobjdump* \ | ||
| && rm -rf /usr/local/cuda/bin/nvdisasm* | ||
|
|
||
| # Run OSS compliance script | ||
| # ========================== | ||
| RUN echo 'source /usr/local/bin/bash_telemetry.sh' >> /etc/bash.bashrc \ | ||
| # OSS compliance - use Python zipfile instead of unzip | ||
| && HOME_DIR=/root \ | ||
| && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \ | ||
| && python3 -c "import zipfile, os; zipfile.ZipFile('/root/oss_compliance.zip').extractall('/root/'); os.remove('/root/oss_compliance.zip')" \ | ||
| && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \ | ||
| && chmod +x /usr/local/bin/testOSSCompliance \ | ||
| && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \ | ||
| && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \ | ||
| # clean up | ||
| && rm -rf ${HOME_DIR}/oss_compliance* \ | ||
| && rm -rf /tmp/tmp* \ | ||
| && rm -rf /tmp/uv* \ | ||
| && rm -rf /var/lib/apt/lists/* \ | ||
| && rm -rf /root/.cache | true | ||
|
|
||
| # ======================================================= | ||
| # ====================== sagemaker ====================== | ||
| # ======================================================= | ||
|
|
||
| FROM base AS sglang-sagemaker | ||
|
|
||
| RUN dpkg -l | grep -E "cuda|nvidia|libnv" | awk '{print $2}' | xargs apt-mark hold \ | ||
| && apt-get update \ | ||
| && apt-get upgrade -y \ | ||
| && apt-get clean | ||
|
|
||
| RUN pip install --no-cache-dir -U \ | ||
| boto3 | ||
|
|
||
| RUN rm -rf /tmp/* | ||
|
|
||
| COPY sagemaker_entrypoint.sh /usr/local/bin/sagemaker_entrypoint.sh | ||
| RUN chmod +x /usr/local/bin/sagemaker_entrypoint.sh | ||
|
|
||
| ENTRYPOINT ["/usr/local/bin/sagemaker_entrypoint.sh"] | ||
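
The `sglang-sagemaker` stage above holds CUDA and NVIDIA packages before running `apt-get upgrade`, so the upgrade leaves them at the versions shipped in the base image. A small sketch of the selection step, run against made-up `dpkg -l` rows (the package names and versions below are illustrative only), shows which names the filter would pass to `apt-mark hold`.

```shell
#!/bin/bash
# Made-up sample of `dpkg -l` rows; not taken from the actual image.
sample_dpkg_rows() {
  cat <<'EOF'
ii  cuda-cudart-12-9  12.9.79-1    amd64  CUDA Runtime native Libraries
ii  libnvjpeg-12-9    12.4.0.76-1  amd64  NVJPEG native runtime libraries
ii  curl              7.81.0-1     amd64  command line tool for transferring data with URL syntax
ii  openssl           3.0.2-0      amd64  Secure Sockets Layer toolkit
EOF
}

# Same filter as the Dockerfile, minus the final `xargs apt-mark hold`.
to_hold=$(sample_dpkg_rows | grep -E "cuda|nvidia|libnv" | awk '{print $2}')
echo "$to_hold"
```

Note that `grep` matches against the whole row, so a package whose description mentions one of the patterns would be held as well, even if its name does not.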
Use digest pinning / checksum verification, since this is not an Amazon controlled image.
This is by design, since we want to consume security patches from upstream. Pinning to a digest would prevent our downstream image from consuming these patches. By pinning to a specific version rather than `latest`, we restrict updates to core packages and consume only security patches.
Moreover, Docker containers are static post-build by design. This means that after the build, the base layer is hashed and remains static until we trigger a rebuild and re-release of this particular image. This prevents potential security vulnerabilities from sneaking in from upstream.
We are ingesting the base image from this vendor (https://hub.docker.com/r/lmsysorg/sglang/tags), which is a sponsored OSS vendor on Docker Hub. Hope this helps provide credibility that we are consuming images from a trusted source, similar to how we consume our other images from the CUDA base container or Ubuntu base containers.
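
For readers unfamiliar with the reviewer's suggestion, digest pinning means referencing the base image as `name:tag@sha256:<digest>`, so that the digest rather than the mutable tag determines which image is pulled. The sketch below only builds and validates such a reference string. `pin_ref` is a hypothetical helper, and the digest is a placeholder of 64 zeros, not the real digest of the SGLang image.

```shell
#!/bin/bash
# Hypothetical helper: appends a digest to an image reference after a format check.
pin_ref() {
  local ref=$1 digest=$2
  # A sha256 digest is "sha256:" followed by 64 lowercase hex characters.
  if [[ ! $digest =~ ^sha256:[0-9a-f]{64}$ ]]; then
    echo "invalid digest: $digest" >&2
    return 1
  fi
  printf '%s@%s\n' "$ref" "$digest"
}

# Placeholder digest (64 zeros), NOT the real digest of lmsysorg/sglang.
PLACEHOLDER_DIGEST="sha256:$(printf '0%.0s' {1..64})"

pin_ref "lmsysorg/sglang:v0.5.5-cu129-amd64" "$PLACEHOLDER_DIGEST"
```

The printed string is the form a `FROM` line would take under digest pinning. The trade-off discussed in the thread still applies: a pinned reference stops tracking whatever the upstream tag is later updated to, including patched rebuilds.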