-
Notifications
You must be signed in to change notification settings - Fork 124
RHAIENG-1512: fix(make/test): PyTorch ROCm 6.3 tests #2597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughROCm target bumped from rocm6.2.4/rocm6.4 to rocm6.3 across PyTorch/TorchVision pyproject deps, lockfiles (wheel URLs & hashes), test constraints, build-arg base image refs, and a couple of notebook imagestream ROCm version fields. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
jupyter/rocm/pytorch/ubi9-python-3.12/pyproject.toml(2 hunks)runtimes/rocm-pytorch/ubi9-python-3.12/pyproject.toml(2 hunks)tests/test_main.py(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Code static analysis
tests/test_main.py
[error] 67-67: Version of torch==2.7.1+rocm6.3 in pyproject.toml does not match version='2.7.1+rocm6.2.4' in pylock.toml.
[error] 67-67: Version of torch==2.7.1+rocm6.3 in pyproject.toml does not match version='2.7.1+rocm6.2.4' in pylock.toml.
runtimes/rocm-pytorch/ubi9-python-3.12/pyproject.toml
[error] 1-1: pyproject.toml specifies torch==2.7.1+rocm6.3, but pylock.toml contains torch==2.7.1+rocm6.2.4; inconsistent version between pyproject and pylock.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (rocm-runtime-pytorch-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: build (rocm-jupyter-pytorch-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-pytorch-rocm-py312-ubi9-on-pull-request
- GitHub Check: Red Hat Konflux / odh-pipeline-runtime-pytorch-rocm-py312-ubi9-on-pull-request
🔇 Additional comments (3)
tests/test_main.py (1)
280-281: LGTM! Test exceptions correctly updated for ROCm 6.3.The test exceptions have been properly updated to allow
torch==2.7.1+rocm6.3andtorchvision==0.22.1+rocm6.3alongside the CUDA variants. This aligns with the dependency updates in the pyproject.toml files.Note: These tests will pass once the
pylock.tomlfiles are regenerated with the updated ROCm versions.jupyter/rocm/pytorch/ubi9-python-3.12/pyproject.toml (2)
9-10: Consistent version updates across ROCm images.These changes mirror the updates in
runtimes/rocm-pytorch/ubi9-python-3.12/pyproject.toml, maintaining consistency across both ROCm PyTorch images.As noted in the review of the runtimes file, the corresponding
jupyter/rocm/pytorch/ubi9-python-3.12/pylock.tomlmust also be regenerated usinguv lock.
62-64: LGTM! Index URL correctly updated.The PyTorch ROCm wheel index URL has been updated consistently with the dependency versions.
The Dockerfile verification suggested for the runtimes file applies here as well (Line 63 comment mentions keeping this in sync with the Dockerfile).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
jupyter/minimal/ubi9-python-3.12/build-args/rocm.conf(1 hunks)jupyter/rocm/pytorch/ubi9-python-3.12/build-args/rocm.conf(1 hunks)jupyter/rocm/pytorch/ubi9-python-3.12/pylock.toml(1 hunks)jupyter/rocm/pytorch/ubi9-python-3.12/pyproject.toml(2 hunks)manifests/base/jupyter-rocm-minimal-notebook-imagestream.yaml(1 hunks)manifests/base/jupyter-rocm-pytorch-notebook-imagestream.yaml(1 hunks)runtimes/rocm-pytorch/ubi9-python-3.12/pylock.toml(1 hunks)runtimes/rocm-pytorch/ubi9-python-3.12/pyproject.toml(2 hunks)tests/test_main.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- jupyter/rocm/pytorch/ubi9-python-3.12/build-args/rocm.conf
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_main.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-minimal-rocm-py312-ubi9-on-pull-request
- GitHub Check: Red Hat Konflux / odh-pipeline-runtime-pytorch-rocm-py312-ubi9-on-pull-request
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-minimal-cuda-py312-ubi9-on-pull-request
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-pytorch-rocm-py312-ubi9-on-pull-request
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-minimal-cpu-py312-ubi9-on-pull-request
- GitHub Check: build (cuda-jupyter-minimal-ubi9-python-3.12, 3.12, linux/arm64, false) / build
- GitHub Check: build (rocm-jupyter-minimal-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: build (jupyter-minimal-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: build (rocm-runtime-pytorch-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: build (rocm-jupyter-pytorch-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: build (cuda-jupyter-minimal-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: build (jupyter-minimal-ubi9-python-3.12, 3.12, linux/ppc64le, false) / build
- GitHub Check: build (jupyter-minimal-ubi9-python-3.12, 3.12, linux/s390x, false) / build
- GitHub Check: code-static-analysis
- GitHub Check: validation-of-sw-versions-in-imagestreams
- GitHub Check: pytest-tests
🔇 Additional comments (11)
manifests/base/jupyter-rocm-pytorch-notebook-imagestream.yaml (1)
23-23: LGTM! ROCm version corrected from v6.4 to v6.3.This change aligns with the PR objective to fix PyTorch ROCm 6.3 tests and is consistent with updates across other configuration files in this PR.
runtimes/rocm-pytorch/ubi9-python-3.12/pyproject.toml (2)
9-10: LGTM! PyTorch ROCm dependencies correctly updated to 6.3.The torch and torchvision dependencies have been correctly updated from
rocm6.2.4torocm6.3, and past reviews confirmed the correspondingpylock.tomlwas regenerated successfully.
63-66: No Dockerfile updates required—sync maintained through parameterized base image.The pyproject.toml update to rocm6.3 is correctly in sync. The
runtimes/rocm-pytorch/ubi9-python-3.12/Dockerfile.rocmuses a parameterizedBASE_IMAGEargument rather than hardcoding the ROCm version. The corresponding base image atbase-images/rocm/6.3/ubi9-python-3.12/Dockerfile.rocmalready definesROCM_VERSION=6.3.4, ensuring alignment without requiring additional Dockerfile edits.jupyter/minimal/ubi9-python-3.12/build-args/rocm.conf (1)
1-1: LGTM! BASE_IMAGE updated to ROCm 6.3.The base image tag has been correctly updated from
v6.2tov6.3, consistent with the ROCm version updates throughout this PR.runtimes/rocm-pytorch/ubi9-python-3.12/pylock.toml (3)
3667-3690: Lockfile consistency verified with pyproject.toml.The
pyproject.tomlcorrectly specifiestorch==2.7.1+rocm6.3andtorchvision==0.22.1+rocm6.3with the pytorch-rocm index pointing tohttps://download.pytorch.org/whl/rocm6.3. Thepylock.tomlpins and wheel URLs match these constraints exactly with no version conflicts. The lockfile updates are properly aligned and mechanically correct.Per the PR merge criteria,
make testshould have been run before requesting review to validate these wheels; this verification step lies outside static code analysis.
3681-3688: All torchvision wheels verified as accessible on PyTorch CDN.Verification confirms all 6 torchvision 0.22.1+rocm6.3 wheels (cp39, cp310, cp311, cp312, cp313, cp313t) return HTTP 200 status and are properly hosted on download.pytorch.org/whl/rocm6.3/. The wheel URLs follow the correct naming convention and are consistent with the rocm6.3 release. No accessibility or URL integrity issues detected.
3669-3676: Wheels verified as accessible from official PyTorch CDN.All torch 2.7.1+rocm6.3 and torchvision 0.22.1+rocm6.3 wheels in the lockfile are confirmed accessible from the official PyTorch CDN (HTTP 200). The version strings and URLs are internally consistent, the hash format is correct, and all required Python variants (cp39–cp313 including free-threaded) are present. The lockfile updates are valid.
jupyter/rocm/pytorch/ubi9-python-3.12/pyproject.toml (2)
9-10: Version bumps are consistent and properly coordinated.The ROCm version is updated uniformly across both dependencies (torch and torchvision) and the corresponding wheel index URL, maintaining consistency across the configuration.
Also applies to: 64-64
1-70: Verify thatmake testwas run and passed.The PR objectives require running
make testbefore requesting review, but the PR description does not include test results or confirmation. Please verify that the test suite passes with ROCm 6.3 and document the results (or link to CI logs if available). This is especially important for version bumps in dependencies like PyTorch that can affect downstream functionality.jupyter/rocm/pytorch/ubi9-python-3.12/pylock.toml (2)
4258-4266: All torchvision updates verified and consistent.The torchvision package at version
0.22.1+rocm6.3is correctly paired withtorch==2.7.1+rocm6.3. Wheel coverage matches across both packages (6 wheels: cp39, cp310, cp311, cp312, cp313, cp313t), and no conflicting torchvision versions exist elsewhere in the rocm variant. Local version identifiers (+rocm6.3) satisfy PyPI's base version requirements per PEP 440.
4246-4254: All wheel URLs are accessible and lockfile changes are consistent with pyproject.toml specifications.Verification confirms:
✅ pyproject.toml consistency: Both
jupyter/rocm/pytorch/ubi9-python-3.12/pyproject.tomlandruntimes/rocm-pytorch/ubi9-python-3.12/pyproject.tomlcorrectly specifytorch==2.7.1+rocm6.3andtorchvision==0.22.1+rocm6.3with PyTorch index URL pointing tohttps://download.pytorch.org/whl/rocm6.3.✅ Lockfile regeneration: Both pylock.toml files (jupyter and runtimes variants) consistently reflect their respective pyproject.toml specifications with version
2.7.1+rocm6.3.✅ Wheel availability: PyTorch server confirmed HTTP/2 200 responses for both torch and torchvision cp312 wheels, verifying wheels are accessible at the specified URLs.
✅ Cross-file consistency: Related lockfiles across parallel directory structures (jupyter/rocm/pytorch and runtimes/rocm-pytorch) maintain consistency.
The changes are correct and properly maintained across the codebase.
| opendatahub.io/notebook-software: | | ||
| [ | ||
| {"name": "ROCm", "version": "6.2"}, | ||
| {"name": "ROCm", "version": "6.3"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Version string format inconsistency across imagestream files.
This file uses "6.3" without a "v" prefix, while manifests/base/jupyter-rocm-pytorch-notebook-imagestream.yaml line 23 uses "v6.3" with the prefix. Consider standardizing the version string format across all imagestream manifests for consistency.
Run the following script to check version formatting consistency across imagestream files:
🏁 Script executed:
#!/bin/bash
# Check ROCm version string format in all imagestream manifests
rg -n '"ROCm".*"version".*"[v]?[0-9]\.[0-9]' manifests/base/ -A 0 -B 0Length of output: 687
Standardize ROCm version string format across imagestream manifests.
The version strings are inconsistent: jupyter-rocm-pytorch-notebook-imagestream.yaml and jupyter-rocm-tensorflow-notebook-imagestream.yaml use "v6.3" and "v6.2" (with "v" prefix), while jupyter-rocm-minimal-notebook-imagestream.yaml uses "6.3" and "6.2" (without prefix). Update lines 23 and 45 in manifests/base/jupyter-rocm-minimal-notebook-imagestream.yaml to use the "v" prefix for consistency across all imagestream files.
🤖 Prompt for AI Agents
In manifests/base/jupyter-rocm-minimal-notebook-imagestream.yaml around lines 23
and 45 the ROCm version strings lack the "v" prefix and must be standardized to
match other imagestreams; update the values at those lines from "6.3" and "6.2"
to "v6.3" and "v6.2" respectively so all ROCm version strings use the "v"
prefix.
|
@jiridanek: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: daniellutz, ide-developer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
https://issues.redhat.com/browse/RHAIENG-1504
Description
How Has This Been Tested?
Self checklist (all need to be checked):
make test(gmakeon macOS) before asking for reviewDockerfile.konfluxfiles should be done inodh/notebooksand automatically synced torhds/notebooks. For Konflux-specific changes, modifyDockerfile.konfluxfiles directly inrhds/notebooksas these require special attention in the downstream repository and flow to the upcoming RHOAI release.Merge criteria:
Summary by CodeRabbit