Commit 2f9b44d

Run Torchtitan ROCm workflow on cron schedule & push to Main branch only (#2016)
This PR addresses the following issues:
- Run the Torchtitan ROCm workflow on the cron schedule and on pushes to the main branch only. The CUDA workflow runs as before.
- Refactor the Torchtitan test run to address an older PR comment, #1786 (comment).
1 parent edbf349 commit 2f9b44d

2 files changed: +12 -6 lines changed

.github/workflows/integration_test_8gpu_features.yaml

Lines changed: 5 additions & 2 deletions

@@ -26,6 +26,10 @@ permissions:

 jobs:
   build-test:
+    if: |
+      matrix.gpu-arch-type == 'cuda' ||
+      (matrix.gpu-arch-type == 'rocm' &&
+      (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'schedule'))
     uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
     strategy:
       fail-fast: false
@@ -73,8 +77,7 @@ jobs:
         sudo mkdir -p "$RUNNER_TEMP/artifacts-to-be-uploaded"
         sudo chown -R $(id -u):$(id -g) "$RUNNER_TEMP/artifacts-to-be-uploaded"

-        export TEST_WITH_ROCM=$([[ "${{ matrix.gpu-arch-type }}" == "rocm" ]] && echo 1 || echo 0)
-        python -m tests.integration_tests.run_tests --test_suite features $RUNNER_TEMP/artifacts-to-be-uploaded --ngpu 8
+        python -m tests.integration_tests.run_tests --gpu_arch_type ${{ matrix.gpu-arch-type }} --test_suite features $RUNNER_TEMP/artifacts-to-be-uploaded --ngpu 8

         rm -rf $RUNNER_TEMP/artifacts-to-be-uploaded/*/checkpoint
         rm -rf artifacts-to-be-uploaded/*/checkpoint
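
The gating condition added to `build-test` can be easier to review as a plain boolean function. The following is a minimal sketch in Python, not part of the commit: the function name `should_run` and its parameters are hypothetical, and it assumes `&&` binds tighter than `||` in the workflow expression, so the inner clause reads as "(push to main) or (schedule)".

```python
def should_run(gpu_arch_type: str, event_name: str, ref: str) -> bool:
    """Hypothetical model of the job-level `if:` expression in the workflow."""
    # CUDA jobs are not gated: they run for every triggering event.
    if gpu_arch_type == "cuda":
        return True

    # ROCm jobs run only for pushes to main or for scheduled (cron) runs.
    # Assumption: `&&` takes precedence over `||`, as in the grouping below.
    on_main_push = event_name == "push" and ref == "refs/heads/main"
    on_schedule = event_name == "schedule"
    return gpu_arch_type == "rocm" and (on_main_push or on_schedule)


if __name__ == "__main__":
    cases = [
        ("cuda", "pull_request", "refs/pull/1/merge"),
        ("rocm", "pull_request", "refs/pull/1/merge"),
        ("rocm", "push", "refs/heads/main"),
        ("rocm", "schedule", "refs/heads/main"),
    ]
    for arch, event, ref in cases:
        print(arch, event, ref, should_run(arch, event, ref))
```

Under this reading, a pull request event still runs the CUDA job but skips the ROCm job, which matches the intent stated in the commit message.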

tests/integration_tests/run_tests.py

Lines changed: 7 additions & 4 deletions

@@ -25,9 +25,6 @@
 }


-TEST_WITH_ROCM = os.getenv("TEST_WITH_ROCM", "0") == "1"
-
-
 def _run_cmd(cmd):
     return subprocess.run([cmd], text=True, shell=True)

@@ -92,7 +89,7 @@ def run_tests(args, test_list: list[OverrideDefinitions]):
             continue

         # Skip the test for ROCm
-        if TEST_WITH_ROCM and test_flavor.skip_rocm_test:
+        if args.gpu_arch_type == "rocm" and test_flavor.skip_rocm_test:
             continue

         # Check if we have enough GPUs
@@ -110,6 +107,12 @@ def main():
     parser.add_argument(
         "output_dir", help="Directory to dump results generated by tests"
     )
+    parser.add_argument(
+        "--gpu_arch_type",
+        default="cuda",
+        choices=["cuda", "rocm"],
+        help="GPU architecture type. Must be specified as either 'cuda' or 'rocm'.",
+    )
     parser.add_argument(
         "--test_suite",
         default="features",
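
The change from the `TEST_WITH_ROCM` environment variable to the `--gpu_arch_type` flag can be illustrated in isolation. This is a rough, self-contained sketch rather than the repository's code: `Flavor`, `build_parser`, and `select_tests` are hypothetical names, and `Flavor` only stands in for the real `OverrideDefinitions` class with the two fields needed here.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Flavor:
    # Hypothetical stand-in for OverrideDefinitions; only the fields used below.
    test_name: str
    skip_rocm_test: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Same flag, default, and choices as in the diff above.
    parser.add_argument("--gpu_arch_type", default="cuda", choices=["cuda", "rocm"])
    return parser


def select_tests(args: argparse.Namespace, flavors: list[Flavor]) -> list[str]:
    selected = []
    for flavor in flavors:
        # Mirrors the updated skip condition: the decision now comes from the
        # parsed argument instead of a module-level environment lookup.
        if args.gpu_arch_type == "rocm" and flavor.skip_rocm_test:
            continue
        selected.append(flavor.test_name)
    return selected


if __name__ == "__main__":
    flavors = [Flavor("default"), Flavor("cuda_only", skip_rocm_test=True)]
    args = build_parser().parse_args(["--gpu_arch_type", "rocm"])
    print(select_tests(args, flavors))
```

Because the flag defaults to `"cuda"`, omitting it keeps every test selected, and `choices` makes argparse reject any value other than `cuda` or `rocm` before the tests start.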
