
Commit 29078f6

renganxu authored and facebook-github-bot committed
Backout D70075331
Summary:
X-link: pytorch/pytorch#148824

The AOTI lowering for model 699109736 and other new models worked before D70075331 but failed after it with the error:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 0 m 4096 n 10 k 7936 mat1_ld 7936 mat2_ld 7936 result_ld 4096 abcType 2 computeType 68 scaleType 0

So we revert D70075331 as a workaround for now.

Reviewed By: chenyang78, adelesun

Differential Revision: D70823254

fbshipit-source-id: f3025a7543b7b2299457f5a06091a6fbeb37dc0d
1 parent d1ba2ef commit 29078f6
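
The cublasLtMatmul parameters in the error above correspond to an fp16 linear layer with batch 10, in_features 7936, and out_features 4096 (abcType 2 is CUDA_R_16F, computeType 68 is CUBLAS_COMPUTE_32F). A minimal repro sketch of that shape, assuming a CUDA device; the tensors and names here are illustrative, not taken from the failing model:

import torch

# Shapes from the error message: m=4096, n=10, k=7936 in cuBLASLt's
# column-major view map to x @ w.T in row-major PyTorch terms.
x = torch.randn(10, 7936, dtype=torch.float16, device="cuda")    # activations
w = torch.randn(4096, 7936, dtype=torch.float16, device="cuda")  # weight
# F.linear dispatches to cublasLtMatmul with transpose_mat1=1,
# transpose_mat2=0, mat1_ld=7936, mat2_ld=7936, result_ld=4096.
y = torch.nn.functional.linear(x, w)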

File tree

1 file changed: +0 −9 lines changed


userbenchmark/dynamo/dynamobench/common.py

Lines changed: 0 additions & 9 deletions
@@ -3592,15 +3592,6 @@ def run(runner, args, original_dir=None):
         # some of the models do not support use_deterministic_algorithms
         torch.use_deterministic_algorithms(True)
         os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
-        if args.only is not None and args.only in {
-            "DebertaForQuestionAnswering",
-            "RobertaForQuestionAnswering",
-            "nvidia_deeprecommender",
-            "volo_d1_224",
-        }:
-            # These seem unhappy with numerics of larger cuBLASLt workspace
-            # sizes following #145130 (due to enabling split-k?)
-            torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
         torch.backends.cudnn.deterministic = True
         torch.backends.cudnn.allow_tf32 = False
         torch.backends.cudnn.benchmark = False
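
For context, the removed hunk sits in the deterministic-run setup of common.py. A standalone sketch of that setup with the reverted guard inlined; FLAKY_FP16_MODELS and model_name are hypothetical stand-ins for the original inline set and args.only:

import os

import torch

# Pin the cuBLAS workspace and force deterministic algorithms.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)

# The reverted guard: for these models, force full-precision accumulation in
# fp16 matmuls, since larger cuBLASLt workspace sizes following #145130
# seemed to change their numerics (possibly due to split-k).
FLAKY_FP16_MODELS = {  # hypothetical name; the original checked args.only inline
    "DebertaForQuestionAnswering",
    "RobertaForQuestionAnswering",
    "nvidia_deeprecommender",
    "volo_d1_224",
}
model_name = "volo_d1_224"  # stands in for args.only
if model_name in FLAKY_FP16_MODELS:
    torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

# Remaining deterministic settings, unchanged by this commit.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = False
torch.backends.cudnn.benchmark = False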
