
Commit 29078f6

renganxu authored and facebook-github-bot committed
Backout D70075331
Summary:
X-link: pytorch/pytorch#148824

The AOTI lowering for model 699109736 and other new models worked before D70075331 but failed after it with the error:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 1 transpose_mat2 0 m 4096 n 10 k 7936 mat1_ld 7936 mat2_ld 7936 result_ld 4096 abcType 2 computeType 68 scaleType 0

So we revert D70075331 as a workaround for now.

Reviewed By: chenyang78, adelesun

Differential Revision: D70823254

fbshipit-source-id: f3025a7543b7b2299457f5a06091a6fbeb37dc0d
1 parent d1ba2ef commit 29078f6
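
The cublasLtMatmul parameters in the error above correspond to an fp16 linear layer with batch 10, in_features 7936, and out_features 4096 (abcType 2 is CUDA_R_16F, computeType 68 is CUBLAS_COMPUTE_32F). A minimal repro sketch of that shape, assuming a CUDA device; the tensors and names here are illustrative, not taken from the failing model:

import torch

# Shapes from the error message: m=4096, n=10, k=7936 in cuBLASLt's
# column-major view map to x @ w.T in row-major PyTorch terms.
x = torch.randn(10, 7936, dtype=torch.float16, device="cuda")    # activations
w = torch.randn(4096, 7936, dtype=torch.float16, device="cuda")  # weight
# F.linear dispatches to cublasLtMatmul with transpose_mat1=1,
# transpose_mat2=0, mat1_ld=7936, mat2_ld=7936, result_ld=4096.
y = torch.nn.functional.linear(x, w)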

File tree

1 file changed: +0 −9 lines changed


userbenchmark/dynamo/dynamobench/common.py

Lines changed: 0 additions & 9 deletions
@@ -3592,15 +3592,6 @@ def run(runner, args, original_dir=None):
         # some of the models do not support use_deterministic_algorithms
         torch.use_deterministic_algorithms(True)
         os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
-        if args.only is not None and args.only in {
-            "DebertaForQuestionAnswering",
-            "RobertaForQuestionAnswering",
-            "nvidia_deeprecommender",
-            "volo_d1_224",
-        }:
-            # These seem unhappy with numerics of larger cuBLASLt workspace
-            # sizes following #145130 (due to enabling split-k?)
-            torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
         torch.backends.cudnn.deterministic = True
         torch.backends.cudnn.allow_tf32 = False
         torch.backends.cudnn.benchmark = False
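
For context, the removed hunk sits in the deterministic-run setup of common.py. A standalone sketch of that setup with the reverted guard inlined; FLAKY_FP16_MODELS and model_name are hypothetical stand-ins for the original inline set and args.only:

import os

import torch

# Pin the cuBLAS workspace and force deterministic algorithms.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)

# The reverted guard: for these models, force full-precision accumulation in
# fp16 matmuls, since larger cuBLASLt workspace sizes following #145130
# seemed to change their numerics (possibly due to split-k).
FLAKY_FP16_MODELS = {  # hypothetical name; the original checked args.only inline
    "DebertaForQuestionAnswering",
    "RobertaForQuestionAnswering",
    "nvidia_deeprecommender",
    "volo_d1_224",
}
model_name = "volo_d1_224"  # stands in for args.only
if model_name in FLAKY_FP16_MODELS:
    torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

# Remaining deterministic settings, unchanged by this commit.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.allow_tf32 = False
torch.backends.cudnn.benchmark = False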
