Conversation

@CuiYifeng (Contributor)

This PR removes the workaround for a functionality regression in oneMKL 2025.2 and fixes error-handling code that was never executed with 2025.2.

@CuiYifeng CuiYifeng added this to the PT2.10 milestone Nov 10, 2025
@CuiYifeng CuiYifeng added the mkl label Nov 10, 2025
@CuiYifeng CuiYifeng marked this pull request as ready for review December 1, 2025 08:49
Copilot AI review requested due to automatic review settings December 1, 2025 08:49

Copilot AI left a comment


Pull request overview

This PR addresses error handling in BatchLinearAlgebra operations by removing a workaround for a oneMKL 2025.2 regression and fixing indexing bugs in the error handling code. The changes enable proper exception handling that was previously non-functional.

Key Changes:

  • Removed temporary skip list entries for inverse operation tests that were disabled due to oneMKL regression
  • Fixed incorrect indexing in error handling code to use exception IDs instead of loop indices
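The indexing fix in the second bullet can be illustrated with a small sketch. This is plain Python with a mock batch error standing in for oneMKL's; the `ids`/`infos` names are illustrative, not the real oneMKL API:

```python
# Mock of a batch exception that reports which matrices in the batch failed.
# Illustrative stand-in only; not the real oneMKL batch_error interface.
class MockBatchError:
    def __init__(self, ids, infos):
        self.ids = ids      # ids of the failing matrices in the batch
        self.infos = infos  # LAPACK-style info value for each failure

def record_infos(err, info_cpu):
    """Write each failure's info value into the slot of the matrix it
    belongs to. The bug was writing info_cpu[i] (the loop index), which
    tags the wrong matrix whenever the i-th exception does not concern
    the i-th matrix in the batch."""
    for i in range(len(err.ids)):
        info_cpu[err.ids[i]] = err.infos[i]  # index by matrix id, not by i

info_cpu = [0, 0, 0, 0]         # batch of 4 matrices, all initially ok
err = MockBatchError([2], [3])  # only matrix 2 failed, with info = 3
record_infos(err, info_cpu)
print(info_cpu)  # → [0, 0, 3, 0]: the failure lands on matrix 2, not matrix 0
```

With the old `info_cpu[i]` indexing, the same input would have flagged matrix 0 and left matrix 2 looking healthy, which is why the downstream error checks never fired.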

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `test/xpu/skip_list_common.py` | Removed 8 test cases from the skip list that were temporarily disabled due to the oneMKL regression |
| `src/ATen/native/xpu/mkl/BatchLinearAlgebra.cpp` | Removed the workaround code block and fixed an indexing bug in the error handlers to use `ids[i]` instead of `i` |


```diff
         "\nDetail: ",
         e.detail());
-    info_cpu[i] = e.info();
+    info_cpu[ids[i]] = e.info();
```
Copilot AI Dec 1, 2025

Potential out-of-bounds access: ids[i] is used to index info_cpu, but there's no validation that ids[i] is within the bounds of info_cpu. This could cause a segfault or memory corruption if the batch exception contains an invalid matrix ID.

```diff
 } catch (const sycl::exception& e) {
   TORCH_WARN("Caught SYCL exception:\nWhat: ", e.what(), "\nInfo: -1");
-  info_cpu[i] = -1;
+  info_cpu[ids[i]] = -1;
```
Copilot AI Dec 1, 2025

Potential out-of-bounds access: Same issue as above - ids[i] is used to index info_cpu without bounds checking. Consider adding validation that ids[i] < info_cpu.size() before indexing.
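The validation both review comments ask for could look like the following sketch. It is plain Python mirroring the logic of the C++ handler, with a mock batch error; all names are illustrative, not oneMKL's API:

```python
class MockBatchError:
    """Illustrative stand-in for a batch exception carrying per-failure
    matrix ids and info values."""
    def __init__(self, ids, infos):
        self.ids, self.infos = ids, infos

def record_infos_checked(err, info_cpu):
    # Guard against out-of-range matrix ids before indexing, as the
    # review comments suggest. Out-of-range ids are collected and
    # returned for reporting instead of being written through, which
    # would corrupt memory in the C++ version.
    skipped = []
    for matrix_id, info in zip(err.ids, err.infos):
        if 0 <= matrix_id < len(info_cpu):
            info_cpu[matrix_id] = info
        else:
            skipped.append(matrix_id)  # invalid id: do not touch info_cpu
    return skipped

info_cpu = [0, 0, 0]
err = MockBatchError([1, 7], [2, 5])  # id 7 is out of bounds for a batch of 3
skipped = record_infos_checked(err, info_cpu)
print(info_cpu, skipped)  # → [0, 2, 0] [7]
```

In the actual C++ handler the equivalent guard would be a `TORCH_INTERNAL_ASSERT` or an explicit bounds check on `ids[i]` against the info tensor's length before the assignment.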

@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch 4 times, most recently from da52643 to cc9fcc0 Compare December 11, 2025 07:40
@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch 2 times, most recently from c266c55 to b3909ed Compare December 22, 2025 02:49
@github-actions

Performance outliers, please check!

  • 🟡 [80%, 90%), may be fluctuations
| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| huggingface_float16_training | MT5ForConditionalGeneration | 0.904954 | 0.882927 |

@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch from b3909ed to b25f933 Compare December 24, 2025 00:58
@CuiYifeng (Contributor, Author)

Fix the following cases:

```python
        # Exception is temporarily unavailable due to regression in oneMKL
        "test_inv_errors_and_warnings_xpu_float32",
        "test_inv_errors_and_warnings_xpu_float64",
        "test_inverse_errors_large_xpu_float32",
        "test_inverse_errors_large_xpu_float64",
        "test_inverse_errors_xpu_float32",
        "test_inverse_errors_xpu_float64",
        "test_inv_ex_singular_xpu_float32",
        "test_inv_ex_singular_xpu_float64",
```

@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch from b25f933 to 28a22dc Compare December 31, 2025 01:59
@github-actions

Performance outliers, please check!

  • 🟡 [80%, 90%), may be fluctuations
| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | dcgan | 0.844164 | 0.827297 |
| torchbench_bfloat16_training | resnet18 | 0.934244 | 0.835603 |
| torchbench_bfloat16_training | resnext50_32x4d | 0.952235 | 0.853000 |
| torchbench_bfloat16_training | densenet121 | 0.804804 | 0.858987 |
| torchbench_bfloat16_training | mnasnet1_0 | 1.021068 | 0.869143 |
| torchbench_bfloat16_training | squeezenet1_1 | 1.009145 | 0.880740 |

@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch from 28a22dc to 9d1a2f8 Compare January 6, 2026 03:23
@github-actions

github-actions bot commented Jan 6, 2026

Performance outliers, please check!

  • 🟡 [80%, 90%), may be fluctuations
| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | resnet18 | 0.902028 | 0.805277 |

@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch from 9d1a2f8 to 9415070 Compare January 6, 2026 05:02
@github-actions

github-actions bot commented Jan 7, 2026

Performance outliers, please check!

  • 🔴 [-1, 80%), should be regression

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | densenet121 | 0.783036 | 0.80363 |

  • 🟡 [80%, 90%), may be fluctuations

| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | dcgan | 0.850267 | 0.801978 |
| torchbench_bfloat16_training | mobilenet_v3_large | 0.976237 | 0.825791 |
| torchbench_bfloat16_training | resnet18 | 0.921155 | 0.839705 |
| torchbench_bfloat16_training | squeezenet1_1 | 1.023057 | 0.886896 |
| huggingface_bfloat16_training | MT5ForConditionalGeneration | 0.915300 | 0.890694 |

@CuiYifeng CuiYifeng force-pushed the yifeng/remove_mkl_wa branch from 9415070 to 9b9d51d Compare January 7, 2026 07:49
@github-actions

github-actions bot commented Jan 7, 2026

Performance outliers, please check!

  • 🟡 [80%, 90%), may be fluctuations
| Category | Model | Target vs. Baseline [Eager] | Target vs. Baseline [Inductor] |
| --- | --- | --- | --- |
| torchbench_bfloat16_training | resnet18 | 0.964222 | 0.831755 |
