
Commit f6727da

generatedunixname499836121tbohutyn authored and committed
Raise XPU tolerances for bf16 ResNet & BotNet TorchBench (#170552)
Summary:
Multiple TorchBench models on XPU fail accuracy tests because the numeric tolerance is too strict. Two contributing factors were identified:

1. A measurement methodology change (PyTorch 2.6.0 enforcing cosine_similarity, https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2227) surfaced limitations and increased the sensitivity of the error checks for phlippe_resnet.
2. BatchNorm decomposition noise (~1e-5 RMSE per BN in fp16) accumulates across the BatchNorm layers in botnet26t_256, pushing aggregate diffs beyond the current thresholds.

**Analysis**
- The phlippe_resnet failures reproduce on both CPU and XPU; fp16 already uses a higher tolerance, implying the bf16 thresholds are misaligned.
- Disabling BN decomposition brings botnet26t_256 outputs within tolerance; with decomposition enabled, cumulative numeric error is expected.
- CI health indicates the changes are non-disruptive; failures, where present, are unrelated to these PRs.

Fixes intel/torch-xpu-ops#1799
Fixes intel/torch-xpu-ops#1305

X-link: pytorch/pytorch#170552
Approved by: https://github.com/EikanWang, https://github.com/desertfire
Reviewed By: seemethere
Differential Revision: D89434646
fbshipit-source-id: e5ce062b497201158578abb1bdebaac4b593dbfd
Co-authored-by: Tomasz Bohutyn <tbohutyn@habana.ai>
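To make the accumulation argument concrete, here is a back-of-the-envelope sketch (illustrative only, not part of the commit): if each decomposed BatchNorm contributes roughly independent noise of ~1e-5 RMSE in fp16, the aggregate error grows with the square root of the number of BN layers, which is enough to push a deep model like botnet26t_256 past a tight 1e-2 threshold's margin.

import math

# Illustrative only (not from the commit): assuming each decomposed
# BatchNorm adds roughly independent noise of ~1e-5 RMSE in fp16,
# the errors add in quadrature, so the aggregate RMSE scales with
# sqrt(number of BN layers).
PER_BN_RMSE = 1e-5

for num_bn_layers in (10, 25, 50, 100):
    aggregate_rmse = PER_BN_RMSE * math.sqrt(num_bn_layers)
    print(f"{num_bn_layers:>3} BN layers -> ~{aggregate_rmse:.1e} aggregate RMSE")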
1 parent c65e4e7 commit f6727da

File tree

2 files changed: +11 −0 lines changed


userbenchmark/dynamo/dynamobench/timm_models.py

Lines changed: 10 additions & 0 deletions
@@ -71,6 +71,10 @@ def pip_install(package):
     "mobilenetv3_large_100",
 }
 
+REQUIRE_HIGHER_TOLERANCE_FP16_XPU = {
+    "botnet26t_256",
+}
+
 REQUIRE_HIGHER_TOLERANCE_AMP = {}
 
 REQUIRE_EVEN_HIGHER_TOLERANCE = {
@@ -366,6 +370,12 @@ def get_tolerance_and_cosine_flag(self, is_training, current_device, name):
             self.args.amp and name in REQUIRE_HIGHER_TOLERANCE_AMP
         ):
             tolerance = 4 * 1e-2
+        elif (
+            name in REQUIRE_HIGHER_TOLERANCE_FP16_XPU
+            and self.args.float16
+            and current_device == "xpu"
+        ):
+            tolerance = 4 * 1e-2
         else:
             tolerance = 1e-2
         return tolerance, cosine
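For context, a minimal standalone sketch of the branch this hunk adds (the function name and signature here are simplified; in the real file the logic is a method on the benchmark runner and also consults other tolerance sets and the cosine flag):

# Simplified standalone sketch of the new branch; not the actual runner method.
REQUIRE_HIGHER_TOLERANCE_FP16_XPU = {"botnet26t_256"}

def pick_tolerance(name, use_float16, current_device):
    if (
        name in REQUIRE_HIGHER_TOLERANCE_FP16_XPU
        and use_float16
        and current_device == "xpu"
    ):
        return 4 * 1e-2  # relaxed threshold for fp16 on XPU
    return 1e-2  # default threshold

print(pick_tolerance("botnet26t_256", True, "xpu"))   # 0.04
print(pick_tolerance("botnet26t_256", True, "cuda"))  # 0.01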

userbenchmark/dynamo/dynamobench/torchbench.yaml

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ tolerance:
   # These models need higher tolerance for xpu devices with bf16
   higher_bf16_xpu:
     - squeezenet1_1
+    - phlippe_resnet
 
   freezing:
     # Similar logic to timm_models.py:get_tolerance_and_cosine_flag
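As a rough sketch of how a harness could consume this list (the actual lookup lives in the dynamo benchmark harness; the 4e-2 threshold below is assumed for illustration):

import yaml

# Rough sketch, not the harness's actual code: load the tolerance config
# and relax the threshold for models listed under higher_bf16_xpu when
# running bf16 on XPU. The 4e-2 value is assumed for illustration.
with open("userbenchmark/dynamo/dynamobench/torchbench.yaml") as f:
    config = yaml.safe_load(f)

HIGHER_BF16_XPU = set(config["tolerance"]["higher_bf16_xpu"])

def bf16_xpu_tolerance(model_name):
    return 4e-2 if model_name in HIGHER_BF16_XPU else 1e-2

print(bf16_xpu_tolerance("phlippe_resnet"))  # relaxed after this change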
