
Conversation

@momchil-velikov
Collaborator

Also add a missing test case for fixed-size `linalg.batch_matmul` vectorization.

@llvmbot
Member

llvmbot commented Dec 15, 2025

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Momchil Velikov (momchil-velikov)

Changes

Also add a missing test case for fixed-size `linalg.batch_matmul` vectorization.


Full diff: https://github.com/llvm/llvm-project/pull/172333.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp (+1)
  • (modified) mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir (+84)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index bb3bccdae0e14..4d7e45aa8036f 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -2640,6 +2640,7 @@ vectorizeScalableVectorPrecondition(Operation *op,
   // Cond 4: Only the following ops are supported in the
   // presence of scalable vectors
   return success(isElementwise(linalgOp) || isa<linalg::MatmulOp>(op) ||
+                 isa<linalg::BatchMatmulOp>(op) ||
                  isa<linalg::DepthwiseConv1DNwcWcOp>(op) ||
                  isa<linalg::MatvecOp>(op) || isa<linalg::Mmt4DOp>(op) ||
                  isa<linalg::BatchMmt4DOp>(op) ||
diff --git a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
index 170bae6141609..1f8762bd3b1ef 100644
--- a/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
+++ b/mlir/test/Dialect/Linalg/vectorization/linalg-ops.mlir
@@ -1725,3 +1725,87 @@ module attributes {transform.with_named_sequence} {
     transform.yield
   }
 }
+
+// -----
+
+func.func @batch_matmul(%A: memref<?x?x?xf32>, %B: memref<?x?x?xf32>, %C: memref<?x?x?xf32>) {
+  linalg.batch_matmul ins(%A, %B: memref<?x?x?xf32>, memref<?x?x?xf32>)
+                      outs(%C: memref<?x?x?xf32>)
+  return
+}
+
+// CHECK-LABEL: func.func @batch_matmul(
+// CHECK-SAME:  %[[A:.*]]: memref<?x?x?xf32>, %[[B:.*]]: memref<?x?x?xf32>, %[[C:.*]]: memref<?x?x?xf32>
+// CHECK:       %[[c0:.*]] = arith.constant 0 : index
+// CHECK:       %[[BATCH_DIM:.*]] = memref.dim %[[A]], %[[c0]] : memref<?x?x?xf32>
+// CHECK:       %[[c1:.*]] = arith.constant 1 : index
+// CHECK:       %[[M:.*]] = memref.dim %[[A]], %[[c1]] : memref<?x?x?xf32>
+// CHECK:       %[[c2:.*]] = arith.constant 2 : index
+// CHECK:       %[[N:.*]] = memref.dim %[[B]], %[[c2]] : memref<?x?x?xf32>
+// CHECK:       %[[c2_2:.*]] = arith.constant 2 : index
+// CHECK:       %[[K:.*]] = memref.dim %[[A]], %[[c2_2]] : memref<?x?x?xf32>
+// CHECK:       %[[c0_4:.*]] = arith.constant 0 : index
+// CHECK:       %[[P0:.*]] = ub.poison : f32
+// CHECK:       %[[MA:.*]] = vector.create_mask %[[BATCH_DIM]], %[[M]], %[[K]] : vector<4x8x4xi1>
+// CHECK:       %[[VA:.*]] = vector.mask %[[MA]] { vector.transfer_read %[[A]][%[[c0_4]], %[[c0_4]], %[[c0_4]]], %[[P0]] {in_bounds = [true, true, true, true], permutation_map = #{{.*}}} : memref<?x?x?xf32>, vector<4x8x16x4xf32> } : vector<4x8x4xi1> -> vector<4x8x16x4xf32>
+// CHECK:       %[[P1:.*]] = ub.poison : f32
+// CHECK:       %[[MB:.*]] = vector.create_mask %[[BATCH_DIM]], %[[K]], %[[N]] : vector<4x4x16xi1>
+// CHECK:       %[[VB:.*]] = vector.mask %[[MB]] { vector.transfer_read %[[B]][%[[c0_4]], %[[c0_4]], %[[c0_4]]], %[[P1]] {in_bounds = [true, true, true, true], permutation_map = #{{.*}}} : memref<?x?x?xf32>, vector<4x8x16x4xf32> } : vector<4x4x16xi1> -> vector<4x8x16x4xf32>
+// CHECK:       %[[P2:.*]] = ub.poison : f32
+// CHECK:       %[[MC:.*]] = vector.create_mask %[[BATCH_DIM]], %[[M]], %[[N]] : vector<4x8x16xi1>
+// CHECK:       %[[VC:.*]] = vector.mask %[[MC]] { vector.transfer_read %[[C]][%[[c0_4]], %[[c0_4]], %[[c0_4]]], %[[P2]] {in_bounds = [true, true, true]} : memref<?x?x?xf32>, vector<4x8x16xf32> } : vector<4x8x16xi1> -> vector<4x8x16xf32>
+// CHECK:       %[[MUL:.*]] = arith.mulf %[[VA]], %[[VB]] : vector<4x8x16x4xf32>
+// CHECK:       %[[MRED:.*]] = vector.create_mask %[[BATCH_DIM]], %[[M]], %[[N]], %[[K]] : vector<4x8x16x4xi1>
+// CHECK:       %[[RED:.*]] = vector.mask %[[MRED]] { vector.multi_reduction <add>, %[[MUL]], %[[VC]] [3] : vector<4x8x16x4xf32> to vector<4x8x16xf32> } : vector<4x8x16x4xi1> -> vector<4x8x16xf32>
+// CHECK:       %[[c0_5:.*]] = arith.constant 0 : index
+// CHECK:       vector.mask %[[MC]] { vector.transfer_write %[[RED]], %[[C]][%[[c0_5]], %[[c0_5]], %[[c0_5]]] {in_bounds = [true, true, true]} : vector<4x8x16xf32>, memref<?x?x?xf32> } : vector<4x8x16xi1>
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %matmul = transform.structured.match ops{["linalg.batch_matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    transform.structured.vectorize %matmul vector_sizes [4, 8, 16, 4] : !transform.any_op
+    transform.yield
+  }
+}
+
+// -----
+
+func.func @batch_matmul_scalable(%A: memref<?x?x?xf32>, %B: memref<?x?x?xf32>, %C: memref<?x?x?xf32>) {
+  linalg.batch_matmul ins(%A, %B: memref<?x?x?xf32>, memref<?x?x?xf32>)
+                      outs(%C: memref<?x?x?xf32>)
+  return
+}
+
+// CHECK-LABEL: func.func @batch_matmul_scalable
+// CHECK-SAME:  (%[[A:.*]]: memref<?x?x?xf32>, %[[B:.*]]: memref<?x?x?xf32>, %[[C:.*]]: memref<?x?x?xf32>) {
+// CHECK:       %[[c0:.*]] = arith.constant 0 : index
+// CHECK:       %[[BATCH_DIM:.*]] = memref.dim %[[A]], %[[c0]] : memref<?x?x?xf32>
+// CHECK:       %[[c1:.*]] = arith.constant 1 : index
+// CHECK:       %[[M:.*]] = memref.dim %[[A]], %[[c1]] : memref<?x?x?xf32>
+// CHECK:       %[[c2:.*]] = arith.constant 2 : index
+// CHECK:       %[[N:.*]] = memref.dim %[[B]], %[[c2]] : memref<?x?x?xf32>
+// CHECK:       %[[c2_2:.*]] = arith.constant 2 : index
+// CHECK:       %[[K:.*]] = memref.dim %[[A]], %[[c2_2]] : memref<?x?x?xf32>
+// CHECK:       %[[c0_4:.*]] = arith.constant 0 : index
+// CHECK:       %[[P0:.*]] = ub.poison : f32
+// CHECK:       %[[MA:.*]] = vector.create_mask %[[BATCH_DIM]], %[[M]], %[[K]] : vector<4x8x4xi1>
+// CHECK:       %[[VA:.*]] = vector.mask %[[MA]] { vector.transfer_read %[[A]][%[[c0_4]], %[[c0_4]], %[[c0_4]]], %[[P0]] {in_bounds = [true, true, true, true], permutation_map = #{{.*}}} : memref<?x?x?xf32>, vector<4x8x[16]x4xf32> } : vector<4x8x4xi1> -> vector<4x8x[16]x4xf32>
+// CHECK:       %[[P1:.*]] = ub.poison : f32
+// CHECK:       %[[MB:.*]] = vector.create_mask %[[BATCH_DIM]], %[[K]], %[[N]] : vector<4x4x[16]xi1>
+// CHECK:       %[[VB:.*]] = vector.mask %[[MB]] { vector.transfer_read %[[B]][%[[c0_4]], %[[c0_4]], %[[c0_4]]], %[[P1]] {in_bounds = [true, true, true, true], permutation_map = #{{.*}}} : memref<?x?x?xf32>, vector<4x8x[16]x4xf32> } : vector<4x4x[16]xi1> -> vector<4x8x[16]x4xf32>
+// CHECK:       %[[P2:.*]] = ub.poison : f32
+// CHECK:       %[[MC:.*]] = vector.create_mask %[[BATCH_DIM]], %[[M]], %[[N]] : vector<4x8x[16]xi1>
+// CHECK:       %[[VC:.*]] = vector.mask %[[MC]] { vector.transfer_read %[[C]][%[[c0_4]], %[[c0_4]], %[[c0_4]]], %[[P2]] {in_bounds = [true, true, true]} : memref<?x?x?xf32>, vector<4x8x[16]xf32> } : vector<4x8x[16]xi1> -> vector<4x8x[16]xf32>
+// CHECK:       %[[MUL:.*]] = arith.mulf %[[VA]], %[[VB]] : vector<4x8x[16]x4xf32>
+// CHECK:       %[[MRED:.*]] = vector.create_mask %[[BATCH_DIM]], %[[M]], %[[N]], %[[K]] : vector<4x8x[16]x4xi1>
+// CHECK:       %[[RED:.*]] = vector.mask %[[MRED]] { vector.multi_reduction <add>, %[[MUL]], %[[VC]] [3] : vector<4x8x[16]x4xf32> to vector<4x8x[16]xf32> } : vector<4x8x[16]x4xi1> -> vector<4x8x[16]xf32>
+// CHECK:       %[[c0_5:.*]] = arith.constant 0 : index
+// CHECK:       vector.mask %[[MC]] { vector.transfer_write %[[RED]], %[[C]][%[[c0_5]], %[[c0_5]], %[[c0_5]]] {in_bounds = [true, true, true]} : vector<4x8x[16]xf32>, memref<?x?x?xf32> } : vector<4x8x[16]xi1>
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %matmul = transform.structured.match ops{["linalg.batch_matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    transform.structured.vectorize %matmul vector_sizes [4, 8, [16], 4] : !transform.any_op
+    transform.yield
+  }
+}
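For readers less familiar with the masked vectorization pattern the new tests check, here is a minimal sketch of its semantics in plain Python (not the MLIR implementation). It models the CHECK lines above: the dynamic sizes `batch`, `m`, `n`, `k` may be smaller than the chosen vector sizes `(4, 8, 16, 4)`, the per-operand masks (`vector.create_mask` / `vector.mask`) neutralize out-of-bounds lanes, the broadcast reads are multiplied elementwise (`arith.mulf`), and an `<add>` multi-reduction folds the trailing `k` dimension (dim 3) into the accumulator. Out-of-mask lanes read as `0.0` here; the real lowering reads `ub.poison`, which the mask makes irrelevant.

```python
# Hedged sketch: scalar model of masked linalg.batch_matmul vectorization.
# Vector sizes match the test: [4, 8, 16, 4] for [batch, M, N, K].
VB, VM, VN, VK = 4, 8, 16, 4

def masked_batch_matmul(A, B, C, batch, m, n, k):
    """Return a VB x VM x VN accumulator; lanes past the dynamic
    bounds stay zero, mirroring the masked transfer_read/multi_reduction."""
    def at(T, idx, bounds):
        # Masked read: in-bounds lanes come from memory, the rest are 0.0.
        b, i, j = idx
        return T[b][i][j] if all(x < lim for x, lim in zip(idx, bounds)) else 0.0

    # Masked read of the accumulator C (mask: batch x m x n).
    acc = [[[C[b][i][j] if (b < batch and i < m and j < n) else 0.0
             for j in range(VN)] for i in range(VM)] for b in range(VB)]

    # Elementwise multiply of the broadcast A/B reads, then a masked
    # <add> multi_reduction over the trailing K dimension (dim 3).
    for b in range(VB):
        for i in range(VM):
            for j in range(VN):
                for kk in range(VK):
                    if b < batch and i < m and j < n and kk < k:
                        acc[b][i][j] += (at(A, (b, i, kk), (batch, m, k)) *
                                         at(B, (b, kk, j), (batch, k, n)))
    return acc
```

With `batch=1, m=n=k=2` this reduces to an ordinary 2x2 matmul in the top-left corner of the accumulator, and every lane outside the mask remains zero, which is exactly why the scalable `[16]` dimension in the second test is safe: the mask, not the static vector shape, bounds the computation.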

@rengolin
Member

Looks trivial to me. @arun-thmn any feedback?

What about batch_reduce_matmul? Would that be as trivial, too?

@arun-thmn
Contributor

> Looks trivial to me. @arun-thmn any feedback?
>
> What about batch_reduce_matmul? Would that be as trivial, too?

+1 for `batch_reduce_matmul`

@banach-space
Contributor

> Looks trivial to me. @arun-thmn any feedback?
>
> What about batch_reduce_matmul? Would that be as trivial, too?
>
> +1 for `batch_reduce_matmul`

Is that a blocker for you? I am not against it, but as a rule of thumb, we only enable Ops that we actually run. This way, we have 100% confidence that everything works (i.e. lowering all the way to LLVM).

Perhaps that's too conservative - ultimately, this is just unblocking the Linalg -> Vector lowering - but this way we avoid "scalable" vectors being enabled quite high up (in Linalg) and then not working further down the lowering pipeline.

@banach-space (Contributor) left a comment

Thanks, LGTM % formatting

Please wait for +1 from either @rengolin or @arun-thmn before landing.


// CHECK-LABEL: func.func @batch_matmul(
// CHECK-SAME: %[[A:.*]]: memref<?x?x?xf32>, %[[B:.*]]: memref<?x?x?xf32>, %[[C:.*]]: memref<?x?x?xf32>
// CHECK: %[[c0:.*]] = arith.constant 0 : index
Let's use CAPS for all LIT variables

Suggested change:
- // CHECK: %[[c0:.*]] = arith.constant 0 : index
+ // CHECK: %[[C0:.*]] = arith.constant 0 : index

6 participants