@yucai-intel yucai-intel commented Dec 9, 2025

UT Failure Analysis
#2412
The RuntimeError currently raised by test_nested_tensor_dense_elementwise_embedding_dim_128_xpu_float16 comes from an upstream dispatcher check that explicitly rejects the mixed Nested Tensor + Dense Tensor signature the test requires, so our kernel is never called.

Resolution
This PR implements the high-performance op_dense_esuhm XPU kernel, which enables core elementwise binary operations (such as add and mul) for the Nested Tensor broadcasting case on XPU devices. The kernel handles the [B, *, D] op [B, 1, D] geometry, where * is the ragged (per-sequence) dimension.
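As a rough illustration of the [B, *, D] op [B, 1, D] geometry, here is a minimal pure-Python sketch of the reference semantics. It is not the XPU kernel from this PR: the function name `nested_op_dense_reference` and the flat `values` plus `offsets` layout are assumptions chosen for the example, and the real kernel's memory layout and signature may differ.

```python
from operator import add, mul


def nested_op_dense_reference(values, offsets, dense, op):
    """Reference semantics for [B, *, D] op [B, 1, D] (hypothetical helper).

    values:  flat list of rows, each a list of D numbers; all sequences
             are concatenated back to back (assumed layout)
    offsets: B + 1 row offsets; sequence b owns rows offsets[b]:offsets[b + 1]
    dense:   B rows of D numbers, i.e. the [B, 1, D] operand with its
             unit dimension squeezed out
    op:      binary function applied elementwise, e.g. operator.add
    """
    out = []
    for b in range(len(offsets) - 1):
        dense_row = dense[b]  # the single row broadcast over sequence b
        for r in range(offsets[b], offsets[b + 1]):
            out.append([op(x, y) for x, y in zip(values[r], dense_row)])
    return out


# B = 2, D = 2; the ragged dimension has lengths 2 and 1.
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
offsets = [0, 2, 3]
dense = [[10.0, 20.0], [100.0, 200.0]]

added = nested_op_dense_reference(values, offsets, dense, add)
scaled = nested_op_dense_reference(values, offsets, dense, mul)
```

Each batch entry's dense row is applied to every row of the corresponding ragged sequence, so the output keeps the nested operand's shape.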

Next Step
This PR provides the necessary compute infrastructure. Merging it is a prerequisite for the follow-up PR pytorch/pytorch#169928, which will fix the dispatcher logic to enable full functionality.

@yucai-intel

UT result:
[image: UT result screenshot]
