Skip to content
136 changes: 136 additions & 0 deletions main/acle.md
Original file line number Diff line number Diff line change
Expand Up @@ -467,6 +467,10 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin
* Added feature test macro for FEAT_CSSC.
* Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics.
* Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
* Added [**Alpha**](#current-status-and-anticipated-changes)
support for SVE2.2 (FEAT_SVE2p2)
* Added [**Alpha**](#current-status-and-anticipated-changes)
support for SME2.2 (FEAT_SME2p2).

### References

Expand Down Expand Up @@ -1980,6 +1984,10 @@ are available. This implies that `__ARM_FEATURE_SVE` is nonzero.
are available and if the associated [ACLE features]
(#sme-language-extensions-and-intrinsics) are supported.

`__ARM_FEATURE_SVE2p2` is defined to 1 if the FEAT_SVE2p2 instructions
are available and if the associated [ACLE features]
(#sme-language-extensions-and-intrinsics) are supported.

#### NEON-SVE Bridge macro

`__ARM_NEON_SVE_BRIDGE` is defined to 1 if the [`<arm_neon_sve_bridge.h>`](#arm_neon_sve_bridge.h)
Expand All @@ -2002,6 +2010,7 @@ of SME has an associated preprocessor macro, given in the table below:
| FEAT_SME | __ARM_FEATURE_SME |
| FEAT_SME2 | __ARM_FEATURE_SME2 |
| FEAT_SME2p1 | __ARM_FEATURE_SME2p1 |
| FEAT_SME2p2 | __ARM_FEATURE_SME2p2 |

Each macro is defined if there is hardware support for the associated
architecture feature and if all of the [ACLE
Expand Down Expand Up @@ -2674,6 +2683,7 @@ be found in [[BA]](#BA).
| [`__ARM_FEATURE_SVE2_SM3`](#sm3-extension) | SVE2 support for the SM3 cryptographic extension (FEAT_SVE_SM3) | 1 |
| [`__ARM_FEATURE_SVE2_SM4`](#sm4-extension) | SVE2 support for the SM4 cryptographic extension (FEAT_SVE_SM4) | 1 |
| [`__ARM_FEATURE_SVE2p1`](#sve2) | SVE version 2.1 (FEAT_SVE2p1)
| [`__ARM_FEATURE_SVE2p2`](#sve2) | SVE version 2.2 (FEAT_SVE2p2)
| [`__ARM_FEATURE_SYSREG128`](#bit-system-registers) | Support for 128-bit system registers (FEAT_SYSREG128) | 1 |
| [`__ARM_FEATURE_UNALIGNED`](#unaligned-access-supported-in-hardware) | Hardware support for unaligned access | 1 |
| [`__ARM_FP`](#hardware-floating-point) | Hardware floating-point | 1 |
Expand Down Expand Up @@ -12927,6 +12937,41 @@ Zero ZA vector groups
__arm_streaming __arm_inout("za");
```

### SME2.2 instruction intrinsics

The specification for SME2.2 are in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

The intrinsics in this section are defined by the header file
[`<arm_sme.h>`](#arm_sme.h) when `__ARM_FEATURE_SME2p2` is defined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these instructions do not operate on ZA array, I think they should be in arm_sve.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After internal discusion this whole system of specifying header files will be overhauled, so no changes necessary here until then


#### FMUL

Multi-vector floating-point multiply

``` c
// Variants are also available for:
// [_single_f32_x2]
Copy link
Contributor

@Lukacma Lukacma Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should split this into x2 and x4 variants. I think this might confuse users as to what type to use for arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split them up. Does this look okay?

// [_single_f64_x2]
svfloat16x2_t svmul[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;

// Variants are also available for:
// [_single_f32_x4]
// [_single_f64_x4]
svfloat16x4_t svmul[_single_f16_x4](svfloat16x4_t zd, svfloat16_t zm) __arm_streaming;

// Variants are also available for:
// [_f32_x2]
// [_f64_x2]
svfloat16x2_t svmul[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;

// Variants are also available for:
// [_f32_x4]
// [_f64_x4]
svfloat16x4_t svmul[_f16_x4](svfloat16x4_t zd, svfloat16x4_t zm) __arm_streaming;
```

### Streaming-compatible versions of standard routines

ACLE provides the following streaming-compatible functions,
Expand Down Expand Up @@ -13476,6 +13521,97 @@ While (resulting in predicate tuple)
svboolx2_t svwhilelt_b8[_s64]_x2(int64_t rn, int64_t rm);
```

### SVE2.2 and SME2.2 instruction intrinsics

The specification for SVE2.2 and SME2.2 are in
[**Alpha** state](#current-status-and-anticipated-changes) and might change or be
extended in the future.

The functions in this section are defined by either the header file
[`<arm_sve.h>`](#arm_sve.h) or [`<arm_sme.h>`](#arm_sme.h)
when `__ARM_FEATURE_SVE2p2` or `__ARM_FEATURE_SME2p2` is defined, respectively.

#### FCVTXNT, FCVTLT, FCVTNT, BFCVTNT

Zeroing forms of convert instructions.

```c

// Variant is available for _f64_f32
svfloat32_t svcvtlt_f32[_f16]_z (svbool_t pg, svfloat16_t op);

// Variants are available for:
// _f32_f64, _bf16_f32
svfloat16_t svcvtnt_f16[_f32]_z (svfloat16_t even, svbool_t pg, svfloat32_t op);

svfloat32_t svcvtxnt_f32[_f64]_z (svfloat32_t even, svbool_t pg, svfloat64_t op);
```

#### FRINT32X, FRINT32Z, FRINT64X, FRINT64Z

Round to integral floating-point values.

```c

// Variant is available for _f64
svfloat32_t svrint32x[_f32]_z(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint32x[_f32]_x(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint32x[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);

// Variant is available for _f64
svfloat32_t svrint32z[_f32]_z(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint32z[_f32]_x(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint32z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);

// Variant is available for _f64
svfloat32_t svrint64x[_f32]_z(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint64x[_f32]_x(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint64x[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);

// Variant is available for _f64
svfloat32_t svrint64z[_f32]_z(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint64z[_f32]_x(svbool_t pg, svfloat32_t zn);
// Variant is available for _f64
svfloat32_t svrint64z[_f32]_m(svfloat32_t inactive, svbool_t pg, svfloat32_t zn);
```

#### COMPACT, EXPAND

Copy active vector elements to/from lower-numbered elements.

``` c
// Variants are available for:
// _s8, _s16, _u16, _mf8, _bf16, _f16
svuint8_t svcompact[_u8](svbool_t pg, svuint8_t zn);

// Variants are available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64
// _mf8, _bf16, _f16, _f32, _f64
svuint8_t svexpand[_u8](svbool_t pg, svuint8_t zn);

```

#### FIRSTP, LASTP

Scalar index of first/last true predicate element (predicated).

``` c
// Variants are available for:
// _b16, _b32, _b64
int64_t svfirstp_b8(svbool_t pg, svbool_t pn);

// Variants are available for:
// _b16, _b32, _b64
int64_t svlastp_b8(svbool_t pg, svbool_t pn);

```

### SME2 maximum and minimum absolute value

Expand Down
Loading