
Conversation


@FawadHa1der commented Jul 22, 2025

In sumcheck there are scenarios (for example, when the verifier issues a challenge) where one operand in a multiplication is fixed and repeated many times. This pull request exploits that to reduce redundant computation. The technique is especially useful in a bitsliced setting.

When one of the multiplication operands is repeated, multiplication by it becomes a linear map, i.e. there are no non-linear (AND) ops in the map. The map (referred to as the constant mul matrix in the code) is generated by multiplying the constant with the basis elements of the field. This one-time cost is amortized over the many multiplications that use the same fixed operand.
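
As a sketch of this precomputation, assuming a 128-bit field with the canonical bit basis and a hypothetical gf128_mul routine (the actual field, basis, and multiplication depend on the library), the columns of the constant mul matrix are just the constant times each basis element:

// Hypothetical GF(2^128) field multiplication; any correct implementation
// works here, since it is only called 128 times during precomputation.
extern __uint128_t gf128_mul(__uint128_t a, __uint128_t b);

// Column i of the constant mul matrix is c * e_i, where e_i is the basis
// element with only bit i set. Applying the matrix to x then XORs together
// the columns selected by the set bits of x.
void build_constant_mul_cols(__uint128_t c, __uint128_t cols[128])
{
    for (int i = 0; i < 128; ++i)
        cols[i] = gf128_mul(c, (__uint128_t)1 << i);
}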

Since the constant mul matrix is a linear map, we can apply the "Method of Four Russians" to it and significantly reduce the number of XOR operations.
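
For contrast, here is a sketch of applying the matrix naively, one conditional XOR per input bit (mul_const_naive is an illustrative name, not from the PR); the Method of Four Russians replaces groups of these bit-by-bit XORs with precomputed table lookups, as in the byte-table code further down:

__uint128_t mul_const_naive(const __uint128_t cols[128], __uint128_t x)
{
    // Up to 128 conditional XORs of 128-bit values per multiplication.
    __uint128_t acc = 0;
    for (int i = 0; i < 128; ++i)
        if ((x >> i) & 1)
            acc ^= cols[i];
    return acc;
}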

There are a lot of optimizations that can still be done, but this PR is a minimal proof of concept with localized changes. For NUM_VARS: 28, COMPOSITION_SIZE: 4 we get about an 11% improvement in raw computation.

@FawadHa1der (Author)

Probably needs a little more cleanup. I would love to work further on this and have it merged. Please let me know what cleanup and modifications the team wants.

@FawadHa1der force-pushed the matrix_constant_mul branch from 00f11c0 to f6052a8 on July 22, 2025 at 14:36
@FawadHa1der (Author)

The same technique can also be applied to normal, non-bitsliced data. After the precomputation, a 128-bit multiplication can be done in 16 XORs + 16 lookups (byte-indexed tables, 16 × 256 entries) or 32 XORs + 32 lookups (nibble-indexed tables, 32 × 16 entries, which are much smaller). Example C code below.

#include <stdint.h>
#include <arm_neon.h>

// Split a 128-bit value into 16 little-endian bytes.
static inline void u128_to_bytes_le(__uint128_t x, uint8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)(x >> (8 * i));
}

// Precompute, for each byte position, the XOR of the columns selected by
// every possible byte value. Each entry is built incrementally from the
// entry with its lowest set bit cleared, so construction costs one XOR
// per entry (16 * 255 XORs total).
void build_byte_tables_from_cols(const __uint128_t cols[128], __uint128_t T[16][256])
{
    for (int pos = 0; pos < 16; ++pos) {
        T[pos][0] = 0;
        for (int v = 1; v < 256; ++v) {
            int lsb = v & -v;             // lowest set bit of v
            int bit = __builtin_ctz(lsb); // 0..7
            T[pos][v] = T[pos][v ^ lsb] ^ cols[pos*8 + bit];
        }
    }
}

// Multiply by the fixed constant: one table lookup per input byte,
// XOR-accumulated in a NEON register (16 loads + 16 XORs total).
__uint128_t mul_const_neon_bytes(const __uint128_t T[16][256], __uint128_t X)
{
    uint8_t xb[16];
    u128_to_bytes_le(X, xb); // maybe we can remove this?

    uint64x2_t acc = vdupq_n_u64(0);
    for (int pos = 0; pos < 16; ++pos) {
        uint64x2_t t = vld1q_u64((const uint64_t*)&T[pos][ xb[pos] ]);
        acc = veorq_u64(acc, t);
    }

    uint64_t out64[2];
    vst1q_u64(out64, acc);
    __uint128_t y = (__uint128_t)out64[0] | ((__uint128_t)out64[1] << 64);
    return y;
}
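
Putting the pieces together, a usage sketch (build_constant_mul_cols is the hypothetical precomputation shown earlier): precompute once per fixed operand, then reuse the tables for every multiplication by it.

// Precompute once per fixed operand c, then each multiplication by c
// costs only 16 table loads + 16 vector XORs.
void mul_many_by_constant(__uint128_t c, const __uint128_t *xs,
                          __uint128_t *ys, int n)
{
    __uint128_t cols[128];
    static __uint128_t T[16][256];         // 64 KiB of lookup tables
    build_constant_mul_cols(c, cols);      // 128 field muls, one-time
    build_byte_tables_from_cols(cols, T);  // 16 * 255 XORs, one-time
    for (int i = 0; i < n; ++i)
        ys[i] = mul_const_neon_bytes(T, xs[i]);
}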
