Conversation

@GunaDD
Contributor

@GunaDD GunaDD commented Dec 5, 2025

Description

New Keccak implementation with Xorin and Keccakf chips and opcodes

Checklist

  • I have performed a self-review of my own code
  • Add negative tests for xorin chip
  • Add negative tests for keccakf chip
  • Add unit test to CI
  • Add new guest code for E2E test to CI (the keccak example is updated, but I am thinking of adding another one)
  • Check with Ayush if I implemented the SizedRecord trait correctly
  • Rebase to include Zach's new Plonky3 update, and update the keccakf trace gen so it no longer has to transpose the state before passing it as input

To reviewers: I still have to complete the checklist above, but you can start reviewing if you would like to.

Closes INT-5017, INT-5721, INT-5720, INT-5718, INT-5717, INT-5646, INT-5018

@GunaDD GunaDD changed the base branch from main to develop-v1.6.0 December 5, 2025 20:01
@GunaDD GunaDD marked this pull request as draft December 5, 2025 22:07
@branch-rebase-bot branch-rebase-bot bot force-pushed the develop-v1.6.0 branch 5 times, most recently from 7d7039c to 05a6d51 Compare December 8, 2025 19:48
@shuklaayush shuklaayush changed the base branch from develop-v1.6.0 to develop-new-keccak December 16, 2025 22:16
let limb = u16_idx % U64_LIMBS;
let y = i / 5;
let x = i % 5;
let x = i / 5;
Contributor

Why did you change this? Unless you updated the Plonky3 commit, this is just definitional.

Contributor Author

Because a bus that was unbalanced before became balanced after this change. I am still trying to understand the correct way to do it, though, since the memory read/write buses are still unbalanced: the data written after the keccakf operation does not match.
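As a rough mental model of what "balanced" means here, a bus can be treated as a multiset of messages: every message sent must be received exactly once. The sketch below is a simplified illustration only. The `Message` type and both function names are hypothetical, and a real prover checks balance with an interaction argument rather than by counting messages in a map.

```rust
use std::collections::HashMap;

/// Hypothetical bus message: (address, 4-byte data word).
type Message = (u32, [u8; 4]);

/// Net multiplicity per message: +1 for each send, -1 for each receive.
/// Messages whose sends and receives cancel out are dropped from the result.
fn bus_imbalance(sends: &[Message], receives: &[Message]) -> HashMap<Message, i64> {
    let mut net: HashMap<Message, i64> = HashMap::new();
    for m in sends {
        *net.entry(*m).or_insert(0) += 1;
    }
    for m in receives {
        *net.entry(*m).or_insert(0) -= 1;
    }
    net.retain(|_, count| *count != 0);
    net
}

/// The bus is balanced when no message is left with a nonzero net multiplicity.
fn is_balanced(sends: &[Message], receives: &[Message]) -> bool {
    bus_imbalance(sends, receives).is_empty()
}
```

In this model, writing data that differs from what the other side expects leaves two unmatched entries: the message that was sent and the message that was expected.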

Contributor Author

fn generate_trace_rows_for_perm<F: PrimeField64>(rows: &mut [KeccakCols<F>], input: [u64; 25]) {
    let mut current_state: [[u64; 5]; 5] = unsafe { transmute(input) };

    let initial_state: [[[F; 4]; 5]; 5] =
        array::from_fn(|y| array::from_fn(|x| u64_to_16_bit_limbs(current_state[x][y])));

Turns out that inside Plonky3's generate_trace_rows the input gets transposed, which is why the bus becoming balanced makes sense now.
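To make the index swap concrete, here is a minimal standalone sketch of the same pattern using plain `u16` limbs instead of field elements. `u64_to_u16_limbs` and `transposed_limbs` are hypothetical stand-ins, and the least-significant-first limb order is an assumption.

```rust
/// Split a u64 into four 16-bit limbs, least significant limb first (assumed order).
/// Hypothetical stand-in for `u64_to_16_bit_limbs`, returning `u16` instead of field elements.
fn u64_to_u16_limbs(value: u64) -> [u16; 4] {
    std::array::from_fn(|i| (value >> (16 * i)) as u16)
}

/// Build the limb view of a 5x5 state the way the quoted snippet does:
/// the output at [y][x] is taken from the input at [x][y], which is a transpose.
fn transposed_limbs(state: [[u64; 5]; 5]) -> [[[u16; 4]; 5]; 5] {
    std::array::from_fn(|y| std::array::from_fn(|x| u64_to_u16_limbs(state[x][y])))
}
```

A lane stored at `state[1][2]` shows up at `[2][1]` in the output, so any caller that indexes the result as if no swap had happened reads the wrong lane.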

Contributor Author

Turns out that in the current keccak trace gen we transpose the state first:

// We need to transpose state matrices due to a plonky3 issue: https://github.com/Plonky3/Plonky3/issues/672
// Note: the fix for this issue will be a commit after the major Field crate refactor PR https://github.com/Plonky3/Plonky3/pull/640
// which will require a significant refactor to switch to.
let state = from_fn(|i| {
    let x = i / 5;
    let y = i % 5;
    states[block_idx][x + 5 * y]
});

which is why we don't swap y and x there

Contributor Author

Turns out that to fix it I just have to pass the transposed input to the Plonky3 trace gen. Since the transpose of the transpose is the original matrix, everything comes out as expected. Fixed it here:

// We pass the transpose because plonky3 transposes the input internally,
// so the transpose of the transpose restores the original state.
let p3_trace: RowMajorMatrix<F> = generate_trace_rows(vec![preimage_buffer_bytes_u64_transpose], 0);
row[..NUM_KECCAK_PERM_COLS].copy_from_slice(

Contributor Author

@jonathanpwang once the Plonky3 issue is fixed, the only code change needed is to revert to passing vec![preimage_buffer_bytes_u64] instead of the transpose to generate_trace_rows.
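The transpose-of-transpose argument in this thread can be checked on a flat 25-lane state. This is a minimal sketch; `transpose_state` is a hypothetical helper that uses the same `x + 5 * y` index mapping as the existing trace gen snippet quoted earlier, not a function from the PR.

```rust
/// Transpose a 5x5 Keccak state stored as a flat array of 25 lanes.
/// Output lane i = 5 * x + y is read from input lane x + 5 * y.
fn transpose_state(state: [u64; 25]) -> [u64; 25] {
    std::array::from_fn(|i| {
        let x = i / 5;
        let y = i % 5;
        state[x + 5 * y]
    })
}
```

Applying `transpose_state` twice returns the original array, which is why pre-transposing the input cancels a transpose performed inside the callee. If the callee stops transposing, the pre-transpose has to be dropped as well.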

codspeed-hq bot commented Dec 18, 2025

CodSpeed Performance Report

Merging #2303 will improve performances by ×2.2

Comparing feat/new-keccak (e1c71cf) with main (77adf7e) [1]

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

⚡ 1 improvement
✅ 23 untouched
⏩ 36 skipped [2]

Benchmarks breakdown

Mode Benchmark BASE HEAD Change
WallTime benchmark_execute_metered[quicksort] 19.5 ms 9 ms ×2.2

Footnotes

  1. No successful run was found on develop-new-keccak (77adf7e) during the generation of this report, so main (77adf7e) was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 36 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

…go.toml - place it back when cuda tracegen is merged
@github-actions

group app.proof_time_ms app.cycles app.cells_used leaf.proof_time_ms leaf.cycles leaf.cells_used
verify_fibair 226 322,610 2,058,654 - - -
fibonacci 1,045 1,500,209 2,100,402 - - -
regex 2,313 4,137,502 17,695,216 - - -
ecrecover 744 122,859 2,263,998 - - -
pairing 1,494 1,745,742 25,408,302 - - -

Commit: e1c71cf

Benchmark Workflow

@GunaDD GunaDD marked this pull request as ready for review December 18, 2025 05:05
Collaborator

@shuklaayush shuklaayush left a comment

Dropped a few comments; still reviewing.

}

#[inline(always)]
unsafe fn execute_e12_impl<F: PrimeField32, CTX: ExecutionCtxTrait, const IS_E1: bool>(
Collaborator

use IS_E1 to switch between slice and normal reads

Contributor Author

thank you for pointing this out
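One way to read the suggestion is to let a const generic flag choose the read strategy at compile time. The sketch below is illustrative only: `read_bytes`, the flat `memory` slice, and the 4-byte word size are all assumptions, as is the idea that the `IS_E1` path can use one bulk slice read while the other path reads word by word.

```rust
/// Read `N` bytes starting at `ptr` from a flat byte memory (hypothetical model).
/// `IS_E1 == true` does one bulk slice copy; `IS_E1 == false` reads in
/// 4-byte words, standing in for reads that are accounted for one access at a time.
fn read_bytes<const IS_E1: bool, const N: usize>(memory: &[u8], ptr: usize) -> [u8; N] {
    let mut out = [0u8; N];
    if IS_E1 {
        // Single contiguous read.
        out.copy_from_slice(&memory[ptr..ptr + N]);
    } else {
        // One read per 4-byte word (the last chunk may be shorter).
        for (chunk_idx, chunk) in out.chunks_mut(4).enumerate() {
            let start = ptr + 4 * chunk_idx;
            chunk.copy_from_slice(&memory[start..start + chunk.len()]);
        }
    }
    out
}
```

Because `IS_E1` is a const generic, the branch is resolved at compile time for each instantiation, so both paths return identical data without a runtime check in the hot loop.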

let x = i % 5;

let state_limb: AB::Expr = local.inner.preimage[y][x][limb].into();
let hi: AB::Expr = local.preimage_state_hi[i * U64_LIMBS + limb].into();
Collaborator

do these need to be range checked?

Contributor Author

Yes, they need to be, and I forgot about it. Thank you for pointing this out.

Contributor Author

I am now wondering what constrains the correctness of preimage_state_hi as the upper u8 limb of preimage in the current keccak256 chip.
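To see why the range checks matter, assume the constraint has the form limb = lo + 256 * hi over a prime field, where limb is a 16-bit limb of the state and lo and hi are meant to be its two bytes. That form is an assumption for illustration, not taken from the PR. The sketch models field arithmetic with a plain modulus (the BabyBear prime is used only as an example) and shows that the equation alone accepts a forged hi value.

```rust
/// BabyBear prime, 2^31 - 2^27 + 1, used here only as an example field modulus.
const P: u64 = 2_013_265_921;

/// The bare algebraic constraint: limb == lo + 256 * hi (mod P).
/// All arguments are assumed to be reduced field elements (less than P).
fn decomposition_holds(limb: u64, lo: u64, hi: u64) -> bool {
    (lo + 256 * hi) % P == limb % P
}

/// The same constraint with both parts range checked to 8 bits.
fn decomposition_holds_with_range_checks(limb: u64, lo: u64, hi: u64) -> bool {
    lo < 256 && hi < 256 && decomposition_holds(limb, lo, hi)
}
```

For limb = 0x1234 the honest witness is lo = 0x34, hi = 0x12. Without range checks, hi = 19 with lo = P - 204 also satisfies the equation, because lo wraps around the modulus. With both values checked to be below 256, the honest decomposition is the only one accepted.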

3 participants