OptWBMatrix is a fork of WBMatrix that introduces instruction-level and algorithmic optimizations to improve performance.
- The functions `xorU4/8/16/32/64/128/256`, which compute the bit parity (i.e., the sum of bits modulo 2) of `U4/8/16/32/64/128/256` values, were renamed to `parityU4/8/16/32/64/128/256` for clarity.
- To simplify compilation and side-by-side benchmarking, every function in this fork (whether optimized or not) was renamed with the prefix `opt_`. This ensures no name clashes with the originals.
- As the parity and matrix-transpose operations form the computational foundation of this library, optimizations were applied first to `parityU32/64/128/256` and `MattransM8/16/32/64/128/256`, then extended to `MatMulVecM128` and `MatMulMatM64/128/256`.
- Equivalence between the original and optimized implementations was tested with 1000 random inputs per function pair.
- Benchmarking was conducted with 100 repetitions per function comparison, where input data was freshly randomized for each repetition.
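To make the parity operation and the equivalence check concrete, here is a minimal sketch in C. The names `parity_ref_u32`, `parity_fold_u32`, and `count_mismatches` are hypothetical and do not come from this fork. XOR-folding is one common way to compute parity; it is shown only as an illustration and is not necessarily the optimization used in OptWBMatrix.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference version: add up the 32 bits one by one, modulo 2. */
int parity_ref_u32(uint32_t x) {
    int p = 0;
    for (int i = 0; i < 32; i++) {
        p ^= (int)((x >> i) & 1u);
    }
    return p;
}

/* XOR-folding: halve the width until the parity ends up in bit 0. */
int parity_fold_u32(uint32_t x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* Compare both versions on `trials` pseudo-random inputs.
 * Returns the number of inputs on which they disagree. */
int count_mismatches(unsigned seed, int trials) {
    int bad = 0;
    assert(trials > 0);
    srand(seed);
    for (int t = 0; t < trials; t++) {
        /* rand() may yield as few as 15 bits, so combine three calls. */
        uint32_t x = ((uint32_t)rand() << 30)
                   ^ ((uint32_t)rand() << 15)
                   ^ (uint32_t)rand();
        if (parity_ref_u32(x) != parity_fold_u32(x)) {
            bad++;
        }
    }
    return bad;
}
```

Calling `count_mismatches(1u, 1000)` mirrors the 1000-input check described above; a return value of 0 means the two versions agreed on every sampled input.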
The table below summarizes performance.
- `Original (ms)` and `Optimized (ms)` are average runtimes across repetitions.
- `Best Run (%)` and `Worst Run (%)` are the extremes of speedup observed across repetitions.
- Negative worst-case values indicate cases where the optimized implementation ran slower than the baseline, typically due to benchmarking noise.
| Function | Original (ms) | Optimized (ms) | Avg Speedup (%) | Median Speedup (%) | Median Times Faster | Trimmed Mean Speedup (%) | Best Run (%) | Worst Run (%) |
|---|---|---|---|---|---|---|---|---|
| xorU32 vs parityU32 | 996.90 | 201.58 | 79.78 | 80.04 | 5.01x | 79.85 | 82.88 | 69.02 |
| xorU64 vs parityU64 | 255.59 | 40.71 | 84.02 | 83.87 | 6.20x | 83.96 | 90.34 | 82.64 |
| xorU128 vs parityU128 | 150.56 | 31.45 | 79.17 | 79.73 | 4.93x | 79.50 | 85.37 | 40.79 |
| xorU256 vs parityU256 | 81.86 | 17.19 | 78.99 | 79.22 | 4.81x | 79.07 | 84.40 | 65.85 |
| MattransM8 vs Opt MattransM8 | 89.40 | 36.62 | 60.01 | 65.12 | 2.86x | 65.11 | 72.67 | -452.38 |
| MattransM16 vs Opt MattransM16 | 213.58 | 175.37 | 17.87 | 27.80 | 1.38x | 21.96 | 45.96 | -410.65 |
| MattransM32 vs Opt MattransM32 | 56.48 | 40.53 | 26.61 | 26.92 | 1.36x | 27.29 | 70.77 | -84.62 |
| MattransM64 vs Opt MattransM64 | 125.32 | 92.04 | 25.94 | 25.41 | 1.34x | 25.72 | 56.54 | 17.21 |
| MattransM128 vs Opt MattransM128 | 46.83 | 41.34 | 11.69 | 11.11 | 1.12x | 11.84 | 37.50 | -29.21 |
| MattransM256 vs Opt MattransM256 | 188.69 | 168.67 | 9.61 | 9.84 | 1.10x | 9.40 | 65.05 | -25.97 |
| MatMulVecM128 vs Opt MatMulVecM128 | 442.97 | 93.53 | 78.71 | 79.44 | 4.86x | 79.12 | 92.83 | 24.75 |
| MatMulMatM64 vs Opt MatMulMatM64 | 1171.56 | 242.89 | 79.27 | 79.46 | 4.86x | 79.45 | 82.42 | 58.28 |
| MatMulMatM128 vs Opt MatMulMatM128 | 545.55 | 115.41 | 78.85 | 78.90 | 4.73x | 78.84 | 79.05 | 78.68 |
| MatMulMatM256 vs Opt MatMulMatM256 | 1164.92 | 282.11 | 75.78 | 75.77 | 4.12x | 75.78 | 76.00 | 75.60 |
Summary:
- Parity functions achieve ~5-6x speedups (~80% reduction in runtime).
- Matrix transposes yield modest improvements (1.1-2.9x) with more variability.
- Matrix multiplications show ~4-5x speedups, highly consistent.
| Function | Data size per repetition |
|---|---|
| xorU32 vs parityU32 | 50,000,000 |
| xorU64 vs parityU64 | 10,000,000 |
| xorU128 vs parityU128 | 5,000,000 |
| xorU256 vs parityU256 | 1,250,000 |
| MattransM8 vs Opt MattransM8 | 1,000,000 |
| MattransM16 vs Opt MattransM16 | 1,000,000 |
| MattransM32 vs Opt MattransM32 | 100,000 |
| MattransM64 vs Opt MattransM64 | 100,000 |
| MattransM128 vs Opt MattransM128 | 10,000 |
| MattransM256 vs Opt MattransM256 | 10,000 |
| MatMulVecM128 vs Opt MatMulVecM128 | 100,000 |
| MatMulMatM64 vs Opt MatMulMatM64 | 10,000 |
| MatMulMatM128 vs Opt MatMulMatM128 | 1,000 |
| MatMulMatM256 vs Opt MatMulMatM256 | 500 |
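The benchmarking procedure described above (a fixed number of repetitions, each on freshly randomized data of a given size) could be organized roughly as in the sketch below. All names (`bench_parity_loop`, `bench_parity_fold`, `time_pass_ms`, `run_benchmark`) are hypothetical, the two parity routines are stand-ins for an original/optimized pair, and timing with `clock()` is an assumption; the fork's actual harness may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef int (*parity_fn)(uint32_t);

/* Stand-in "original": bit-by-bit loop. */
int bench_parity_loop(uint32_t x) {
    int p = 0;
    while (x) { p ^= (int)(x & 1u); x >>= 1; }
    return p;
}

/* Stand-in "optimized": XOR-folding. */
int bench_parity_fold(uint32_t x) {
    x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
    return (int)(x & 1u);
}

/* Time one pass of fn over data[0..n), in milliseconds of CPU time.
 * The accumulated result is written to *sink so the loop stays live. */
double time_pass_ms(parity_fn fn, const uint32_t *data, size_t n, unsigned *sink) {
    clock_t start = clock();
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (unsigned)fn(data[i]);
    }
    *sink = acc;
    return 1000.0 * (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* Run `reps` repetitions on n fresh random inputs each and store the
 * per-repetition runtimes. Returns how many repetitions produced the
 * same accumulated result from both implementations. */
int run_benchmark(size_t n, int reps, double *orig_ms, double *opt_ms) {
    uint32_t *data = malloc(n * sizeof *data);
    int agree = 0;
    assert(data != NULL);
    for (int r = 0; r < reps; r++) {
        for (size_t i = 0; i < n; i++) {   /* fresh inputs every repetition */
            data[i] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
        }
        unsigned a = 0, b = 0;
        orig_ms[r] = time_pass_ms(bench_parity_loop, data, n, &a);
        opt_ms[r]  = time_pass_ms(bench_parity_fold, data, n, &b);
        if (a == b) agree++;
    }
    free(data);
    return agree;
}
```

Because each repetition is timed separately, a single noisy repetition in which the optimized pass happens to take longer than the original produces a negative per-repetition speedup, which is how negative worst-run values can arise even when the averages favor the optimized code.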
- Speedup (%) $$ \text{Speedup}(\%) = 100 \times \frac{\text{origTime} - \text{optTime}}{\text{origTime}} $$
- Times Faster $$ \text{TimesFaster} = \frac{\text{origTime}}{\text{optTime}} $$
- Trimmed Mean Speedup: defined as the mean speedup with the maximum and minimum values removed:
  $$
  \text{TrimmedMean}(s) = \frac{1}{n-2} \sum_{i=2}^{n-1} s_{(i)},
  $$
  where $s_{(i)}$ are the $n$ speedups sorted in ascending order.
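The three metrics defined above translate directly into code. The following is a small sketch; the function names `speedup_percent`, `times_faster`, and `trimmed_mean` are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Speedup(%) = 100 * (origTime - optTime) / origTime */
double speedup_percent(double orig_ms, double opt_ms) {
    return 100.0 * (orig_ms - opt_ms) / orig_ms;
}

/* TimesFaster = origTime / optTime */
double times_faster(double orig_ms, double opt_ms) {
    return orig_ms / opt_ms;
}

/* Ascending comparison for qsort. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts s in place, then averages everything except the smallest
 * and largest value. Requires at least three values. */
double trimmed_mean(double *s, size_t n) {
    double sum = 0.0;
    assert(n > 2);
    qsort(s, n, sizeof *s, cmp_double);
    for (size_t i = 1; i + 1 < n; i++) {
        sum += s[i];
    }
    return sum / (double)(n - 2);
}
```

As a worked example, plugging the average runtimes of the xorU32 row (996.90 ms and 201.58 ms) into these definitions gives a speedup of about 79.78% and a ratio of about 4.95x. The two definitions are linked by TimesFaster = 100 / (100 - Speedup), so the median speedup of 80.04% in that row corresponds to the reported median of 5.01x.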
An Optimized Matrix Library for White-Box Block Cipher Implementations.
Contains the matrix operations related to white-box block cipher implementations and provides thorough test cases for their performance and accuracy. The test cases also include Chow et al.'s white-box AES and Xiao-Lai's white-box SM4 implementations, each built with WBMatrix, NTL, and M4RI.
A preview version was released at Nexus-TYF/WBMatrix. This repository, however, is the latest version of WBMatrix and accompanies the paper "WBMatrix: An Optimized Matrix Library for White-Box Block Cipher Implementations" by Yufeng Tang, Zheng Gong, Tao Sun, Jinhai Chen and Zhe Liu in IEEE Transactions on Computers.
DOI: 10.1109/TC.2022.3152449.
$ git clone https://github.com/scnucrypto/WBMatrix.git
- Matrix-Vector multiplication.
- Matrix-Matrix multiplication.
- Generation of an invertible Matrix with its inverse matrix (pairwise invertible matrices).
- Generation of the pairwise invertible affine transformations.
- Matrix transposition.
- Affine transformation.
- Encodings concatenation.
- Encodings conversion.
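To illustrate what "pairwise invertible matrices" means in the list above, the sketch below builds an 8x8 matrix over GF(2) together with its inverse by applying random row additions to the identity and replaying the same additions in reverse order on a second identity. This is a generic textbook construction shown under stated assumptions, not the WBMatrix generation method; the type `SkM8`, the row-per-byte layout, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_N 8          /* matrix dimension */
#define SK_MAX_OPS 64   /* upper bound on recorded row operations */

/* Assumed layout: bit k of row[i] holds the entry in row i, column k. */
typedef struct { uint8_t row[SK_N]; } SkM8;

void sk_identity(SkM8 *m) {
    for (int i = 0; i < SK_N; i++) m->row[i] = (uint8_t)(1u << i);
}

/* c = a * b over GF(2): row i of c is the XOR of the rows of b
 * selected by the set bits in row i of a. */
void sk_mul(const SkM8 *a, const SkM8 *b, SkM8 *c) {
    for (int i = 0; i < SK_N; i++) {
        uint8_t acc = 0;
        for (int k = 0; k < SK_N; k++) {
            if ((a->row[i] >> k) & 1u) acc ^= b->row[k];
        }
        c->row[i] = acc;
    }
}

int sk_is_identity(const SkM8 *m) {
    for (int i = 0; i < SK_N; i++) {
        if (m->row[i] != (uint8_t)(1u << i)) return 0;
    }
    return 1;
}

/* Apply `ops` random row additions (row[dst] ^= row[src], src != dst)
 * to the identity to obtain m. Each addition is its own inverse over
 * GF(2), so replaying the recorded additions in reverse order on a
 * second identity yields m_inv. */
void sk_gen_pair(SkM8 *m, SkM8 *m_inv, int ops) {
    int dst[SK_MAX_OPS], src[SK_MAX_OPS];
    assert(ops >= 0 && ops <= SK_MAX_OPS);
    sk_identity(m);
    sk_identity(m_inv);
    for (int t = 0; t < ops; t++) {
        dst[t] = rand() % SK_N;
        src[t] = (dst[t] + 1 + rand() % (SK_N - 1)) % SK_N;
        m->row[dst[t]] ^= m->row[src[t]];
    }
    for (int t = ops - 1; t >= 0; t--) {
        m_inv->row[dst[t]] ^= m_inv->row[src[t]];
    }
}
```

The point of such a construction is that the inverse comes for free, without running Gaussian elimination afterwards. Note that a short sequence of random row additions does not sample invertible matrices uniformly, so this sketch says nothing about the randomness properties of the library's own generator.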
- WBMatrix.h The declaration of the main functions.
- structure.h Data structure of the matrix and affine functions.
- random.h The declaration of the random functions.
- affineU8(Aff8 aff, uint8_t arr) applies the affine transformation aff to a uint8_t value arr and returns a uint8_t result.
- affinemixM8(Aff8 aff, Aff8 preaff_inv, Aff8 *mixaff) affine conversion between aff and preaff_inv, result is set in mixaff.
- affinecomM8to32(Aff8 aff1, Aff8 aff2, Aff8 aff3, Aff8 aff4, Aff32 *aff) affine concatenation, the matrix part of aff consists of the submatrices on its diagonal, while the vector part of aff consists of the subvectors.
- copyM8(M8 Mat1, M8 *Mat2) replicates the matrix Mat1 to Mat2.
- flipbitM8(M8 *Mat, int i, int j) flips the (i, j) bit in matrix Mat.
- genMatpairM8(M8 *Mat, M8 *Mat_inv) generates an invertible matrix Mat and its inverse matrix Mat_inv.
- genaffinepairM8(Aff8 *aff, Aff8 *aff_inv) generates an affine transformation aff and its inversion aff_inv.
- identityM8(M8 *Mat) converts the matrix Mat into an identity matrix.
- invsM8(M8 Mat, M8 *Mat_inv) calculates the inversion of Mat by Gaussian elimination method, result is set in Mat_inv.
- isinvertM8(M8 Mat) determines if the matrix is invertible (1 for positive).
- MatMulVecM8(M8 Mat, V8 Vec, V8 *ans) multiplication between a matrix Mat and a vector Vec, result is set in ans.
- MatMulNumM8(M8 Mat, uint8_t n) multiplication between a matrix Mat and a number n, returns a number.
- MatMulMatM8(M8 Mat1, M8 Mat2, M8 *Mat) multiplication between a matrix Mat1 and a matrix Mat2, result is set in Mat.
- MatAddMatM8(M8 Mat1, M8 Mat2, M8 *Mat) addition between the matrix Mat1 and Mat2, result is set in Mat.
- MattransM8(M8 Mat, M8 *Mat_trans) transposition of a matrix Mat, result is set in Mat_trans.
- readbitM8(M8 Mat, int i, int j) extracts the (i, j) bit in matrix Mat, returns 0/1.
- setbitM8(M8 *Mat, int i, int j, int bit) assigns the (i, j) bit a value bit (0/1).
- initM8(M8 *Mat) converts all the elements of the matrix Mat into 0.
- randM8(M8 *Mat) generates a random matrix Mat.
- printbitM8(M8 Mat) prints all the elements of the matrix Mat.
- isequalM8(M8 Mat1, M8 Mat2) determines if the matrix Mat1 is equal to Mat2 (1 for positive).
- initV8(V8 *Vec) converts all the elements of the vector Vec into 0.
- randV8(V8 *Vec) generates a random vector Vec.
- VecAddVecV8(V8 Vec1, V8 Vec2, V8 *Vec) addition between the vector Vec1 and Vec2, result is set in Vec.
- HWU8(uint8_t n) calculates the Hamming Weight of a number n.
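The operations listed above all act on bit matrices over GF(2). As a rough illustration of what matrix-vector multiplication, an affine transformation, and transposition compute at the bit level, here is a self-contained sketch. The types `DemoM8` and `DemoAff8`, the row-per-byte layout, the bit ordering, and all function names are assumptions made for this example; they are not the library's definitions of `M8`, `Aff8`, or the functions above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit k of row[i] holds the entry in row i, column k. */
typedef struct { uint8_t row[8]; } DemoM8;
/* Affine map x -> mat * x + vec over GF(2). */
typedef struct { DemoM8 mat; uint8_t vec; } DemoAff8;

/* Parity (sum of bits modulo 2) of a byte. */
int demo_parity8(uint8_t x) {
    x ^= (uint8_t)(x >> 4);
    x ^= (uint8_t)(x >> 2);
    x ^= (uint8_t)(x >> 1);
    return x & 1;
}

/* y = m * x over GF(2): bit i of y is the parity of (row i AND x). */
uint8_t demo_mat_mul_vec(const DemoM8 *m, uint8_t x) {
    uint8_t y = 0;
    for (int i = 0; i < 8; i++) {
        y |= (uint8_t)(demo_parity8((uint8_t)(m->row[i] & x)) << i);
    }
    return y;
}

/* Affine transformation: matrix-vector product, then XOR the constant. */
uint8_t demo_affine(const DemoAff8 *a, uint8_t x) {
    return (uint8_t)(demo_mat_mul_vec(&a->mat, x) ^ a->vec);
}

/* Transposition: entry (i, j) of t is entry (j, i) of m. */
void demo_transpose(const DemoM8 *m, DemoM8 *t) {
    for (int i = 0; i < 8; i++) {
        uint8_t r = 0;
        for (int j = 0; j < 8; j++) {
            r |= (uint8_t)(((m->row[j] >> i) & 1u) << j);
        }
        t->row[i] = r;
    }
}
```

With this layout, every output bit of a matrix-vector product is one AND followed by one parity computation. A matrix-matrix product can likewise be formed entry by entry as the parity of a row of the first matrix ANDed with a row of the transposed second matrix, which is one way to see why parity and transposition matter so much for overall performance.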
M8 mat[3]; //defines an 8-bit matrix.
genMatpairM8(&mat[0], &mat[1]); //generates the pairwise invertible matrices.
MatMulMatM8(mat[0], mat[1], &mat[2]); //matrix-matrix multiplication.
printM8(mat[2]); //prints the matrix.
- github1_M4RI The performance test for matrix operations and for the generation of the pairwise invertible matrices by the M4RI library.
- github(x) The performance test for the generation of an invertible matrix or the computation of its inverse by the implementations on GitHub.
- NTL The performance test for matrix operations and for the generation of the pairwise invertible matrices by the NTL library.
- randomness A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications (NIST Special Publication 800-22 Revision 1a).
- WBAES Chow et al.'s white-box AES implementation and its performance test, built with WBMatrix, NTL, and M4RI respectively.
- WBSM4 Xiao-Lai's white-box SM4 implementation and its performance test, built with WBMatrix, NTL, and M4RI respectively.
- Accuracy_test.c Accuracy test for the matrix operations in WBMatrix.
- BasisMatrixMethod_test.c Performance test for the generation of the pairwise invertible matrices by Basis Matrix Method.
- LowMCMethod_text.cpp Performance test for the generation of the pairwise invertible matrices by LowMC Method and Gaussian Elimination method.
- RGEMethod_test.c Performance test for the generation of the pairwise invertible matrices by Reverse Gaussian Elimination Method and Gaussian Elimination method.
- RLUDMethod_test.c Performance test for the generation of the pairwise invertible matrices by Reverse LU Decomposition Method and Gaussian Elimination method.
- WBGEMethod_test.c Performance test for the generation of the pairwise invertible matrices by Randomly Generate and Verify Method and Gaussian Elimination method.
- WBMatrixMatOp_test.c Performance test for the matrix operations in WBMatrix.
- WBMatrixMethod_test.c Performance test for the generation of the pairwise invertible matrices by WBMatrix Method.
$ mkdir build
$ cd build
$ cmake ..
$ make
$ ./WBMM
- NTL
- M4RI
- SMx-SM4
- WhiteBoxAES
- sp800_22_tests
- Inverse-matrix
- Inverse-Matrix
- parallelMatrixInversion
- InvertibleMatrix
- Inverse-of-Matrix
- inverseMatrix
- lowmc
Last Updated : 2022/02/20
WBMatrix Version: 3.3.0
Upgrade history:
(2019/12/9)
- Added: An invertible matrix is generated from an initialized matrix (currently supports only 8-bit and 32-bit operations).
- Fixed: Unifies the API.
- Added: The adjustable generation times in inverse.h.
- Added: Uses the initinvbaseM(8/32)() function to generate an initialized invertible matrix; its trails are recorded in basetrailM(8/32). The default number of operations is 10 for 8 bits and 30 for 32 bits.
- Added: If the initialization function is not used, each matrix is generated from an identity matrix with the default number of operations.
- Added: Copy function to replace the identity function.
(2019/12/10)
- Added: 16/64/128bits inverse matrix functions.
New method has been covered.
(2019/12/11)
- Added: 16/64bit affine transformation.
- Added: 128bit affine transformation.
No return value because of its special structure.
(2019/12/12)
- Added: 16/64/128bit affine combination operation.
(2019/12/16)
- Added: The header files for the definition of the matrices.
(2019/12/17)
- Fixed: Error fixes.
- Added: The parameters for initializing the intermediate matrix function.
inverse.h has the max times and min times for selection.
(2020/01/08)
- Added: Matrix addition function.
(2020/01/10)
- Improved: File tidying.
- Added: WBMatrix test.
- Added: Matrix Basis Method test.
(2020/01/12)
- Added: 128bit test for matrix basis method.
(2020/01/18)
- Added: Updates the test case of the generation of an invertible matrix and the computation of its inverse matrix.
- Added: Invertible functions: Matrix Basis Method, WBMatrix Method, Reverse Gaussian Elimination Method.
- Added: Inverse functions: WBMatrix Method, Matrix Basis Method.
(2020/01/20)
- Added: CMakeLists.txt
- Added: M4RI Method.
(2020/01/21)
- Improved: Organizes file structure, especially fixes the structure.h and .c errors.
(2020/01/22)
- Improved: Deletes xor.h.
(2020/01/30)
- Added: Gaussian elimination Method (Based on WBMatrix).
- Improved: Changes the generation function of a random Matrix.
(2020/01/31)
- Added: Reverse LU Decomposition Method.
(2020/02/01)
- Improved: Functions for the generation of a random matrix.
(2020/02/02)
- Added: Comparison test on github.
- Added: Accuracy Test.
- Improved: Parameter Orders of the affinemix function.
(2020/02/07)
- Fixed: Multiple definition of the global variables.
- Added: Function for random seed.
- Added: WBAES.
(2020/02/09)
- Fixed: Poor randomness of the random matrix function.
- Added: Function for estimating the invertibility of a matrix.
(2020/02/16)
- Added: New test cases from github.
(2020/03/05)
- Added: Performance test cases of M4RI: basic arithmetic with matrix.
- Added: Performance test cases of NTL.
- Added: Performance test cases of WBMatrix.
(2020/03/06)
- Added: Vector addition function.
- Fixed: Accuracy test mode.
- Improved: Replaces the rotation with a logical-AND.
(2020/03/07)
- Added: WBAES by M4RI.
(2020/03/09)
- Added: WBAES by WBMatrix.
(2020/03/10)
- Added: WBSM4 by M4RI.
- Fixed: The release version of WBAES (WBMatrix version).
- Added: WBSM4 by WBMatrix.
(2020/03/11)
- Added: WBSM4 by NTL.
- Improved: Clean-up NTL files.
(2020/03/15)
- Added: Release on github.
(2020/04/15)
- Added: Support for returning Hamming Weight.
- Added: An example for mitigating DCA attack.
(2020/06/22)
- Added: The references of the articles and implementations.
- Fixed: Errors of the random function in Linux.
(2020/06/25)
- Added: Randomness test cases (Special Publication 800-22 Revision 1a).
(2020/07/01)
- Fixed: Updates the random functions.
(2020/07/31)
- Added: Updates the new method for generating the pairwise invertible matrices.
- Added: Bitwise operation (read/flip/set) functions.
- Added: The function for calculating the inversion of an invertible matrix by Gaussian elimination method.
(2020/08/01)
- Added: Support for 4-bit matrix operations.
- Added: 8to64, 8to128, 16to64, 32to128, 16to128 concatenation functions.
(2020/08/09)
- Fixed: Errors of the comments in misc.c.
- Added: 4-bit test cases.
(2020/08/10)
- Added: Support for C++.
- Added: LowMC Method.
(2020/08/24)
- Fixed: Free from C99.
(2020/09/29)
- Added: A new matrix transposition function.
(2021/01/12)
- Added: Support for partial 256-bit operations.
- Added: Partial 256-bit test cases.
