OptWBMatrix

OptWBMatrix is a fork of WBMatrix that introduces instruction-level and algorithmic optimizations to improve performance.

Overview of Changes (2025/08/29)

The functions xorU4/8/16/32/64/128/256, which compute the bit parity (the sum of the bits modulo 2) of a U4/8/16/32/64/128/256 value, were renamed to parityU4/8/16/32/64/128/256 for clarity.
To simplify compilation and side-by-side benchmarking, every function in this fork (whether optimized or not) was renamed with the prefix opt_. This ensures no name clashes with the originals.
Because parity and matrix transposition form the computational foundation of this library, optimizations were applied first to:
- parityU32/64/128/256,
- MattransM8/16/32/64/128/256,

and then extended to:
- MatMulVecM128,
- MatMulMatM64/128/256.

Equivalence between the original and optimized implementations was tested with 1000 random inputs per function pair.
Benchmarking was conducted with 100 repetitions per function comparison, where input data was freshly randomized for each repetition.
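To illustrate what a parity routine computes and how an equivalence test of this kind can be set up, here is a minimal sketch. It is not the code used in this fork: the names parity32_naive, parity32_fold, and parity32_mismatches are hypothetical, and the XOR-folding variant is just one well-known way to compute parity faster than a bit-by-bit loop.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: XOR the 32 bits together one at a time. */
static int parity32_naive(uint32_t x)
{
    int p = 0;
    for (int i = 0; i < 32; i++)
        p ^= (int)((x >> i) & 1u);
    return p;
}

/* Folded version: XOR the upper half onto the lower half until
 * a single bit remains. That bit is the parity of the whole word. */
static int parity32_fold(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* Compares both versions on `count` pseudo-random inputs produced by
 * a xorshift32 generator and returns the number of disagreements. */
static int parity32_mismatches(uint32_t seed, int count)
{
    assert(seed != 0); /* xorshift32 never leaves the all-zero state */
    uint32_t s = seed;
    int bad = 0;
    for (int i = 0; i < count; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        if (parity32_naive(s) != parity32_fold(s))
            bad++;
    }
    return bad;
}
```

Calling parity32_mismatches(0x12345678u, 1000) mirrors the "1000 random inputs per function pair" setup described above; a return value of 0 means the two versions agreed on every input.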

Benchmark Results

The table below summarizes performance.

Original (ms) and Optimized (ms) are average runtimes across repetitions.
Best Run (%) and Worst Run (%) are the extremes of speedup observed across repetitions.
Negative worst-case values indicate cases where the optimized implementation ran slower than the baseline, typically due to benchmarking noise.

Function	Original (ms)	Optimized (ms)	Avg Speedup (%)	Median Speedup (%)	Median Times Faster	Trimmed Mean Speedup (%)	Best Run (%)	Worst Run (%)
xorU32 vs parityU32	996.90	201.58	79.78	80.04	5.01x	79.85	82.88	69.02
xorU64 vs parityU64	255.59	40.71	84.02	83.87	6.20x	83.96	90.34	82.64
xorU128 vs parityU128	150.56	31.45	79.17	79.73	4.93x	79.50	85.37	40.79
xorU256 vs parityU256	81.86	17.19	78.99	79.22	4.81x	79.07	84.40	65.85
MattransM8 vs Opt MattransM8	89.40	36.62	60.01	65.12	2.86x	65.11	72.67	-452.38
MattransM16 vs Opt MattransM16	213.58	175.37	17.87	27.80	1.38x	21.96	45.96	-410.65
MattransM32 vs Opt MattransM32	56.48	40.53	26.61	26.92	1.36x	27.29	70.77	-84.62
MattransM64 vs Opt MattransM64	125.32	92.04	25.94	25.41	1.34x	25.72	56.54	17.21
MattransM128 vs Opt MattransM128	46.83	41.34	11.69	11.11	1.12x	11.84	37.50	-29.21
MattransM256 vs Opt MattransM256	188.69	168.67	9.61	9.84	1.10x	9.40	65.05	-25.97
MatMulVecM128 vs Opt MatMulVecM128	442.97	93.53	78.71	79.44	4.86x	79.12	92.83	24.75
MatMulMatM64 vs Opt MatMulMatM64	1171.56	242.89	79.27	79.46	4.86x	79.45	82.42	58.28
MatMulMatM128 vs Opt MatMulMatM128	545.55	115.41	78.85	78.90	4.73x	78.84	79.05	78.68
MatMulMatM256 vs Opt MatMulMatM256	1164.92	282.11	75.78	75.77	4.12x	75.78	76.00	75.60

Summary:

Parity functions achieve ~5-6x speedups (~80% reduction in runtime).
Matrix transposes yield modest improvements (1.1-2.9x) with more variability.
Matrix multiplications show ~4-5x speedups, highly consistent.
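For readers unfamiliar with bit-matrix transposition, the sketch below contrasts a naive bit-by-bit transpose of an 8x8 bit matrix with the well-known "delta swap" technique, which exchanges 1x1, 2x2, and 4x4 sub-blocks in three steps. This is an illustration only; the README does not state which technique the optimized MattransM8 uses. The function names and the packing of the matrix into one uint64_t (bit (r, c) stored at bit index 8*r + c) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Naive reference: move every set bit (r, c) to position (c, r). */
static uint64_t transpose8x8_naive(uint64_t x)
{
    uint64_t y = 0;
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if ((x >> (8 * r + c)) & 1u)
                y |= (uint64_t)1 << (8 * c + r);
    return y;
}

/* Delta-swap version: each step swaps the off-diagonal sub-blocks
 * of size 1x1, 2x2, and 4x4 respectively. */
static uint64_t transpose8x8_swap(uint64_t x)
{
    uint64_t t;
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AAULL;
    x ^= t ^ (t << 7);
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;
    x ^= t ^ (t << 14);
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;
    x ^= t ^ (t << 28);
    return x;
}

/* Counts disagreements between both versions on `count`
 * pseudo-random matrices drawn from a xorshift64 generator. */
static int transpose8x8_mismatches(uint64_t seed, int count)
{
    assert(seed != 0); /* xorshift64 never leaves the all-zero state */
    uint64_t s = seed;
    int bad = 0;
    for (int i = 0; i < count; i++) {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        if (transpose8x8_naive(s) != transpose8x8_swap(s))
            bad++;
    }
    return bad;
}
```

The delta-swap version performs a fixed number of shifts and XORs regardless of the input, whereas the naive version visits all 64 bits individually.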

Benchmark Data Sizes

Function	Data size per repetition
xorU32 vs parityU32	50,000,000
xorU64 vs parityU64	10,000,000
xorU128 vs parityU128	5,000,000
xorU256 vs parityU256	1,250,000
MattransM8 vs Opt MattransM8	1,000,000
MattransM16 vs Opt MattransM16	1,000,000
MattransM32 vs Opt MattransM32	100,000
MattransM64 vs Opt MattransM64	100,000
MattransM128 vs Opt MattransM128	10,000
MattransM256 vs Opt MattransM256	10,000
MatMulVecM128 vs Opt MatMulVecM128	100,000
MatMulMatM64 vs Opt MatMulMatM64	10,000
MatMulMatM128 vs Opt MatMulMatM128	1000
MatMulMatM256 vs Opt MatMulMatM256	500

Metric Definitions

Speedup (%) $$ \text{Speedup}(\%) = 100 \times \frac{\text{origTime} - \text{optTime}}{\text{origTime}} $$
Times Faster $$ \text{TimesFaster} = \frac{\text{origTime}}{\text{optTime}} $$
Trimmed Mean Speedup Defined as the mean speedup with the maximum and minimum values removed: $$ \text{TrimmedMean}(s) = \frac{1}{n-2} \sum_{i=2}^{n-1} s_{(i)}, $$ where $s_{(1)} \le \dots \le s_{(n)}$ are the speedups sorted in ascending order.
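The three metrics above can be computed as in the following minimal sketch. The function names are hypothetical and are not taken from the benchmark code in this repository; trimmed_mean assumes at least three samples and sorts its input array in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Speedup (%) = 100 * (origTime - optTime) / origTime */
static double speedup_percent(double orig_time, double opt_time)
{
    return 100.0 * (orig_time - opt_time) / orig_time;
}

/* TimesFaster = origTime / optTime */
static double times_faster(double orig_time, double opt_time)
{
    return orig_time / opt_time;
}

/* Ascending comparison callback for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts s in place, then averages everything except the
 * smallest and the largest value. */
static double trimmed_mean(double *s, size_t n)
{
    assert(n > 2); /* removing two values must leave at least one */
    qsort(s, n, sizeof *s, cmp_double);
    double sum = 0.0;
    for (size_t i = 1; i + 1 < n; i++)
        sum += s[i];
    return sum / (double)(n - 2);
}
```

The two speedup measures are related by TimesFaster = 1 / (1 - Speedup/100), so a speedup of 80% corresponds to running 5x faster.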

WBMatrix

An Optimized Matrix Library for White-Box Block Cipher Implementations.

Contains the matrix operations related to white-box block cipher implementations and provides thorough test cases for their performance and accuracy. The test cases also include Chow et al.'s white-box AES and Xiao-Lai's white-box SM4 implementations, each built with WBMatrix, NTL, and M4RI.

A preview version was released at Nexus-TYF/WBMatrix. This repository holds the latest version of WBMatrix and accompanies the paper "WBMatrix: An Optimized Matrix Library for White-Box Block Cipher Implementations" by Yufeng Tang, Zheng Gong, Tao Sun, Jinhai Chen and Zhe Liu in IEEE Transactions on Computers.

DOI: 10.1109/TC.2022.3152449.

Applications

Clone

$ git clone https://github.com/scnucrypto/WBMatrix.git

Matrix Library

Supports the Following Operations (4/8/16/32/64/128/256 bits)

Matrix-Vector multiplication.
Matrix-Matrix multiplication.
Generation of an invertible Matrix with its inverse matrix (pairwise invertible matrices).
Generation of the pairwise invertible affine transformations.
Matrix transposition.
Affine transformation.
Encodings concatenation.
Encodings conversion.
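As a rough picture of how the first two operations work over GF(2), the sketch below multiplies an 8x8 bit matrix by a vector and by another matrix using only AND, parity, and transposition. It is a standalone illustration, not WBMatrix code: the BitMat8 type, the function names, and the layout (one byte per row, column 0 in the most significant bit) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8x8 bit matrix: row[i] holds row i, column 0 is the MSB. */
typedef struct { uint8_t row[8]; } BitMat8;

/* Parity (sum of bits modulo 2) of one byte. */
static int parity8(uint8_t x)
{
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1;
}

/* Reads the bit at row i, column j. */
static int get_bit(const BitMat8 *m, int i, int j)
{
    assert(i >= 0 && i < 8 && j >= 0 && j < 8);
    return (m->row[i] >> (7 - j)) & 1;
}

/* y = M * x over GF(2): output bit i is parity(row i AND x). */
static uint8_t mat_mul_vec(const BitMat8 *m, uint8_t x)
{
    uint8_t y = 0;
    for (int i = 0; i < 8; i++)
        if (parity8((uint8_t)(m->row[i] & x)))
            y |= (uint8_t)(0x80u >> i);
    return y;
}

/* t = transpose of m, built bit by bit. */
static void transpose(const BitMat8 *m, BitMat8 *t)
{
    for (int i = 0; i < 8; i++)
        t->row[i] = 0;
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            if (get_bit(m, i, j))
                t->row[j] |= (uint8_t)(0x80u >> i);
}

/* c = a * b over GF(2): entry (i, j) is the parity of
 * (row i of a) AND (column j of b), where column j of b
 * is row j of the transpose of b. */
static void mat_mul_mat(const BitMat8 *a, const BitMat8 *b, BitMat8 *c)
{
    BitMat8 bt;
    transpose(b, &bt);
    for (int i = 0; i < 8; i++) {
        uint8_t r = 0;
        for (int j = 0; j < 8; j++)
            if (parity8((uint8_t)(a->row[i] & bt.row[j])))
                r |= (uint8_t)(0x80u >> j);
        c->row[i] = r;
    }
}
```

In this formulation both multiplications reduce to parity and transposition, which is why speeding up those two primitives carries over to the multiplication routines.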

Header Files

WBMatrix.h The declaration of the main functions.
structure.h Data structure of the matrix and affine functions.
random.h The declaration of the random functions.

Main Functions (8bit in Example)

affineU8(Aff8 aff, uint8_t arr) applies the affine transformation aff to a uint8_t number arr, and returns a uint8_t result.
affinemixM8(Aff8 aff, Aff8 preaff_inv, Aff8 *mixaff) affine conversion between aff and preaff_inv, result is set in mixaff.
affinecomM8to32(Aff8 aff1, Aff8 aff2, Aff8 aff3, Aff8 aff4, Aff32 *aff) affine concatenation, the matrix part of aff consists of the submatrices on its diagonal, while the vector part of aff consists of the subvectors.
copyM8(M8 Mat1, M8 *Mat2) replicates the matrix Mat1 to Mat2.
flipbitM8(M8 *Mat, int i, int j) flips the (i, j) bit in matrix Mat.
genMatpairM8(M8 *Mat, M8 *Mat_inv) generates an invertible matrix Mat and its inverse matrix Mat_inv.
genaffinepairM8(Aff8 *aff, Aff8 *aff_inv) generates an affine transformation aff and its inverse aff_inv.
identityM8(M8 *Mat) converts the matrix Mat into an identity matrix.
invsM8(M8 Mat, M8 *Mat_inv) calculates the inverse of Mat by Gaussian elimination, result is set in Mat_inv.
isinvertM8(M8 Mat) determines whether the matrix is invertible (returns 1 if it is).
MatMulVecM8(M8 Mat, V8 Vec, V8 *ans) multiplication between a matrix Mat and a vector Vec, result is set in ans.
MatMulNumM8(M8 Mat, uint8_t n) multiplication between a matrix Mat and a number n, returns a number.
MatMulMatM8(M8 Mat1, M8 Mat2, M8 *Mat) multiplication between a matrix Mat1 and a matrix Mat2, result is set in Mat.
MatAddMatM8(M8 Mat1, M8 Mat2, M8 *Mat) addition between the matrix Mat1 and Mat2, result is set in Mat.
MattransM8(M8 Mat, M8 *Mat_trans) transposition of a matrix Mat, result is set in Mat_trans.
readbitM8(M8 Mat, int i, int j) extracts the (i, j) bit in matrix Mat, returns 0/1.
setbitM8(M8 *Mat, int i, int j, int bit) assigns the (i, j) bit a value bit (0/1).
initM8(M8 *Mat) converts all the elements of the matrix Mat into 0.
randM8(M8 *Mat) generates a random matrix Mat.
printbitM8(M8 Mat) prints all the elements of the matrix Mat.
isequalM8(M8 Mat1, M8 Mat2) determines if the matrix Mat1 is equal to Mat2 (1 for positive).
initV8(V8 *Vec) converts all the elements of the vector Vec into 0.
randV8(V8 *Vec) generates a random vector Vec.
VecAddVecV8(V8 Vec1, V8 Vec2, V8 *Vec) addition between the vector Vec1 and Vec2, result is set in Vec.
HWU8(uint8_t n) calculates the Hamming Weight of a number n.
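The ideas behind invsM8, isinvertM8, and affineU8 can be sketched in a standalone form as follows. This is not the library's implementation: the function names, the plain uint8_t[8] matrix type, and the layout (one byte per row, column 0 in the most significant bit) are assumptions for illustration. Inversion uses Gauss-Jordan elimination over GF(2), where row addition is XOR, and an affine map is taken to be y = M*x XOR v.

```c
#include <assert.h>
#include <stdint.h>

/* Byte with only column i set (column 0 is the most significant bit). */
static uint8_t unit_row(int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(0x80u >> i);
}

/* Parity (sum of bits modulo 2) of one byte. */
static int parity8(uint8_t x)
{
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1;
}

/* y = M * x over GF(2). */
static uint8_t mat_mul_vec(const uint8_t m[8], uint8_t x)
{
    uint8_t y = 0;
    for (int i = 0; i < 8; i++)
        if (parity8((uint8_t)(m[i] & x)))
            y |= unit_row(i);
    return y;
}

/* Gauss-Jordan elimination on [M | I]. Returns 1 and writes the
 * inverse to inv if M is invertible; returns 0 otherwise. */
static int invert(const uint8_t m[8], uint8_t inv[8])
{
    uint8_t a[8], b[8];
    for (int i = 0; i < 8; i++) {
        a[i] = m[i];
        b[i] = unit_row(i);
    }
    for (int col = 0; col < 8; col++) {
        uint8_t mask = unit_row(col);
        int piv = col;
        while (piv < 8 && !(a[piv] & mask))
            piv++;                      /* find a row with a 1 in this column */
        if (piv == 8)
            return 0;                   /* no pivot: matrix is singular */
        uint8_t ta = a[col], tb = b[col];
        a[col] = a[piv]; b[col] = b[piv];
        a[piv] = ta;     b[piv] = tb;
        for (int r = 0; r < 8; r++)     /* clear the column in all other rows */
            if (r != col && (a[r] & mask)) {
                a[r] ^= a[col];
                b[r] ^= b[col];
            }
    }
    for (int i = 0; i < 8; i++)
        inv[i] = b[i];
    return 1;
}

/* Affine map y = M * x XOR v. */
static uint8_t affine_apply(const uint8_t m[8], uint8_t v, uint8_t x)
{
    return (uint8_t)(mat_mul_vec(m, x) ^ v);
}

/* Inverse affine map x = M^-1 * (y XOR v). */
static uint8_t affine_undo(const uint8_t m_inv[8], uint8_t v, uint8_t y)
{
    return mat_mul_vec(m_inv, (uint8_t)(y ^ v));
}
```

Applying affine_undo with the inverse matrix after affine_apply returns the original byte for every input, which is the property a pairwise invertible affine transformation has to satisfy.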

Code Examples

M8 mat[3]; //declares an array of three 8-bit matrices.
genMatpairM8(&mat[0], &mat[1]); //generates the pairwise invertible matrices.
MatMulMatM8(mat[0], mat[1], &mat[2]); //matrix-matrix multiplication.
printM8(mat[2]); //prints the matrix.

Included library

RandomSequence

Test Cases

Folder Introduction

github1_M4RI The performance test for matrix operations and for the generation of pairwise invertible matrices with the M4RI library.
github(x) The performance test for the generation of an invertible matrix or the computation of its inverse by implementations found on GitHub.
NTL The performance test for matrix operations and for the generation of pairwise invertible matrices with the NTL library.
randomness A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications (NIST Special Publication 800-22 Revision 1a).
WBAES Chow et al.'s white-box AES implementation and its performance test, built with WBMatrix, NTL, and M4RI respectively.
WBSM4 Xiao-Lai's white-box SM4 implementation and its performance test, built with WBMatrix, NTL, and M4RI respectively.

File Introduction

Accuracy_test.c Accuracy test for the matrix operations in WBMatrix.
BasisMatrixMethod_test.c Performance test for the generation of the pairwise invertible matrices by Basis Matrix Method.
LowMCMethod_text.cpp Performance test for the generation of the pairwise invertible matrices by LowMC Method and Gaussian Elimination method.
RGEMethod_test.c Performance test for the generation of the pairwise invertible matrices by Reverse Gaussian Elimination Method and Gaussian Elimination method.
RLUDMethod_test.c Performance test for the generation of the pairwise invertible matrices by Reverse LU Decomposition Method and Gaussian Elimination method.
WBGEMethod_test.c Performance test for the generation of the pairwise invertible matrices by Randomly Generate and Verify Method and Gaussian Elimination method.
WBMatrixMatOp_test.c Performance test for the matrix operations in WBMatrix.
WBMatrixMethod_test.c Performance test for the generation of the pairwise invertible matrices by WBMatrix Method.

Build

$ mkdir build
$ cd build
$ cmake ..
$ make

Run

$ ./WBMM

Included libraries

Last Updated : 2022/02/20
WBMatrix Version: 3.3.0

Upgrade history:
(2019/12/9)

Added: An invertible matrix is generated from an initialized matrix (currently supports only 8/32-bit operations).
Fixed: Unifies the API.
Added: The adjustable generation times in inverse.h.
Added: Uses initinvbaseM(8/32)() function to generate an initialized invertible matrix and its trails are recorded in basetrailM(8/32).
8bits default value is 10,
32bits default value is 30,
which represent the operation times.
Added: If not use the initialized function then each matrix is generated from an identity matrix with the default times.
Added: Copy function to replace the identify function.

(2019/12/10)

Added: 16/64/128bits inverse matrix functions.
New method has been covered.

(2019/12/11)

Added: 16/64bit affine transformation.
Added: 128bit affine transformation.
No return value because of its special structure.

(2019/12/12)

Added: 16/64/128bit affine combination operation.

(2019/12/16)

Added: The header files for the definition of the matrices.

(2019/12/17)

Fixed: Error fixes.
Added: The parameters for initializing the intermediate matrix function.
inverse.h has the max times and min times for selection.

(2020/01/08)

Added: Matrix addition function.

(2020/01/10)

Improved: File tidying.
Added: WBMatrix test.
Added: Matrix Basis Method test.

(2020/01/12)

Added: 128bit test for matrix basis method.

(2020/01/18)

Added: Updates the test case of the generation of an invertible matrix and the computation of its inverse matrix.
Added: Invertible functions: Matrix Basis Method, WBMatrix Method, Reverse Gaussian Elimination Method.
Added: Inverse functions: WBMatrix Method, Matrix Basis Method.

(2020/01/20)

Added: CMakeLists.txt
Added: M4RI Method.

(2020/01/21)

Improved: Organizes file structure, especially fixes the structure.h and .c errors.

(2020/01/22)

Improved: Deletes xor.h.

(2020/01/30)

Added: Gaussian elimination Method (Based on WBMatrix).
Improved: Changes the generation function of a random Matrix.

(2020/01/31)

Added: Reverse LU Decomposition Method.

(2020/02/01)

Improved: Functions for the generation of a random matrix.

(2020/02/02)

Added: Comparison test on github.
Added: Accuracy Test.
Improved: Parameter Orders of the affinemix function.

(2020/02/07)

Fixed: Multiple definition of the global variables.
Added: Function for random seed.
Added: WBAES.

(2020/02/09)

Fixed: Poor randomness of the random matrix function.
Added: Function for estimating the invertibility of a matrix.

(2020/02/16)

Added: New test cases from github.

(2020/03/05)

Added: Performance test cases of M4RI: basic arithmetic with matrix.
Added: Performance test cases of NTL.
Added: Performance test cases of WBMatrix.

(2020/03/06)

Added: Vector addition function.
Fixed: Accuracy test mode.
Improved: Replaces the rotation with a logical-AND.

(2020/03/07)

Added: WBAES by M4RI.

(2020/03/09)

Added: WBAES by WBMatrix.

(2020/03/10)

Added: WBSM4 by M4RI.
Fixed: The release version of WBAES (WBMatrix version).
Added: WBSM4 by WBMatrix.

(2020/03/11)

Added: WBSM4 by NTL.
Improved: Clean-up NTL files.

(2020/03/15)

Added: Release on github.

(2020/04/15)

Added: Supports for returning Hamming Weight.
Added: An example for mitigating DCA attack.

(2020/06/22)

Added: The references of the articles and implementations.
Fixed: Errors of the random function in Linux.

(2020/06/25)

Added: Randomness test cases (Special Publication 800-22 Revision 1a).

(2020/07/01)

Fixed: Updates the random functions.

(2020/07/31)

Added: Updates the new method for generating the pairwise invertible matrices.
Added: Bitwise operation (read/flip/set) functions.
Added: The function for calculating the inversion of an invertible matrix by Gaussian elimination method.

(2020/08/01)

Added: Supports for 4-bit matrix operations.
Added: 8to64, 8to128, 16to64, 32to128, 16to128 concatenation functions.

(2020/08/09)

Fixed: Errors of the comments in misc.c.
Added: 4-bit test cases.

(2020/08/10)

Added: Supports for C++.
Added: LowMC Method.

(2020/08/24)

Fixed: Free from C99.

(2020/09/29)

Added: A new matrix transposition function.

(2021/01/12)

Added: Supports for partial 256-bit operations.
Added: Partial 256-bit test cases.
