Skip to content

traffictse/OptWBMatrix

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OptWBMatrix

OptWBMatrix is a fork of WBMatrix that introduces instruction-level and algorithmic optimizations to improve performance.

Overview of Changes (2025/08/29)

  1. The functions xorU4/8/16/32/64/128/256 that compute the bit-parity (i.e., the sum of bits modulo 2 in U4/8/16/32/64/128/256) were renamed to parityU4/8/16/32/64/128/256 for clarity.
  2. To simplify compilation and side-by-side benchmarking, every function in this fork (whether optimized or not) was renamed with the prefix opt_. This ensures no name clashes with the originals.
  3. As the parity and matrix-transpose operations form the computational foundation of this library, optimizations were applied first to:
    • parityU32/64/128/256,
    • MattransM8/16/32/64/128/256, then extended to:
    • MatMulVecM128,
    • MatMulMatM64/128/256.
  4. Equivalence between the original and optimized implementations were tested with 1000 random inputs per function pair.
  5. Benchmarking was conducted with 100 repetitions per function comparison, where input data was freshly randomized for each repetition.

Benchmark Results

The table below summarizes performance.

  • Original (ms) and Optimized (ms) are average runtimes across repetitions.
  • Best Run (%) and Worst Run (%) are the extremes of speedup observed across repetitions.
  • Negative worst-case values indicate cases where the optimized implementation ran slower than the baseline, typically due to benchmarking noise.
Function Original (ms) Optimized (ms) Avg Speedup (%) Median Speedup (%) Median Times Faster Trimmed Mean Speedup (%) Best Run (%) Worst Run (%)
xorU32 vs parityU32 996.90 201.58 79.78 80.04 5.01x 79.85 82.88 69.02
xorU64 vs parityU64 255.59 40.71 84.02 83.87 6.20x 83.96 90.34 82.64
xorU128 vs parityU128 150.56 31.45 79.17 79.73 4.93x 79.50 85.37 40.79
xorU256 vs parityU256 81.86 17.19 78.99 79.22 4.81x 79.07 84.40 65.85
MattransM8 vs Opt MattransM8 89.40 36.62 60.01 65.12 2.86x 65.11 72.67 -452.38
MattransM16 vs Opt MattransM16 213.58 175.37 17.87 27.80 1.38x 21.96 45.96 -410.65
MattransM32 vs Opt MattransM32 56.48 40.53 26.61 26.92 1.36x 27.29 70.77 -84.62
MattransM64 vs Opt MattransM64 125.32 92.04 25.94 25.41 1.34x 25.72 56.54 17.21
MattransM128 vs Opt MattransM128 46.83 41.34 11.69 11.11 1.12x 11.84 37.50 -29.21
MattransM256 vs Opt MattransM256 188.69 168.67 9.61 9.84 1.10x 9.40 65.05 -25.97
MatMulVecM128 vs Opt MatMulVecM128 442.97 93.53 78.71 79.44 4.86x 79.12 92.83 24.75
MatMulMatM64 vs Opt MatMulMatM64 1171.56 242.89 79.27 79.46 4.86x 79.45 82.42 58.28
MatMulMatM128 vs Opt MatMulMatM128 545.55 115.41 78.85 78.90 4.73x 78.84 79.05 78.68
MatMulMatM256 vs Opt MatMulMatM256 1164.92 282.11 75.78 75.77 4.12x 75.78 76.00 75.60

Summary:

Performance Speedups

  • Parity functions achieve ~5-6x speedups (~80% reduction in runtime).
  • Matrix transposes yields modest improvements (1.1-2.9x) with more variability.
  • Matrix multiplications show ~4-5x speedups, highly consistent.

Benchmark Data Sizes

Function Data size per repetition
xorU32 vs parityU32 50,000,000
xorU64 vs parityU64 10,000,000
xorU128 vs parityU128 5,000,000
xorU256 vs parityU256 1,250,000
MattransM8 vs Opt MattransM8 1,000,000
MattransM16 vs Opt MattransM16 1,000,000
MattransM32 vs Opt MattransM32 100,000
MattransM64 vs Opt MattransM64 100,000
MattransM128 vs Opt MattransM128 10,000
MattransM256 vs Opt MattransM256 10,000
MatMulVecM128 vs Opt MatMulVecM128 100,000
MatMulMatM64 vs Opt MatMulMatM64 10,000
MatMulMatM128 vs Opt MatMulMatM128 1000
MatMulMatM256 vs Opt MatMulMatM256 500

Metric Definitions

  1. Speedup (%) $$ \text{Speedup}(%) = 100 \times \frac{\text{origTime} - \text{optTime}}{\text{origTime}} $$
  2. Times Faster $$ \text{TimesFaster} = \frac{\text{origTime}}{\text{optTime}} $$
  3. Trimmed Mean Speedup Defined as the mean speedup with the maximum and minumum values removed: $$ \text{TrimmedMean}(s) = \frac{1}{n-2} \sum_{i=2}^{n-1} s_{(i)}, $$ where $s_{(i)}$ are the sorted speedups.

WBMatrix

An Optimized Matrix Library for White-Box Block Cipher Implementations.

Contains the matrix operations related to the white-box block cipher implementation and provides thorough test cases for their performance and accuracy. The test cases also include the Chow et al.'s white-box AES and Xiao-Lai's white-box SM4 implementations built by WBMatrix, NTL, and M4RI, respectively.

A preview version was released at Nexus-TYF/WBMatrix. But this repository is the latest version of WBMatrix and is relevant to the paper "WBMatrix: An Optimized Matrix Library for White-Box Block Cipher Implementations" by Yufeng Tang, Zheng Gong, Tao Sun, Jinhai Chen and Zhe Liu in IEEE Transactions on Computers.

DOI: 10.1109/TC.2022.3152449.

Applications

  1. CEJO White-box AES

  2. Table Redundancy Method for White-box AES

  3. Xiao-Lai White-box SM4

  4. Xiao-Lai White-box AES

  5. Improved Masking for White-box AES

  6. WBMatrix based LowMC

  7. Bai-Wu White-box SM4

Clone

$ git clone https://github.com/scnucrypto/WBMatrix.git

Matrix Library

Supports For Following Operations (4/8/16/32/64/128/256 bits)

  • Matrix-Vector multiplication.
  • Matrix-Matrix multiplication.
  • Generation of an invertible Matrix with its inverse matrix (pairwise invertible matrices).
  • Generation of the pairwise invertible affine transformations.
  • Matrix transpositon.
  • Affine transformation.
  • Encodings concatenation.
  • Encodings conversion.

Header Files

  • WBMatrix.h The declaration of the main functions.
  • struture.h Data structure of the matrix and affine functions.
  • random.h The declaration of the random functions.

Main Functions (8bit in Example)

  • affineU8(Aff8 aff, uint8_t arr) affine transformation for an uint8_t number arr, and returns an uint8_t result.
  • affinemixM8(Aff8 aff, Aff8 preaff_inv, Aff8 *mixaff) affine conversion between aff and preaff_inv, result is set in mixaff.
  • affinecomM8to32(Aff8 aff1, Aff8 aff2, Aff8 aff3, Aff8 aff4, Aff32 *aff) affine concatenation, the matrix part of aff consists of the submatrices on its diagonal, while the vector part of aff consists of the subvectors.
  • copyM8(M8 Mat1, M8 *Mat2) replicates the matrix Mat1 to Mat2.
  • flipbitM8(M8 *Mat, int i, int j) flips the (i, j) bit in matrix Mat.
  • genMatpairM8(M8 *Mat, M8 *Mat_inv) generates an invertible matrix Mat and its inverse matrix Mat_inv.
  • genaffinepairM8(Aff8 *aff, Aff8 *aff_inv) generates an affine transformation aff and its inversion aff_inv.
  • identityM8(M8 *Mat) converts the matrix Mat into an identity matrix.
  • invsM8(M8 Mat, M8 *Mat_inv) calculates the inversion of Mat by Gaussian elimination method, result is set in Mat_inv.
  • isinvertM8(M8 Mat) determines if the matrix is invertible (1 for positive).
  • MatMulVecM8(M8 Mat, V8 Vec, V8 *ans) multiplication between a matrix Mat and a vertor Vec, result is set in ans.
  • MatMulNumM8(M8 Mat, uint8_t n) multiplication between a matrix Mat and a number n, returns a number.
  • MatMulMatM8(M8 Mat1, M8 Mat2, M8 *Mat) multiplication between a matrix Mat1 and a matrix Mat2, result is set in Mat.
  • MatAddMatM8(M8 Mat1, M8 Mat2, M8 *Mat) addition between the matrix Mat1 and Mat2, result is set in Mat.
  • MattransM8(M8 Mat, M8 *Mat_trans) transpositon for a matrix Mat, result is set in Mat_trans.
  • readbitM8(M8 Mat, int i, int j) extracts the (i, j) bit in matrix Mat, returns 0/1.
  • setbitM8(M8 *Mat, int i, int j, int bit) assigns the (i, j) bit a value bit (0/1).
  • initM8(M8 *Mat) converts all the elements of the matrix Mat into 0.
  • randM8(M8 *Mat) generates a random matrix Mat.
  • printbitM8(M8 Mat) prints all the elements of the matrix Mat.
  • isequalM8(M8 Mat1, M8 Mat2) determines if the matrix Mat1 is equal to Mat2 (1 for positive).
  • initV8(V8 *Vec) converts all the elements of the vector Vec into 0.
  • randV8(V8 *Vec) generates a random vector Vec.
  • VecAddVecV8(V8 Vec1, V8 Vec2, V8 *Vec) addition between the vector Vec1 and Vec2, result is set in Vec.
  • HWU8(uint8_t n) calculates the Hamming Weight of a number n.

Code Examples

M8 mat[3]; //defines an 8-bit matrix.
genMatpairM8(&mat[0], &mat[1]); //generates the pairwise invertible matrices.
MatMulMatM8(mat[0], mat[1], &mat[2]); //matrix-matrix multiplication.
printM8(mat[2]); //prints the matrix.

Included library

  1. RandomSequence

Test Cases

Folder Introduction

  • github1_M4RI The performance test for matrix operation and for the generation of the pairwise invertible matirces by M4RI library.
  • github(x) The performance test for the generation of an invertible matrix or the computation of its invertion by the implementations on Github.
  • NTL The performance test for matrix operation and for the generation of the pairwise invertible matirces by NTL library.
  • randomness A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications (NIST Special Publication 800-22 Revision 1a).
  • WBAES A Chow et al.'s white-box AES implementation and its performance test built by WBMatrix, NTL, and M4RI respectively.
  • WBSM4 A Xiao-Lai's white-box SM4 implementation and its performance test built by WBMatrix, NTL, and M4RI respectively.

File Introduction

  • Accuracy_test.c Accuracy test for the matrix operations in WBMatrix.
  • BasisMatrixMethod_test.c Performance test for the generation of the pairwise invertible matrices by Basis Matrix Method.
  • LowMCMethod_text.cpp Performance test for the generation of the pairwise invertible matrices by LowMC Method and Gaussian Elimination method.
  • RGEMethod_test.c Performance test for the generation of the pairwise invertible matrices by Reverse Gaussian Elimination Method and Gaussian Elimination method.
  • RLUDMethod_test.c Performance test for the generation of the pairwise invertible matrices by Reverse LU Decomposition Method and Gaussian Elimination method.
  • WBGEMethod_test.c Performance test for the generation of the pairwise invertible matrices by Randomly Generate and Verify Method and Gaussian Elimination method.
  • WBMatrixMatOp_test.c Performance test for the matrix operations in WBMatrix.
  • WBMatrixMethod_test.c Performance test for the generation of the pairwise invertible matrices by WBMatrix Method.

Build

$ mkdir build
$ cd build
$ cmake ..
$ make

Run

$ ./WBMM

Included libraries

  1. NTL
  2. M4RI
  3. SMx-SM4
  4. WhiteBoxAES
  5. sp800_22_tests
  6. Inverse-matrix
  7. Inverse-Matrix
  8. parallelMatrixInversion
  9. InvertibleMatrix
  10. Inverse-of-Matrix
  11. inverseMatrix
  12. lowmc

Last Updated : 2022/02/20
WBMatrix Version: 3.3.0


Upgrade history:
(2019/12/9)

  1. Added: An invertible matrix is generated from an initialized matrix (now just supports for 8/32bits operations).
  2. Fixed: Unifies the API.
  3. Added: The adjustable generation times in inverse.h.
  4. Added: Uses initinvbaseM(8/32)() function to generate an initialized invertible matrix and its trails are recorded in basetrailM(8/32).
    8bits default value is 10,
    32bits default value is 30,
    which represent the operation times.
  5. Added: If not use the initialized function then each matrix is generated from an identity matrix with the default times.
  6. Added: Copy function to replace the identify function.

(2019/12/10)

  1. Added: 16/64/128bits inverse matrix functions.
    New method has been covered.

(2019/12/11)

  1. Added: 16/64bit affine transformation.
  2. Added: 128bit affine transformation.
    No retrun value because of its special structure.

(2019/12/12)

  1. Added: 16/64/128bit affine combination operation.

(2019/12/16)

  1. Added: the header files for a defination of the matices.

(2019/12/17)

  1. Fixed: Error fixes.
  2. Added: The parameters for initializing the intermediate matrix function.
    inverse.h has the max times and min times for selection.

(2020/01/08)

  1. Added: Matrix addition function.

(2020/01/10)

  1. Improved: File tidying.
  2. Added: WBMatrix test.
  3. Added: Matrix Basis Method test.

(2020/01/12)

  1. Added: 128bit test for matrix basis method.

(2020/01/18)

  1. Added: Updates the test case of the generation of an invertible matrix and the computation of its inverse matrix.
  2. Added: Invertible funcions: Matrix Basis Method, WBMatrix Method, Reverse Gaussian Elimination Method.
  3. Added: Inverse functions: WBMatrix Method, Matrix Basis Method.

(2020/01/20)

  1. Added: CMakeLists.txt
  2. Added: M4RI Method.

(2020/01/21)

  1. Improved: Organizes file structure, especially fixs the structure.h and .c errors.

(2020/01/22)

  1. Improved: Deletes xor.h.

(2020/01/30)

  1. Added: Gaussian elimination Method (Based on WBMatrix).
  2. Improved: Changes the generation function of a random Matrix.

(2020/01/31)

  1. Added: Reverse LU Decomposition Method.

(2020/02/01)

  1. Improved: Functions for the generation of a random matrix.

(2020/02/02)

  1. Added: Comparison test on github.
  2. Added: Accuracy Test.
  3. Improved: Parameter Orders of the affinemix function.

(2020/02/07)

  1. Fixed: Multipe defination of the global variables.
  2. Added: Function for random seed.
  3. Added: WBAES.

(2020/02/09)

  1. Fixed: Poor randomness of the random matrix function.
  2. Added: Function for estimating the invertibility of a matrix.

(2020/02/16)

  1. Added: New test cases from github.

(2020/03/05)

  1. Added: Performance test cases of M4RI: basic arithmetic with matrix.
  2. Added: Performance test cases of NTL.
  3. Added: Performance test cases of WBMatrix.

(2020/03/06)

  1. Added: Vector addition funcion.
  2. Fixed: Accuracy test mode.
  3. Improved: Replaces the rotation with a logical-AND.

(2020/03/07)

  1. Added: WBAES by M4RI.

(2020/03/09)

  1. Added: WBAES by WBMatrix.

(2020/03/10)

  1. Added: WBSM4 by M4RI.
  2. Fixed: The release version of WBAES (WBMatrix version).
  3. Added: WBSM4 by WBMatrix.

(2020/03/11)

  1. Added: WBSM4 by NTL.
  2. Improved: Clean-up NTL files.

(2020/03/15)

  1. Added: Release on github.

(2020/04/15)

  1. Added: Supports for returning Hamming Weight.
  2. Added: An example for mitigating DCA attack.

(2020/06/22)

  1. Added: The references of the articles and implementations.
  2. Fixed: Errors of the random function in Linux.

(2020/06/25)

  1. Added: Randomness test cases (Special Publication 800-22 Revision 1a).

(2020/07/01)

  1. Fixed: Updates the random functions.

(2020/07/31)

  1. Added: Updates the new method for generating the pairwise invetible matrices.
  2. Added: Bitwise operation (read/flip/set) functions.
  3. Added: The function for calculating the inversion of an invertible matrix by Gaussian elimination method.

(2020/08/01)

  1. Added: Supports for 4-bit matrix operations.
  2. Added: 8to64, 8to128, 16to64, 32to128, 16to128 concatenation functions.

(2020/08/09)

  1. Fixed: Errors of the comments in misc.c.
  2. Added: 4-bit test cases.

(2020/08/10)

  1. Added: Supports for C++.
  2. Added: LowMC Method.

(2020/08/24)

  1. Fixed: Free from C99.

(2020/09/29)

  1. Added: A new matrix transposition function.

(2021/01/12)

  1. Added: Supports for partial 256-bit operations.
  2. Added: Partial 256-bit test cases.

Releases

No releases published

Packages

No packages published

Languages

  • C 94.0%
  • C++ 3.5%
  • Python 2.4%
  • CMake 0.1%