SparseFlow is a next-generation MLIR-based compiler that detects and exploits generalized structured sparsity (N:M) in AI workloads.
Unlike traditional sparse libraries (limited to 2:4 or fully unstructured), SparseFlow supports any N:M block pattern and achieves up to 20× CPU speedups by combining compile-time analysis with custom sparse kernels.
Supports the following patterns out of the box:
- 1:4
- 2:4
- 2:8
- 4:16
- 8:32
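Here N:M means exactly N non-zero values in every block of M consecutive weights (detailed further below). As a quick illustration, a check that a weight row follows a given pattern might look like the sketch below; the function name `is_nm_sparse` is illustrative only and not part of the SparseFlow API:

```cpp
#include <cstddef>

// Returns true when every complete block of m consecutive weights
// in `row` contains exactly n non-zero values (e.g. n=2, m=8 for 2:8).
bool is_nm_sparse(const float* row, std::size_t len, int n, int m) {
    for (std::size_t block = 0; block + m <= len; block += m) {
        int nonzeros = 0;
        for (int j = 0; j < m; ++j)
            if (row[block + j] != 0.0f) ++nonzeros;
        if (nonzeros != n) return false;
    }
    return true;
}
```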
The compiler provides:
- SPA Pass – Static sparsity analysis
- Rewrite Pass – Converts dense matmuls → sparse kernels
- Export Pass – Dumps metadata
- Pluggable runtime lowering

The runtime provides:
- 5 hand-tuned OpenMP kernels
- Contiguous block loads
- Branch-free inner loops
- High cache locality
- Designed for future SIMD + GPU backends
SparseFlow achieves 9×–20× speedups on CPU for realistic matrix sizes, significantly outperforming typical sparse CPU libraries.
Full dense vs N:M sparse CPU benchmark tables (all patterns and sizes) are in:
Benchmarks compare dense vs SparseFlow sparse kernels on CPU.
| Matrix Size | Typical Speedup | Peak Speedup |
|---|---|---|
| 256×256 | 3×–8× | 8× |
| 512×512 | 8×–12× | 12× |
| 1024×1024 | 9×–20× | 20× |
Stable patterns frequently hit:
- 1:4 → ~18×
- 2:8 → ~18×
- 4:16 → ~20×
These numbers are based on multiple runs and exclude outlier spikes.
Matrix Size: 1024×1024

| Pattern | Dense (ms) | Sparse (ms) | Speedup | Density |
|---|---|---|---|---|
| 1:4 | 12618.09 | 670.56 | 18.82× | 25% |
| 2:4 | 14662.58 | 1626.62 | 9.01× | 50% |
| 2:8 | 13843.85 | 769.59 | 17.99× | 25% |
| 4:16 | 10886.07 | 544.07 | 20.01× | 25% |
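Density here is simply N/M: 1:4, 2:8, and 4:16 all keep 25% of the weights (1/4 = 2/8 = 4/16), while 2:4 keeps 50%.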
SparseFlow transforms dense MLIR into sparse-optimized executable code:
PyTorch / ONNX → MLIR → SPA Pass → Rewrite Pass → LLVM → Sparse Runtime
The SPA Pass identifies sparse regions and marks tensors with {n, m} metadata.
The Rewrite Pass replaces `linalg.matmul` with:

    func.call @sparse_matmul_N_M(...)

selecting the correct sparse kernel for the tensor's N:M pattern.
Backed by optimized C++/OpenMP kernels:

    sparse_matmul_1_4
    sparse_matmul_2_4
    sparse_matmul_2_8
    sparse_matmul_4_16
    sparse_matmul_8_32

A pattern N:M means:
- For every M consecutive weights
- Exactly N are non-zero
- Zeros are static at compile time
- Blocks are memory contiguous
This allows:
- Predictable skipping
- SIMD-friendly loads
- Low branch divergence
- Great cache efficiency
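To make the contiguous block loads and branch-free inner loops concrete, here is a minimal C++/OpenMP sketch of a 2:8 kernel. The compressed layout (two values plus two in-block indices per 8-wide block), the function name, and the signature are assumptions for illustration only; they are not the actual SparseFlow runtime ABI:

```cpp
#include <cstddef>
#include <vector>

// Sketch of a 2:8 sparse-times-dense matmul: C (MxN) += A_sparse (MxK) * B (KxN).
// `values` holds the two surviving weights of each 8-wide block of A, row-major;
// `idx` holds their positions (0-7) inside the block. C must be zero-initialized.
// Layout and naming are illustrative, not the SparseFlow runtime ABI.
void sparse_matmul_2_8_sketch(const std::vector<float>& values,
                              const std::vector<int>& idx,
                              const std::vector<float>& B,
                              std::vector<float>& C,
                              std::size_t M, std::size_t K, std::size_t N) {
    const std::size_t blocks_per_row = K / 8;  // K assumed divisible by 8
    #pragma omp parallel for
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t b = 0; b < blocks_per_row; ++b) {
            // Two contiguous loads per block; no per-element zero test.
            const std::size_t p = (i * blocks_per_row + b) * 2;
            const float a0 = values[p], a1 = values[p + 1];
            const std::size_t k0 = b * 8 + idx[p], k1 = b * 8 + idx[p + 1];
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += a0 * B[k0 * N + j] + a1 * B[k1 * N + j];
        }
    }
}
```

Because the zero positions are fixed before execution, each block contributes exactly two multiply-adds and two contiguous loads, so the inner loop has no data-dependent branches.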
Before:

    %A = tensor<16x16xf32> {n = 2 : i32, m = 8 : i32}
    %B = tensor<16x16xf32>
    %C = tensor<16x16xf32>
    %0 = linalg.matmul ins(%A, %B)

After:

    func.call @sparse_matmul_2_8(%A, %B, %C, %m, %k, %n)

Build the compiler:

    git clone https://github.com/MapleSilicon/SparseFlow
    cd SparseFlow/compiler
    mkdir build && cd build
    cmake -DCMAKE_PREFIX_PATH=/usr/lib/llvm-19 ..
    make -j8

Run the CPU benchmark:

    cd ../../runtime/build
    ./benchmark_nm_runtime

Roadmap:
- CUDA kernels
- Tensor Core support
- 30×–60× expected speedup
- Python bindings
- `torch.compile` backend
- Model zoo support
- Cloud provider pilots
- Enterprise safety and tooling
Email: maplesilicon1@gmail.com
GitHub: https://github.com/MapleSilicon/SparseFlow
Author: Gourav Kumar
Generalized Sparse Compute for AI.
Simple. Fast. Open.