High Performance High Bandwidth Sparse-Dense Matrix Multiplication on HBM-equipped FPGAs
HiSpMM is an FPGA-accelerated Sparse Matrix-Matrix Multiplication (SpMM) implementation targeting the Xilinx Alveo U280 accelerator card.
HiSpMM accelerator architecture showing Processing Element Groups (PEGs), Dense Row Distribution Network, Accumulator Groups (ACCGs), Arbiters, and Compute_C modules with HBM channel connections.
| Variant | PEs | A Ports | HBM Channels | Best For | Floorplan Strategy |
|---|---|---|---|---|---|
| HiSpMM-balanced | 80 | 10 | 22 | Balanced workloads | SLR_LEVEL_FLOORPLANNING |
| HiSpMM-imbalanced | 64 | 8 | 20 | Imbalanced workloads | HALF_SLR_LEVEL_FLOORPLANNING |
- High Performance: Leverages multiple processing elements (PEs) for parallel SpMM computation
- HBM Utilization: Efficiently uses up to 22 HBM channels for high memory bandwidth
- Two Design Variants: Optimized configurations for balanced and imbalanced workloads
- Pre-built Bitstreams: Ready-to-use FPGA bitstreams for immediate deployment
- Software Simulation: Test functionality without FPGA hardware
- TAPA-based Design: Task-parallel architecture
- FPGA: Xilinx Alveo U280 Data Center Accelerator Card
- Platform: xilinx_u280_gen3x16_xdma_1_202211_1
- OS: Linux (Ubuntu 18.04 recommended)
- Xilinx Tools:
  - Vitis 2023.2
  - XRT (Xilinx Runtime) 2.14+
  - Xilinx HLS (included in Vitis)
- TAPA: Task-Parallel HLS
- Compiler: g++ with C++17 support
- Libraries:
  - tapa - TAPA runtime library
  - frt - FRT (FPGA Runtime) library
  - glog - Google Logging library
  - gflags - Google Commandline Flags library
  - OpenCL - OpenCL runtime
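Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the repository: the helper name `check_tools` is hypothetical, and the command names `v++` (Vitis compiler) and `xbutil` (XRT utility) are assumptions about how those tools are exposed after sourcing the Xilinx setup scripts.

```shell
#!/bin/sh
# check_tools NAME...: print one "missing: NAME" line for every command that is
# not found on PATH; return non-zero if at least one is missing.
# (Hypothetical helper for illustration; not shipped with HiSpMM.)
check_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed command names: g++ (host compiler), v++ (Vitis), xbutil (XRT).
check_tools g++ v++ xbutil || echo "install or source the missing tools before building"
```

The check only looks for executables; it does not verify tool versions or the installed platform.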
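Download and install the Vitis Unified Software Platform, then set up the environment. Adjust the version in the path to match your installed release (the requirements list Vitis 2023.2):
source /opt/xilinx/Vitis/2022.2/settings64.sh
source /opt/xilinx/xrt/setup.sh
Follow the TAPA installation guide, then install the remaining library dependencies:
sudo apt-get install libgoogle-glog-dev libgflags-dev ocl-icd-opencl-dev
Clone the repository:
git clone https://github.com/SFU-HiAccel/HiSpMM.git
cd HiSpMM
Build the host application for the variant you need:
# For balanced variant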
cd HiSpMM-balanced
make host
# For imbalanced variant
cd HiSpMM-imbalanced
make host
Pre-built bitstreams are provided. To rebuild from source:
# Step 1: Generate TAPA hardware object (.xo)
make tapa
# Step 2: Build bitstream (.xclbin)
make hw-build
Note: The variants have high resource utilization that may require multiple build attempts to succeed.
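Since a hardware build may need several attempts, the re-runs can be scripted. The sketch below uses a hypothetical `retry` helper that is not part of the repository. Whether simply repeating `make hw-build` changes the outcome depends on the Makefile and tool settings, which are not described here, so treat this as a convenience wrapper rather than a fix for failed builds.

```shell
#!/bin/sh
# retry MAX CMD...: run CMD until it succeeds, at most MAX times.
# Returns 0 on the first success, 1 if every attempt fails.
# (Hypothetical helper for illustration; not shipped with HiSpMM.)
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        echo "attempt $attempt of $max: $*"
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "giving up after $max attempts" >&2
    return 1
}

# Example, run inside a variant directory after `make tapa`:
# retry 3 make hw-build
```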
./hispmm [--bitstream=<path>] <matrix.mtx> <iterations> [dense_cols]

| Argument | Description |
|---|---|
| `--bitstream` | Path to .xclbin file (omit for software simulation) |
| `<matrix.mtx>` | Input sparse matrix in Matrix Market format |
| `<iterations>` | Number of SpMM iterations to run |
| `[dense_cols]` | Number of columns in dense matrix B (default: N=128) |
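Using the argument order above, running the host program over every matrix in a directory could be sketched as follows. The function name `run_all` and the environment variables `HISPMM` (path to the binary) and `BITSTREAM` (optional .xclbin path) are illustrative assumptions, not options defined by the project.

```shell
#!/bin/sh
# run_all DIR ITERATIONS DENSE_COLS: invoke the host binary once per .mtx file in DIR.
# HISPMM and BITSTREAM are hypothetical environment variables used only in this sketch:
#   HISPMM    path to the host binary (defaults to ./hispmm)
#   BITSTREAM path to an .xclbin; when unset, --bitstream is omitted (software simulation)
run_all() {
    dir=$1; iters=$2; cols=$3
    for mtx in "$dir"/*.mtx; do
        [ -e "$mtx" ] || continue    # the glob matched nothing
        if [ -n "${BITSTREAM:-}" ]; then
            "${HISPMM:-./hispmm}" --bitstream="$BITSTREAM" "$mtx" "$iters" "$cols"
        else
            "${HISPMM:-./hispmm}" "$mtx" "$iters" "$cols"
        fi
    done
}

# Example, run inside a variant directory after `make host`:
# run_all matrices 1 32
```

Leaving `BITSTREAM` unset keeps every run in software simulation, matching the behavior described for the `--bitstream` argument.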
Test the design without hardware:
# HiSpMM-balanced
cd HiSpMM-balanced
make host
make sw-test
# Or manually:
./hispmm matrices/airfoil_2d.mtx 1 32
# HiSpMM-imbalanced
cd HiSpMM-imbalanced
make host
make sw-test
# Or manually:
./hispmm matrices/hangGlider_3.mtx 1 32
Run on actual FPGA hardware:
# HiSpMM-balanced
cd HiSpMM-balanced
./hispmm --bitstream=bitstream/SpMM_xilinx_u280_gen3x16_xdma_1_202211_1.xclbin matrices/airfoil_2d.mtx 1000
# HiSpMM-imbalanced
cd HiSpMM-imbalanced
./hispmm --bitstream=bitstream/SpMM_xilinx_u280_gen3x16_xdma_1_202211_1.xclbin matrices/hangGlider_3.mtx 1000
If you rebuilt the bitstream:
make hw-test
# Or manually:
./hispmm --bitstream=vitis_run_hw/hispmm.xilinx_u280_gen3x16_xdma_1_202211_1.hw.xclbin <matrix.mtx> <iterations>

| Target | Description |
|---|---|
| `make host` | Build the host application |
| `make tapa` | Generate TAPA hardware object (.xo) |
| `make hw-build` | Build FPGA bitstream from .xo |
| `make sw-test` | Run software simulation test |
| `make hw-test` | Run hardware test (requires FPGA) |
| `make clean` | Remove built executables |
Sparse matrices must be in Matrix Market (.mtx) format.
| Matrix | Location | Description |
|---|---|---|
| airfoil_2d.mtx | matrices/ | 2D airfoil mesh |
| hangGlider_3.mtx | matrices/ | Optimal control problem |
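For readers unfamiliar with the Matrix Market format, the sketch below writes a tiny coordinate-format file and checks that the declared nonzero count matches the number of entries. The file name `tiny.mtx` and the helper `mtx_check` are hypothetical and serve only to illustrate the file layout; this is not a test input shipped with the repository.

```shell
#!/bin/sh
# Write a 3x3 sparse matrix with 4 stored entries in Matrix Market coordinate format.
cat > tiny.mtx <<'EOF'
%%MatrixMarket matrix coordinate real general
% size line: rows cols entries; then one "row col value" triple per line (1-based)
3 3 4
1 1 2.0
2 3 -1.5
3 1 4.0
3 3 1.0
EOF

# mtx_check FILE: compare the entry count declared on the size line with the
# number of entry lines that follow. Exits non-zero on a mismatch.
# (Hypothetical helper for illustration; not shipped with HiSpMM.)
mtx_check() {
    awk '
        /^%/    { next }                               # banner and comment lines
        !seen   { declared = $3; seen = 1; next }      # first data line is the size line
        NF >= 2 { entries++ }                          # "row col [value]" entry
        END {
            printf "declared=%d entries=%d\n", declared, entries
            exit (declared == entries) ? 0 : 1
        }
    ' "$1"
}

mtx_check tiny.mtx   # prints: declared=4 entries=4
```

For symmetric matrices the format stores only one triangle, so the declared count refers to stored entries rather than the full number of nonzeros.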
This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.
If you use HiSpMM in your research, please cite:
@article{hispmm,
author = {Sedigh Baroughi, Ahmad and Rajashekar, Manoj B. and Baranwal, Akhil R. and Fang, Zhenman},
title = {HiSpMM: High Performance High Bandwidth Sparse-Dense Matrix Multiplication on HBM-equipped FPGAs},
year = {2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1936-7406},
url = {https://doi.org/10.1145/3774327},
doi = {10.1145/3774327},
note = {Just Accepted},
journal = {ACM Trans. Reconfigurable Technol. Syst.},
month = oct,
keywords = {SpMM, Imbalanced Workload, FPGA Accelerator, High Level Synthesis, Design Space Exploration}
}
Release the Automation Tool.
- TAPA - Task-Parallel High-Level Synthesis framework
- Xilinx - FPGA platform and development tools
- SuiteSparse Matrix Collection - Test matrices
For questions or issues, contact asa582@sfu.ca.
Repository: https://github.com/SFU-HiAccel/HiSpMM
