Row-Wise Product-Based Sparse Matrix Multiplication Hardware Accelerator With Optimal Load Balancing

Jong Hun Lee, Beomjin Park, Joonho Kong, Arslan Munir

Research output: Contribution to journalArticlepeer-review


Matrix multiplication is a main computation kernel of emerging workloads, such as deep neural networks and graph analytics. These workloads often exhibit high sparsity in data, which means a large portion of the elements in the data are zero-valued elements. Though systolic arrays have shown a significant performance and energy efficiency improvement over central processing units (CPUs) or graphic processing units (GPUs) when executing matrix multiplications, data sparsity is largely overlooked in the conventional systolic arrays. In this paper, we propose a row-wise product-based sparse matrix multiplication (SpMM) hardware accelerator for compressed sparse row (CSR)-formatted input matrices. Our hardware accelerator leverages row-wise product, which has advantages over inner-product or outer-product when executing the sparse matrix multiplications. As compared to the conventional systolic arrays, which cannot skip the ineffectual operations, our hardware accelerator only performs effectual operations with non-zero elements, improving the performance when executing SpMM. In addition, we also propose an optimal load balancing scheme when using multiple processing elements (PEs). Our load balancing scheme utilizes an operation count-based matrix tiling for parallel execution of the PEs and resource contention avoidance. According to our evaluation, our 32PE-SpMM accelerator shows 13.6× - 47.9× speedup over tensor processing unit (TPU)-like systolic arrays, on average. Furthermore, our operation count-based load balancing scheme shows better performance over the fixed tiling and non-zero element count-based tiling by up to 8.48% and 6.28%, respectively, with only up to 0.06% matrix tiling pre-processing latency overhead.

Original languageEnglish (US)
Pages (from-to)64547-64559
Number of pages13
JournalIEEE Access
StatePublished - 2022


  • load balancing
  • matrix tiling
  • row-wise product
  • Sparse matrix multiplication
  • speedup

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering


Dive into the research topics of 'Row-Wise Product-Based Sparse Matrix Multiplication Hardware Accelerator With Optimal Load Balancing'. Together they form a unique fingerprint.

Cite this