Sparse-TPU: Adapting systolic arrays for sparse matrices

Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong Hyeon Park, Austin Rovinski, Haojie Ye, Yuhan Chen, Ronald Dreslinski, Trevor Mudge

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


While systolic arrays are widely used for dense-matrix operations, they are seldom used for sparse-matrix operations. In this paper, we show how a systolic array of Multiply-and-Accumulate (MAC) units, similar to Google's Tensor Processing Unit (TPU), can be adapted to handle sparse matrices efficiently. TPU-like accelerators are built upon a 2D array of MAC units and have demonstrated high throughput and efficiency for dense matrix multiplication, a key kernel in machine learning algorithms and the target workload of the TPU. In this work, we take a co-designed approach: we first develop a packing technique that condenses a sparse matrix, and then propose a systolic-array-based system, Sparse-TPU (abbreviated STPU), that performs the matrix computations on the resulting packed, denser matrices. To demonstrate the efficacy of this co-designed approach, we evaluate sparse matrix-vector multiplication (SpMV) on a broad set of synthetic and real-world sparse matrices. Experimental results show that, over a TPU baseline, STPU delivers 16.08X higher performance while consuming 4.39X and 19.79X lower energy for the integer (int8) and floating-point (float32) implementations, respectively. Meanwhile, STPU incurs a 12.93% area overhead and an average 4.14% increase in dynamic energy over the TPU baseline for the float32 implementation.
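The abstract does not spell out how the packing works. As a minimal sketch of the general idea only, assuming a simple greedy, first-fit, conflict-free column-merging scheme (not necessarily the packing algorithm STPU actually uses), the Python below merges sparse columns whose nonzeros occupy disjoint rows, tags each entry with its original column index, and then computes SpMV directly on the packed form. The names `pack_columns` and `packed_spmv` and the dict-based packed format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical packed format (an assumption, not the paper's): each packed column
# maps a row index to (value, original column index). Within one packed column,
# no two entries share a row, so each row needs at most one MAC per packed column.
PackedColumn = Dict[int, Tuple[float, int]]


def pack_columns(dense: List[List[float]]) -> List[PackedColumn]:
    """Greedily merge sparse columns whose nonzero rows do not overlap (first fit)."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    packed: List[PackedColumn] = []
    for c in range(n_cols):
        # Collect this column's nonzeros, remembering the original column index.
        nz = {r: (dense[r][c], c) for r in range(n_rows) if dense[r][c] != 0}
        if not nz:
            continue  # all-zero columns contribute nothing to y = A x
        for group in packed:
            if group.keys().isdisjoint(nz.keys()):
                group.update(nz)  # no row collision: merge into this packed column
                break
        else:
            packed.append(nz)  # collides with every existing group: start a new one
    return packed


def packed_spmv(packed: List[PackedColumn], x: List[float], n_rows: int) -> List[float]:
    """Compute y = A x from packed columns; the stored index selects the right x element."""
    y = [0.0] * n_rows
    for group in packed:
        for r, (val, c) in group.items():
            y[r] += val * x[c]
    return y


if __name__ == "__main__":
    A = [[1, 0, 0, 2],
         [0, 3, 0, 0],
         [0, 0, 4, 0],
         [5, 0, 0, 0]]
    x = [1.0, 2.0, 3.0, 4.0]
    p = pack_columns(A)
    print(len(p))                  # 2 (four sparse columns condensed into two)
    print(packed_spmv(p, x, 4))    # [9.0, 6.0, 12.0, 5.0]
```

In this toy example, only column 3 collides with the first packed column (both have a nonzero in row 0), so four sparse columns shrink to two denser ones; a hardware design would additionally need to bound group sizes and route the correct `x` element to each MAC.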

Original language: English (US)
Title of host publication: Proceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450379830
State: Published - Jun 29 2020
Event: 34th ACM International Conference on Supercomputing, ICS 2020 - Barcelona, Spain
Duration: Jun 29 2020 to Jul 2 2020

Publication series

Name: Proceedings of the International Conference on Supercomputing


Conference: 34th ACM International Conference on Supercomputing, ICS 2020

Keywords

  • application-specific hardware
  • hardware accelerators
  • hardware-software codesign
  • sparse matrix condensing
  • sparse matrix processing
  • systolic array

ASJC Scopus subject areas

  • General Computer Science

