A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Subhankar Pal, Dong Hyeon Park, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun Seok Kim, Ronald Dreslinski

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm×2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.

Original language: English (US)
Title of host publication: 2019 Symposium on VLSI Technology, VLSI Technology 2019 - Digest of Technical Papers
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: C150-C151
ISBN (Electronic): 9784863487178
State: Published - Jun 2019
Event: 39th Symposium on VLSI Technology, VLSI Technology 2019 - Kyoto, Japan
Duration: Jun 9 2019 to Jun 14 2019

Publication series

Name: Digest of Technical Papers - Symposium on VLSI Technology
Volume: 2019-June
ISSN (Print): 0743-1562

Conference

Conference: 39th Symposium on VLSI Technology, VLSI Technology 2019
Country/Territory: Japan
City: Kyoto
Period: 6/9/19 to 6/14/19

Keywords

  • Sparse matrix multiplier
  • decoupled access-execution
  • reconfigurability and accelerator
  • synthesizable crossbar

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
