A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Subhankar Pal, Dong Hyeon Park, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun Seok Kim, Ronald Dreslinski

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6 × (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.

Original languageEnglish (US)
Title of host publication2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers
PublisherInstitute of Electrical and Electronics Engineers Inc.
PagesC150-C151
ISBN (Electronic)9784863487185
DOIs
StatePublished - Jun 2019
Event33rd Symposium on VLSI Circuits, VLSI Circuits 2019 - Kyoto, Japan
Duration: Jun 9 2019Jun 14 2019

Publication series

NameIEEE Symposium on VLSI Circuits, Digest of Technical Papers
Volume2019-June

Conference

Conference33rd Symposium on VLSI Circuits, VLSI Circuits 2019
Country/TerritoryJapan
CityKyoto
Period6/9/196/14/19

Keywords

  • decoupled access-execution
  • reconfigurablility and accelerator
  • Sparse matrix multiplier
  • synthesizable crossbar

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm'. Together they form a unique fingerprint.

Cite this