TY - GEN
T1 - A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
AU - Pal, Subhankar
AU - Park, Dong Hyeon
AU - Feng, Siying
AU - Gao, Paul
AU - Tan, Jielun
AU - Rovinski, Austin
AU - Xie, Shaolin
AU - Zhao, Chun
AU - Amarnath, Aporva
AU - Wesley, Timothy
AU - Beaumont, Jonathan
AU - Chen, Kuan Yu
AU - Chakrabarti, Chaitali
AU - Taylor, Michael
AU - Mudge, Trevor
AU - Blaauw, David
AU - Kim, Hun Seok
AU - Dreslinski, Ronald
N1 - Publisher Copyright: © 2019 JSAP.
PY - 2019/6
Y1 - 2019/6
N2 - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph-based sparse matrices.
AB - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph-based sparse matrices.
KW - decoupled access-execution
KW - reconfigurability and accelerator
KW - Sparse matrix multiplier
KW - synthesizable crossbar
UR - http://www.scopus.com/inward/record.url?scp=85073915110&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073915110&partnerID=8YFLogxK
U2 - 10.23919/VLSIC.2019.8778147
DO - 10.23919/VLSIC.2019.8778147
M3 - Conference contribution
AN - SCOPUS:85073915110
T3 - IEEE Symposium on VLSI Circuits, Digest of Technical Papers
SP - C150-C151
BT - 2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 33rd Symposium on VLSI Circuits, VLSI Circuits 2019
Y2 - 9 June 2019 through 14 June 2019
ER -