TY - GEN
T1 - MASR: A Modular Accelerator for Sparse RNNs
T2 - 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
AU - Gupta, Udit
AU - Reagen, Brandon
AU - Pentecost, Lillian
AU - Donato, Marco
AU - Tambe, Thierry
AU - Rush, Alexander M.
AU - Wei, Gu-Yeon
AU - Brooks, David
N1 - Funding Information:
This work was supported by the Applications Driving Architectures (ADA) Research Center, a JUMP Center cosponsored by SRC and DARPA, the NSF under CCF-1704834, and Intel Corporation.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Recurrent neural networks (RNNs) are becoming the de-facto solution for speech recognition. RNNs exploit long-term temporal relationships in data by applying repeated, learned transformations. Unlike fully-connected (FC) layers with single vector-matrix operations, RNN layers consist of hundreds of such operations chained over time. This poses challenges unique to RNNs that are not found in convolutional neural networks (CNNs) or FC models, namely large dynamic activations. In this paper we present MASR, a principled and modular architecture that accelerates bidirectional RNNs for on-chip ASR. MASR is designed to exploit sparsity in both dynamic activations and static weights. The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs. In comparison to current state-of-the-art sparse neural network accelerators (e.g., EIE), MASR provides 2× area, 3× energy, and 1.6× performance benefits. The modular nature of MASR enables designs that efficiently scale from resource-constrained low-power IoT applications to large-scale, highly parallel datacenter deployments.
AB - Recurrent neural networks (RNNs) are becoming the de-facto solution for speech recognition. RNNs exploit long-term temporal relationships in data by applying repeated, learned transformations. Unlike fully-connected (FC) layers with single vector-matrix operations, RNN layers consist of hundreds of such operations chained over time. This poses challenges unique to RNNs that are not found in convolutional neural networks (CNNs) or FC models, namely large dynamic activations. In this paper we present MASR, a principled and modular architecture that accelerates bidirectional RNNs for on-chip ASR. MASR is designed to exploit sparsity in both dynamic activations and static weights. The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs. In comparison to current state-of-the-art sparse neural network accelerators (e.g., EIE), MASR provides 2× area, 3× energy, and 1.6× performance benefits. The modular nature of MASR enables designs that efficiently scale from resource-constrained low-power IoT applications to large-scale, highly parallel datacenter deployments.
KW - Accelerator
KW - Recurrent neural networks
KW - automatic speech recognition
KW - deep neural network
KW - sparsity
UR - http://www.scopus.com/inward/record.url?scp=85075451469&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075451469&partnerID=8YFLogxK
U2 - 10.1109/PACT.2019.00009
DO - 10.1109/PACT.2019.00009
M3 - Conference contribution
AN - SCOPUS:85075451469
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 1
EP - 14
BT - Proceedings - 2019 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 September 2019 through 25 September 2019
ER -