TY - CPAPER
T1 - Mapping Systolic Arrays onto 3D Circuit Structures
T2 - 2018 IEEE Workshop on Signal Processing Systems, SiPS 2018
AU - Kung, H. T.
AU - McDanel, Bradley
AU - Zhang, Sai Qian
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/31
Y1 - 2018/12/31
AB - In recent years, numerous designs have used systolic arrays to accelerate convolutional neural network (CNN) inference. In this work, we demonstrate that we can further speed up CNN inference and lower its power consumption by mapping systolic arrays onto 3D circuit structures as opposed to conventional 2D structures. Specifically, by operating in 3D space, a wide systolic array consisting of a number of subarrays can efficiently implement the wide convolutional layers prevalent in state-of-the-art CNNs. Additionally, by accumulating intermediate results along the third dimension, systolic arrays can process partitioned data channels in parallel with reduced data skew, lowering inference latency. We present a building block design using through-silicon vias (TSVs) for the 3D realization of systolic subarrays. We validate the 3D scheme using a 2.5D FPGA design and demonstrate that, when mapped onto 3D structures, wide systolic arrays can scale up in size without increasing the wiring length in interconnecting subarrays. Further, by taking full advantage of 3D structures, we are able to pipeline inference across multiple layers of a CNN over a series of systolic arrays, dramatically reducing the inference time per input sample. These improvements lead to significantly reduced inference latency, which is especially important for real-time applications where it is common to process samples one at a time.
KW - 3D-IC implementation
KW - accelerator
KW - convolutional neural network (CNN)
KW - deep learning
KW - FPGA
KW - inference latency
KW - power consumption
KW - systolic array
KW - wiring length
UR - http://www.scopus.com/inward/record.url?scp=85061394353&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061394353&partnerID=8YFLogxK
U2 - 10.1109/SiPS.2018.8598454
DO - 10.1109/SiPS.2018.8598454
M3 - Conference contribution
AN - SCOPUS:85061394353
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
SP - 330
EP - 336
BT - Proceedings of the IEEE Workshop on Signal Processing Systems, SiPS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 October 2018 through 24 October 2018
ER -