NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks

Mahmood Azhar Qureshi, Arslan Munir

Research output: Contribution to journalConference articlepeer-review


Convolutional neural networks (CNNs) require high throughput hardware accelerators for real time applications owing to their huge computational cost. Most traditional CNN accelerators rely on single core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations. This limits the maximum achievable ratio of peak throughput per PE count to unity. Most of the past works optimize their dataflows to attain close to a 100% hardware utilization to reach this ratio. In this paper, we introduce a high throughput, multi-threaded, log-based PE core. The designed core provides a 200% increase in peak throughput per PE count while only incurring a 6% increase in area overhead compared to a single, linear multiplier PE core with same output bit precision. We also present a 2D weight broadcast dataflow which exploits the multi-threaded nature of the PE cores to achieve a high hardware utilization per layer for various CNNs. The entire architecture, which we refer to as NeuroMAX, is implemented on Xilinx Zynq 7020 SoC at 200 MHz processing clock. Detailed analysis is performed on throughput, hardware utilization, area and power breakdown, and latency to show performance improvement compared to previous FPGA and ASIC designs.

Original languageEnglish (US)
Article number9256558
JournalIEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
StatePublished - Nov 2 2020
Event39th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2020 - Virtual, San Diego, United States
Duration: Nov 2 2020Nov 5 2020


  • Convolutional neural networks (CNNs)
  • hardware accelerator
  • hardware utilization
  • multi-threaded
  • throughput

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design


Dive into the research topics of 'NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks'. Together they form a unique fingerprint.

Cite this