Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition

Cheng Fu, Hanxian Huang, Bram Wasti, Chris Cummins, Riyadh Baghdadi, Kim Hazelwood, Yuandong Tian, Jishen Zhao, Hugh Leather

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

High computation cost is one of the key bottlenecks to adopting deep neural networks (DNNs) on different hardware. When client data are sensitive, privacy-preserving DNN evaluation methods, such as homomorphic encryption (HE), incur even higher computation cost. Prior works exploited weight repetition in quantized neural networks to save convolution computation through memoization or arithmetic factorization. However, such methods fail to fully exploit the exponential search space of factorizing and reusing computation. We propose Q-gym, a DNN framework consisting of two components. First, we propose a compiler that leverages equality saturation to generate computation expressions for convolutional layers with a significant reduction in the number of operations. Second, we integrate these computation expressions with various parallelization methods to accelerate DNN inference on different hardware. We also employ the efficient expressions to accelerate DNN inference under HE. Extensive experiments show that Q-gym achieves 19.1%/68.9% more operation reduction compared to SumMerge and the original DNNs, respectively. Also, computation expressions from Q-gym yield average inference speedups of 2.56×/1.78× on CPU/GPU compared to OneDNN and PyTorch GPU, respectively. For DNN evaluation under HE, Q-gym reduces the number of homomorphic operations by 2.47×/1.30× relative to CryptoNet and FastCryptoNet, respectively, with only 0.06% accuracy loss due to quantization.
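To make the weight-repetition idea concrete, here is a minimal sketch, assuming a single output of a quantized convolution computed as an integer dot product. It shows only the basic arithmetic factorization that the abstract attributes to prior work: inputs that share the same weight value are summed first, and each distinct weight value is multiplied once. It is not Q-gym's equality-saturation compiler, which searches a much larger space of factorizations and computation reuse. The function names `naive_dot` and `factorized_dot` are hypothetical.

```python
from collections import defaultdict


def naive_dot(weights, inputs):
    """Direct dot product: one multiplication per weight (zeros included)."""
    total = 0
    mults = 0
    for w, x in zip(weights, inputs):
        total += w * x
        mults += 1
    return total, mults


def factorized_dot(weights, inputs):
    """Weight-repetition factorization: sum(w_i * x_i) == sum_v v * sum_{i: w_i == v} x_i.

    Inputs are first accumulated per distinct weight value (additions only),
    then each nonzero distinct value is multiplied exactly once.
    """
    group_sums = defaultdict(int)
    for w, x in zip(weights, inputs):
        group_sums[w] += x
    total = 0
    mults = 0
    for value, partial_sum in group_sums.items():
        if value == 0:
            continue  # zero weights contribute nothing, so skip the multiply
        total += value * partial_sum
        mults += 1
    return total, mults


if __name__ == "__main__":
    # Hypothetical low-bit weights with heavy repetition.
    weights = [2, -1, 2, 0, -1, 2, 3, 2]
    inputs = [1, 4, 3, 5, 2, 6, 1, 2]
    print(naive_dot(weights, inputs))       # (21, 8)
    print(factorized_dot(weights, inputs))  # (21, 3)
```

Both functions return the same result, but the factorized form needs one multiplication per distinct nonzero weight value rather than one per weight. With b-bit quantization there are at most 2^b distinct values, so the multiplication count per output is bounded independently of kernel size, while the number of additions stays roughly the same.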

Original language: English (US)
Title of host publication: PACT 2022 - Proceedings of the 2022 International Conference on Parallel Architectures and Compilation Techniques
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 291-303
Number of pages: 13
ISBN (Electronic): 9781450398688
State: Published - Oct 8 2022
Event: 31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022 - Chicago, United States
Duration: Oct 8 2022 to Oct 10 2022

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print): 1089-795X

Conference

Conference: 31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022
Country/Territory: United States
City: Chicago
Period: 10/8/22 to 10/10/22

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
