TY - GEN
T1 - Q-gym
T2 - 31st International Conference on Parallel Architectures and Compilation Techniques, PACT 2022
AU - Fu, Cheng
AU - Huang, Hanxian
AU - Wasti, Bram
AU - Cummins, Chris
AU - Baghdadi, Riyadh
AU - Hazelwood, Kim
AU - Tian, Yuandong
AU - Zhao, Jishen
AU - Leather, Hugh
N1 - Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/10/8
Y1 - 2022/10/8
N2 - The high computation cost is one of the key bottlenecks for adopting deep neural networks (DNNs) on different hardware. When client data are sensitive, privacy-preserving DNN evaluation methods, such as homomorphic encryption (HE), incur even higher computation cost. Prior works employed weight repetition in quantized neural networks to save the computation of convolutions via memoization or arithmetic factorization. However, such methods fail to fully exploit the exponential search space of factorizing and reusing computation. We propose Q-gym, a DNN framework consisting of two components. First, we propose a compiler that leverages equality saturation to generate computation expressions for convolutional layers with a significant reduction in the number of operations. Second, we integrate the computation expressions with various parallelization methods to accelerate DNN inference on different hardware. We also employ the efficient expressions to accelerate DNN inference under HE. Extensive experiments show that Q-gym achieves 19.1%/68.9% more operation reductions compared to SumMerge and the original DNNs. Also, computation expressions from Q-gym contribute to 2.56×/1.78× inference speedup on CPU/GPU on average compared to oneDNN and PyTorch GPU. For DNN evaluation under HE, Q-gym reduces the homomorphic operations by 2.47×/1.30× relative to CryptoNet and FastCryptoNet with only 0.06% accuracy loss due to quantization.
AB - The high computation cost is one of the key bottlenecks for adopting deep neural networks (DNNs) on different hardware. When client data are sensitive, privacy-preserving DNN evaluation methods, such as homomorphic encryption (HE), incur even higher computation cost. Prior works employed weight repetition in quantized neural networks to save the computation of convolutions via memoization or arithmetic factorization. However, such methods fail to fully exploit the exponential search space of factorizing and reusing computation. We propose Q-gym, a DNN framework consisting of two components. First, we propose a compiler that leverages equality saturation to generate computation expressions for convolutional layers with a significant reduction in the number of operations. Second, we integrate the computation expressions with various parallelization methods to accelerate DNN inference on different hardware. We also employ the efficient expressions to accelerate DNN inference under HE. Extensive experiments show that Q-gym achieves 19.1%/68.9% more operation reductions compared to SumMerge and the original DNNs. Also, computation expressions from Q-gym contribute to 2.56×/1.78× inference speedup on CPU/GPU on average compared to oneDNN and PyTorch GPU. For DNN evaluation under HE, Q-gym reduces the homomorphic operations by 2.47×/1.30× relative to CryptoNet and FastCryptoNet with only 0.06% accuracy loss due to quantization.
UR - http://www.scopus.com/inward/record.url?scp=85147329622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147329622&partnerID=8YFLogxK
U2 - 10.1145/3559009.3569673
DO - 10.1145/3559009.3569673
M3 - Conference contribution
AN - SCOPUS:85147329622
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 291
EP - 303
BT - PACT 2022 - Proceedings of the 2022 International Conference on Parallel Architectures and Compilation Techniques
PB - Association for Computing Machinery
Y2 - 8 October 2022 through 10 October 2022
ER -