TY - JOUR
T1 - Selective Network Linearization for Efficient Private Inference
AU - Cho, Minsu
AU - Joshi, Ameya
AU - Garg, Siddharth
AU - Reagen, Brandon
AU - Hegde, Chinmay
N1 - Funding Information:
This work was supported in part by the National Science Foundation (under grants CCF-2005804 and 1801495), USDA/NIFA (under grant 2021-67021-35329), and the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by the Semiconductor Research Corporation (SRC) and the Defense Advanced Research Projects Agency (DARPA).
Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022
Y1 - 2022
N2 - Private inference (PI) enables inference directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to 4.25% more accuracy (iso-ReLU count at 50K) or 2.2× less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a “no free lunch” theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy. Public code is available at https://github.com/NYU-DICE-Lab/selective_network_linearization.
UR - http://www.scopus.com/inward/record.url?scp=85163098444&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163098444&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85163098444
SN - 2640-3498
VL - 162
SP - 3947
EP - 3961
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 39th International Conference on Machine Learning, ICML 2022
Y2 - 17 July 2022 through 23 July 2022
ER -