TY - GEN
T1 - Compact and Optimal Deep Learning with Recurrent Parameter Generators
AU - Wang, Jiayun
AU - Chen, Yubei
AU - Yu, Stella X.
AU - Cheung, Brian
AU - LeCun, Yann
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: we decouple the degrees of freedom (DoF) from the actual number of parameters of a model and optimize a small DoF, with predefined random linear constraints, for a large model of an arbitrary architecture in one-stage end-to-end learning. Specifically, we create a recurrent parameter generator (RPG), which repeatedly fetches parameters from a ring and unpacks them onto a large model with random permutation and sign flipping to promote parameter decorrelation. We show that gradient descent can automatically find the best model under constraints, with, in fact, faster convergence. Our extensive experimentation reveals a log-linear relationship between model DoF and accuracy. Our RPG demonstrates remarkable DoF reduction and can be further pruned and quantized for additional run-time performance gain. For example, in terms of top-1 accuracy on ImageNet, RPG achieves 96% of ResNet18's performance with only 18% DoF (the equivalent of one convolutional layer) and 52% of ResNet34's performance with only 0.25% DoF! Our work shows significant potential of constrained neural optimization in compact and optimal deep learning.
AB - Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: we decouple the degrees of freedom (DoF) from the actual number of parameters of a model and optimize a small DoF, with predefined random linear constraints, for a large model of an arbitrary architecture in one-stage end-to-end learning. Specifically, we create a recurrent parameter generator (RPG), which repeatedly fetches parameters from a ring and unpacks them onto a large model with random permutation and sign flipping to promote parameter decorrelation. We show that gradient descent can automatically find the best model under constraints, with, in fact, faster convergence. Our extensive experimentation reveals a log-linear relationship between model DoF and accuracy. Our RPG demonstrates remarkable DoF reduction and can be further pruned and quantized for additional run-time performance gain. For example, in terms of top-1 accuracy on ImageNet, RPG achieves 96% of ResNet18's performance with only 18% DoF (the equivalent of one convolutional layer) and 52% of ResNet34's performance with only 0.25% DoF! Our work shows significant potential of constrained neural optimization in compact and optimal deep learning.
KW - Algorithms: Machine learning architectures, and algorithms (including transfer)
KW - formulations
UR - http://www.scopus.com/inward/record.url?scp=85149046380&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149046380&partnerID=8YFLogxK
U2 - 10.1109/WACV56688.2023.00389
DO - 10.1109/WACV56688.2023.00389
M3 - Conference contribution
AN - SCOPUS:85149046380
T3 - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
SP - 3889
EP - 3899
BT - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
Y2 - 3 January 2023 through 7 January 2023
ER -