TY - JOUR
T1 - Loss surfaces, mode connectivity, and fast ensembling of DNNs
AU - Garipov, Timur
AU - Izmailov, Pavel
AU - Podoprikhin, Dmitrii
AU - Vetrov, Dmitry
AU - Wilson, Andrew Gordon
N1 - Funding Information:
Timur Garipov was supported by Ministry of Education and Science of the Russian Federation (grant 14.756.31.0001). Timur Garipov and Dmitrii Podoprikhin were supported by Samsung Research, Samsung Electronics. Andrew Gordon Wilson and Pavel Izmailov were supported by Facebook Research and NSF IIS-1563887.
Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.
PY - 2018
Y1 - 2018
N2 - The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.
AB - The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.
UR - http://www.scopus.com/inward/record.url?scp=85064816133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064816133&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85064816133
SN - 1049-5258
VL - 2018-December
SP - 8789
EP - 8798
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
Y2 - 2 December 2018 through 8 December 2018
ER -