TY - GEN
T1 - Averaging weights leads to wider optima and better generalization
AU - Izmailov, Pavel
AU - Podoprikhin, Dmitrii
AU - Garipov, Timur
AU - Vetrov, Dmitry
AU - Wilson, Andrew Gordon
N1 - Publisher Copyright:
© 34th Conference on Uncertainty in Artificial Intelligence 2018. All rights reserved.
PY - 2018
Y1 - 2018
AB - Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much broader optima than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
UR - http://www.scopus.com/inward/record.url?scp=85059432227&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059432227&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85059432227
T3 - 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018
SP - 876
EP - 885
BT - 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018
A2 - Silva, Ricardo
A2 - Globerson, Amir
PB - Association for Uncertainty in Artificial Intelligence (AUAI)
T2 - 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018
Y2 - 6 August 2018 through 10 August 2018
ER -