TY - GEN
T1 - Scaling Properties of Deep Residual Networks
AU - Cohen, Alain-Sam
AU - Cont, Rama
AU - Rossier, Alain
AU - Xu, Renyuan
N1 - Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
AB - Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in the neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation, or neither of these. These findings cast doubt on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
UR - http://www.scopus.com/inward/record.url?scp=85113451332&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113451332&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85113451332
T3 - Proceedings of Machine Learning Research
SP - 2039
EP - 2048
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
T2 - 38th International Conference on Machine Learning, ICML 2021
Y2 - 18 July 2021 through 24 July 2021
ER -