TY - GEN
T1 - Linearly Convergent Algorithms for Learning Shallow Residual Networks
AU - Jagatap, Gauri
AU - Hegde, Chinmay
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - We propose and analyze algorithms for training ReLU networks with skip connections. Skip connections are the key feature of residual networks (ResNets), which have been shown to provide superior performance in deep learning applications. We analyze two approaches for training such networks, gradient descent and alternating minimization, and compare the convergence criteria of the two methods. We show that under typical (Gaussianity) assumptions on the d-dimensional input data, both gradient descent and alternating minimization provably converge at a linear rate given any sufficiently good initialization; moreover, we show that a simple "identity" initialization suffices. Furthermore, we provide statistical upper bounds indicating that n = \tilde{O}(d^3) samples suffice to achieve this convergence rate. To our knowledge, these constitute the first global parameter recovery guarantees for shallow ResNet-type networks with ReLU activations.
UR - http://www.scopus.com/inward/record.url?scp=85073170367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073170367&partnerID=8YFLogxK
DO - 10.1109/ISIT.2019.8849246
M3 - Conference contribution
AN - SCOPUS:85073170367
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1797
EP - 1801
BT - 2019 IEEE International Symposium on Information Theory, ISIT 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Symposium on Information Theory, ISIT 2019
Y2 - 7 July 2019 through 12 July 2019
ER -