Linearly Convergent Algorithms for Learning Shallow Residual Networks

Gauri Jagatap, Chinmay Hegde

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Abstract

    We propose and analyze algorithms for training ReLU networks with skipped connections. Skipped connections are the key feature of residual networks (ResNets), which have been shown to provide superior performance in deep learning applications. We analyze two approaches for training such networks, gradient descent and alternating minimization, and compare the convergence criteria of both methods. We show that under typical (Gaussianity) assumptions on the d-dimensional input data, both gradient descent and alternating minimization provably converge at a linear rate, given any sufficiently good initialization; moreover, we show that a simple "identity" initialization suffices. Furthermore, we provide statistical upper bounds which indicate that n = \tilde{O}(d^3) samples suffice to achieve this convergence rate. To our knowledge, these constitute the first global parameter recovery guarantees for shallow ResNet-type networks with ReLU activations.
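
To make the gradient descent setting in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. Everything specific in it is an assumption made for illustration: the model form f(x) = sum_j ReLU(x^T (e_j + w_j)) (a d x d weight matrix W added to the identity), the squared loss, the reading of "identity" initialization as W = 0, and the step size, sample count, and scale of the planted weights. Alternating minimization is not shown.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W):
    # Assumed model (not taken from the paper): f(x) = sum_j ReLU(x^T (e_j + w_j)),
    # i.e. a one-hidden-layer ReLU network whose weights are I + W.
    d = X.shape[1]
    return relu(X @ (np.eye(d) + W)).sum(axis=1)

def gradient_descent(X, y, steps=1000, lr=0.1):
    # Plain gradient descent on the squared loss (1 / 2n) * sum_i (f(x_i) - y_i)^2.
    n, d = X.shape
    W = np.zeros((d, d))  # assumed meaning of "identity" initialization: I + W = I
    for _ in range(steps):
        Z = X @ (np.eye(d) + W)                   # pre-activations, shape (n, d)
        residual = relu(Z).sum(axis=1) - y        # prediction error per sample
        # Gradient w.r.t. W: column j uses the ReLU activation pattern of unit j.
        grad = X.T @ (residual[:, None] * (Z > 0)) / n
        W -= lr * grad
    return W

# Synthetic, noise-free data with Gaussian inputs and hypothetical planted weights.
rng = np.random.default_rng(0)
d, n = 5, 5000
X = rng.standard_normal((n, d))
W_star = 0.1 * rng.standard_normal((d, d))
y = forward(X, W_star)

W_hat = gradient_descent(X, y)
err_init = np.linalg.norm(W_star)           # parameter error at W = 0
err_final = np.linalg.norm(W_hat - W_star)  # parameter error after training
print(err_init, err_final)
```

Comparing `err_final` with `err_init` gives a rough empirical check that the iterates move toward the planted weights. Logging the error at every step and plotting it on a log scale would show whether the decay looks geometric in this toy setting.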

    Original language: English (US)
    Title of host publication: 2019 IEEE International Symposium on Information Theory, ISIT 2019 - Proceedings
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1797-1801
    Number of pages: 5
    ISBN (Electronic): 9781538692912
    State: Published - Jul 2019
    Event: 2019 IEEE International Symposium on Information Theory, ISIT 2019 - Paris, France
    Duration: Jul 7 2019 to Jul 12 2019

    Publication series

    Name: IEEE International Symposium on Information Theory - Proceedings
    Volume: 2019-July
    ISSN (Print): 2157-8095

    Conference

    Conference: 2019 IEEE International Symposium on Information Theory, ISIT 2019
    Country/Territory: France
    City: Paris
    Period: 7/7/19 to 7/12/19

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Information Systems
    • Modeling and Simulation
    • Applied Mathematics
