Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

Thanh V. Nguyen, Raymond K.W. Wong, Chinmay Hegde

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Deep neural networks can achieve impressive performance in the regime where they are massively over-parameterized. Consequently, over the past year, there has been growing interest in analyzing the optimization and generalization properties of over-parameterized networks. However, the majority of existing work applies only to supervised learning. The role of over-parameterization in the unsupervised setting has, by contrast, received far less attention. In this paper, we study the inductive bias of gradient descent for two-layer over-parameterized autoencoders with ReLU activation. We first provide theoretical evidence for the memorization phenomena observed in recent work, using the property that infinitely wide neural networks under gradient descent evolve as linear models. We also analyze the gradient dynamics of the autoencoders in the finite-width setting. Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two regimes: weakly-trained and jointly-trained. Our results indicate the considerable benefits of joint training over weak training in finding global optima, achieving a dramatic decrease in the required level of over-parameterization. Finally, we analyze the case of weight-tied autoencoders and prove that, in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.
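
As a rough illustration of the setting in the abstract, the sketch below runs plain gradient descent on a wide two-layer ReLU autoencoder in two modes. Several things here are assumptions made for illustration and are not stated in the text above: reading "weakly-trained" as updating only the first (encoder) layer and "jointly-trained" as updating both layers, the 1/sqrt(width) output scaling, the Gaussian initialization, the squared reconstruction loss, and every name and hyperparameter (`train_autoencoder`, `width`, `lr`, the toy data).

```python
import numpy as np

def train_autoencoder(X, width, steps, lr, joint, seed=0):
    """Gradient descent on 0.5 * ||(1/sqrt(width)) * A @ relu(W.T @ X) - X||_F^2.

    X has shape (d, n): one d-dimensional sample per column.
    joint=False updates only the encoder W (assumed "weakly-trained" reading);
    joint=True updates both W and the decoder A (assumed "jointly-trained").
    Returns the loss recorded before each update step.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.standard_normal((d, width))  # encoder weights (hypothetical init)
    A = rng.standard_normal((d, width))  # decoder weights (hypothetical init)
    scale = 1.0 / np.sqrt(width)
    losses = []
    for _ in range(steps):
        pre = W.T @ X                    # pre-activations, shape (width, n)
        H = np.maximum(pre, 0.0)         # ReLU features
        R = scale * (A @ H) - X          # reconstruction residual, shape (d, n)
        losses.append(0.5 * np.sum(R ** 2))
        # Backpropagate the squared loss through the decoder and the ReLU.
        grad_W = X @ ((scale * (A.T @ R)) * (pre > 0)).T
        grad_A = scale * (R @ H.T)
        W -= lr * grad_W
        if joint:
            A -= lr * grad_A
    return np.array(losses)

# Toy data: 10 unit-norm samples in 5 dimensions (purely illustrative).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((5, 10))
X /= np.linalg.norm(X, axis=0, keepdims=True)

weak_losses = train_autoencoder(X, width=256, steps=200, lr=0.02, joint=False)
joint_losses = train_autoencoder(X, width=256, steps=200, lr=0.02, joint=True)
print("weak  final loss:", weak_losses[-1])
print("joint final loss:", joint_losses[-1])
```

Both runs start from the same random initialization, so the two loss curves can be compared directly. A toy run like this only shows the mechanics of the two modes; it does not reproduce or verify the convergence rates or width requirements proved in the paper.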

    Original language: English (US)
    Journal: IEEE Transactions on Information Theory
    State: Accepted/In press - 2021

    Keywords

    • Convergence
    • Data models
    • Decoding
    • Heuristic algorithms
    • Kernel
    • Task analysis
    • Training
    • autoencoders
    • gradient dynamics
    • neural tangent kernel

    ASJC Scopus subject areas

    • Information Systems
    • Computer Science Applications
    • Library and Information Sciences
