Benefits of jointly training autoencoders: An improved neural tangent kernel analysis

Thanh V. Nguyen, Raymond K.W. Wong, Chinmay Hegde

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Deep neural networks can achieve impressive performance in the regime where they are massively over-parameterized. Consequently, in recent years there has been growing interest in analyzing the optimization and generalization properties of over-parameterized networks. However, the majority of existing work applies only to supervised learning; the role of over-parameterization in the unsupervised setting has, by contrast, received far less attention. In this paper, we study the inductive bias of gradient descent for two-layer over-parameterized autoencoders with ReLU activation. We first provide theoretical evidence for the memorization phenomenon observed in recent work, using the property that infinitely wide neural networks trained by gradient descent evolve as linear models. We also analyze the gradient dynamics of autoencoders in the finite-width setting. Starting from a randomly initialized autoencoder network, we rigorously prove linear convergence of gradient descent in two regimes: weakly-trained, where only one layer is updated, and jointly-trained, where both layers are updated simultaneously. Our results indicate the considerable benefit of joint training over weak training in finding global optima, achieving a dramatic decrease in the required level of over-parameterization. Finally, we analyze the case of weight-tied autoencoders and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.
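
    To make the two regimes concrete, the following is a minimal sketch (not the authors' code) of a two-layer ReLU autoencoder x_hat = W2 * ReLU(W1 * x) trained by gradient descent on the squared reconstruction loss. As an illustrative reading of the "weak" regime, only the decoder is updated while the encoder stays frozen at its random initialization; the "joint" regime updates both layers. The dimensions, step size, and synthetic data below are illustrative assumptions, not the paper's exact setup.

        # Minimal sketch of weak vs. joint training for a two-layer ReLU
        # autoencoder (illustrative assumptions; not the authors' implementation).
        import numpy as np

        rng = np.random.default_rng(0)
        d, m, n = 20, 512, 10               # input dim, hidden width, sample count (assumed)
        X = rng.standard_normal((d, n))
        X /= np.linalg.norm(X, axis=0)      # unit-norm inputs, common in NTK-style analyses

        W1 = rng.standard_normal((m, d)) / np.sqrt(d)   # encoder, random init
        W2 = rng.standard_normal((d, m)) / np.sqrt(m)   # decoder, random init

        def loss(W1, W2):
            H = np.maximum(W1 @ X, 0.0)     # ReLU hidden features
            return 0.5 * np.sum((W2 @ H - X) ** 2) / n

        def step(W1, W2, lr, joint):
            H = np.maximum(W1 @ X, 0.0)
            R = W2 @ H - X                  # reconstruction residual
            gW2 = (R @ H.T) / n             # gradient w.r.t. the decoder
            if joint:                       # joint training also moves the encoder
                G = (W2.T @ R) * (H > 0)    # backprop through the ReLU
                W1 = W1 - lr * (G @ X.T) / n
            return W1, W2 - lr * gW2

        for regime in ("weak", "joint"):
            A, B = W1.copy(), W2.copy()
            for _ in range(1000):
                A, B = step(A, B, lr=0.2, joint=(regime == "joint"))
            print(regime, "final loss:", loss(A, B))

    In the weak regime the loss is a linear least-squares objective in W2, which is why its gradient dynamics are exactly linear; the paper's point is that joint training achieves comparable convergence guarantees with far less over-parameterization.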

    Original language: English (US)
    Article number: 9374468
    Pages (from-to): 4669-4692
    Number of pages: 24
    Journal: IEEE Transactions on Information Theory
    Volume: 67
    Issue number: 7
    DOIs
    State: Published - Jul 2021

    Keywords

    • Convergence
    • Data models
    • Decoding
    • Heuristic algorithms
    • Kernel
    • Task analysis
    • Training
    • autoencoders
    • gradient dynamics
    • neural tangent kernel

    ASJC Scopus subject areas

    • Information Systems
    • Library and Information Sciences
    • Computer Science Applications
