TY - JOUR
T1 - Fast and Provable Algorithms for Learning Two-Layer Polynomial Neural Networks
AU - Soltani, Mohammadreza
AU - Hegde, Chinmay
N1 - Funding Information:
This work was supported by the National Science Foundation under Grants CCF-1566281 and CAREER CCF-1750920.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/7/1
Y1 - 2019/7/1
N2 - In this paper, we bridge the problem of (provably) learning shallow neural networks with the well-studied problem of low-rank matrix estimation. In particular, we consider two-layer networks with quadratic activations, and focus on the under-parameterized regime where the number of neurons in the hidden layer is smaller than the dimension of the input. Our main approach is to 'lift' the learning problem into a higher dimension, which enables us to borrow algorithmic techniques from low-rank matrix estimation. Using this intuition, we propose three novel, non-convex training algorithms. We support our algorithms with rigorous theoretical analysis, and show that all three enjoy linear convergence, fast running time per iteration, and near-optimal sample complexity. Finally, we complement our theoretical results with numerical experiments.
AB - In this paper, we bridge the problem of (provably) learning shallow neural networks with the well-studied problem of low-rank matrix estimation. In particular, we consider two-layer networks with quadratic activations, and focus on the under-parameterized regime where the number of neurons in the hidden layer is smaller than the dimension of the input. Our main approach is to 'lift' the learning problem into a higher dimension, which enables us to borrow algorithmic techniques from low-rank matrix estimation. Using this intuition, we propose three novel, non-convex training algorithms. We support our algorithms with rigorous theoretical analysis, and show that all three enjoy linear convergence, fast running time per iteration, and near-optimal sample complexity. Finally, we complement our theoretical results with numerical experiments.
KW - Deep learning
KW - approximate algorithms
KW - low-rank estimation
KW - rank-one projection
KW - sample complexity
UR - http://www.scopus.com/inward/record.url?scp=85066635653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066635653&partnerID=8YFLogxK
U2 - 10.1109/TSP.2019.2916743
DO - 10.1109/TSP.2019.2916743
M3 - Article
AN - SCOPUS:85066635653
SN - 1053-587X
VL - 67
SP - 3361
EP - 3371
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
IS - 13
M1 - 8713936
ER -