Abstract
In this paper, we bridge the problem of (provably) learning shallow neural networks and the well-studied problem of low-rank matrix estimation. In particular, we consider two-layer networks with quadratic activations, and focus on the under-parameterized regime where the number of neurons in the hidden layer is smaller than the dimension of the input. Our main approach is to 'lift' the learning problem into a higher dimension, which enables us to borrow algorithmic techniques from low-rank matrix estimation. Using this intuition, we propose three novel, non-convex training algorithms. We support our algorithms with rigorous theoretical analysis, and show that all three enjoy linear convergence, fast running time per iteration, and near-optimal sample complexity. Finally, we complement our theoretical results with numerical experiments.
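To make the 'lifting' idea concrete, the following is a minimal NumPy sketch (variable names are illustrative, not from the paper) of the standard observation behind it: the output of a two-layer network with quadratic activations, y = Σ_j a_j (w_j^T x)^2, equals the quadratic form x^T M x with M = W diag(a) W^T of rank at most r, so learning the network can be viewed as estimating a rank-r matrix from rank-one measurements.

```python
# Hypothetical illustration of the lifting step; dimensions and names are chosen
# for exposition only, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 5                       # input dimension, number of hidden neurons (r < d)
W = rng.standard_normal((d, r))    # hidden-layer weights, one column per neuron
a = rng.standard_normal(r)         # second-layer weights

x = rng.standard_normal(d)         # a single input

# Network output: second layer applied to quadratic activations.
y_network = a @ (W.T @ x) ** 2

# Lifted view: the same output written as a rank-r quadratic form x^T M x.
M = W @ np.diag(a) @ W.T           # rank(M) <= r
y_lifted = x @ M @ x

assert np.isclose(y_network, y_lifted)
print(np.linalg.matrix_rank(M))    # at most r = 5
```

Under this lifted view, each training pair (x, y) acts as a linear measurement y = ⟨x x^T, M⟩ of the low-rank matrix M, which is what allows low-rank matrix estimation machinery to be brought to bear.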
| Original language | English (US) |
| --- | --- |
| Article number | 8713936 |
| Pages (from-to) | 3361-3371 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 67 |
| Issue number | 13 |
| DOIs | |
| State | Published - Jul 1 2019 |
Keywords
- Deep learning
- approximate algorithms
- low-rank estimation
- rank-one projection
- sample complexity
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering