Abstract
The dynamics of DNNs during gradient descent are described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost function of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, both at initialization and during training. In the so-called mean-field limit, where the NTK is not fixed during training, we describe the first two moments of the Hessian at initialization.
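As a companion illustration (not taken from the paper), the following minimal JAX sketch makes the NTK–Hessian connection concrete for a toy MLP: the empirical NTK Gram matrix JJᵀ shares its nonzero eigenvalues with the Gauss-Newton part JᵀJ of the Hessian of a squared-error cost. All names (`mlp`, `flat_jacobian`, the toy data) are hypothetical, chosen for this example only.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, sizes=(2, 16, 1)):
    # Standard Gaussian initialization (NTK-parameterization scaling kept implicit).
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)          # one scalar output per data point

def flat_jacobian(params, X):
    # Jacobian of the network outputs w.r.t. all parameters,
    # flattened so that rows = data points, columns = parameters.
    jac = jax.jacobian(mlp)(params, X)
    leaves = jax.tree_util.tree_leaves(jac)
    return jnp.concatenate([l.reshape(X.shape[0], -1) for l in leaves], axis=1)

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (8, 2))          # toy inputs
y = jnp.sin(X[:, 0])                        # toy targets
params = init_params(key)

J = flat_jacobian(params, X)
ntk = J @ J.T                               # empirical NTK Gram matrix, (8, 8)

flat, unravel = ravel_pytree(params)
cost = lambda p: 0.5 * jnp.sum((mlp(unravel(p), X) - y) ** 2)
H = jax.hessian(cost)(flat)                 # full Hessian of the cost

# JᵀJ has the same nonzero eigenvalues as the NTK Gram matrix JJᵀ;
# the full Hessian adds a residual-weighted second-derivative term,
# which is small near interpolation in the fixed-NTK (linearized) regime.
print(jnp.linalg.eigvalsh(ntk)[-3:])        # top-3 NTK eigenvalues
print(jnp.linalg.eigvalsh(H)[-3:])          # top-3 Hessian eigenvalues
```

This sketch only mirrors the qualitative link the abstract describes; the paper's results concern the asymptotic spectrum in the large-width limit, not this finite toy network.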
| Original language | English (US) |
| --- | --- |
| State | Published - 2020 |
| Event | 8th International Conference on Learning Representations, ICLR 2020 - Addis Ababa, Ethiopia. Duration: Apr 30 2020 → … |
Conference

| Conference | 8th International Conference on Learning Representations, ICLR 2020 |
| --- | --- |
| Country/Territory | Ethiopia |
| City | Addis Ababa |
| Period | 4/30/20 → … |
ASJC Scopus subject areas
- Education
- Linguistics and Language
- Language and Linguistics
- Computer Science Applications