Abstract
Among all the supervised learning algorithms, back-propagation (BP) is probably the most widely used. Classical non-linear programming methods generally use an estimate of the Hessian matrix (the matrix of second derivatives) to compute the weight modification at each iteration; they are derived from the well-known Newton-Raphson algorithm. We propose a very rough approximation to the Newton method that uses just the diagonal terms of the Hessian matrix. These terms give information about the curvature of the error surface in directions parallel to the weight-space axes. This information can be used to scale the learning rate for each weight independently. We show that it is possible to approximate the diagonal terms of the Hessian matrix using a back-propagation procedure very similar to the one used for the first derivatives.
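The abstract does not spell out the update rule itself; the sketch below is a minimal illustration, assuming a damped diagonal-Newton step in which each weight's learning rate is divided by the magnitude of the corresponding diagonal Hessian term plus a small damping constant. The function name `diag_newton_step`, the damping constant `mu`, and all numeric values are hypothetical, not taken from the paper.

```python
import numpy as np

def diag_newton_step(w, grad, diag_hessian, lr=0.01, mu=0.1):
    """Per-weight update scaled by the damped diagonal curvature.

    w            -- current weights (1-D array, hypothetical)
    grad         -- first derivatives dE/dw_i from standard back-propagation
    diag_hessian -- approximate second derivatives d^2E/dw_i^2
    lr, mu       -- global learning rate and damping constant (illustrative values)
    """
    # Each weight gets an effective learning rate lr / (|h_i| + mu):
    # high curvature -> smaller step, low curvature -> larger step.
    step = lr * grad / (np.abs(diag_hessian) + mu)
    return w - step

# Example usage with made-up values:
w = np.array([0.5, -1.2, 0.3])
g = np.array([0.1, -0.4, 0.02])   # first derivatives from back-propagation
h = np.array([2.0, 0.05, 0.8])    # approximate diagonal Hessian terms
w = diag_newton_step(w, g, h)
```

The damping constant keeps the step bounded where the estimated curvature is near zero or negative; without it, a vanishing diagonal term would produce an arbitrarily large weight change.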
| Field | Value |
| --- | --- |
| Original language | English (US) |
| Pages (from-to) | 168 |
| Number of pages | 1 |
| Journal | Neural Networks |
| Volume | 1 |
| Issue number | 1 SUPPL |
| DOIs | |
| State | Published - 1988 |
| Event | International Neural Network Society 1988 First Annual Meeting, Boston, MA, USA (Sep 6 1988 → Sep 10 1988) |
ASJC Scopus subject areas
- Cognitive Neuroscience
- Artificial Intelligence