Abstract
Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory, and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian process (GP) training, largely because GPs require sophisticated linear algebra routines that are unstable in low precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low precision across a wide range of settings, enabling GPs to train on 1.8 million data points in 10 hours on a single GPU, without requiring any sparse approximations.
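To make the approach concrete, below is a minimal sketch (not the authors' implementation) of mixed-precision conjugate gradients with residual re-orthogonalization, assuming a dense kernel matrix and plain NumPy. Matrix-vector products run in float16, while reductions and stored vectors are kept in float32 to limit round-off accumulation; preconditioning is omitted for brevity. The function name and tolerances are illustrative only.

```python
# A minimal sketch of mixed-precision CG with re-orthogonalization.
# Assumptions: dense kernel matrix K, NumPy only, no preconditioner.
import numpy as np

def mixed_precision_cg(K, y, max_iters=100, tol=1e-4):
    K16 = K.astype(np.float16)               # half-precision copy used for mat-vecs
    x = np.zeros_like(y, dtype=np.float32)
    r = y.astype(np.float32).copy()          # residual kept in single precision
    p = r.copy()
    basis = [r / np.linalg.norm(r)]          # past residual directions
    rs_old = float(r @ r)
    for _ in range(max_iters):
        Kp = (K16 @ p.astype(np.float16)).astype(np.float32)  # fp16 mat-vec
        alpha = rs_old / float(p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        # Re-orthogonalize the new residual against previous residuals
        # (modified Gram-Schmidt) to counteract half-precision round-off.
        for q in basis:
            r -= (r @ q) * q
        rs_new = float(r @ r)
        if np.sqrt(rs_new) < tol:
            break
        basis.append(r / np.sqrt(rs_new))
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Usage: solve (K + sigma^2 I) x = y for a toy RBF kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)).astype(np.float32)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq) + 1e-2 * np.eye(500, dtype=np.float32)
y = rng.standard_normal(500).astype(np.float32)
x = mixed_precision_cg(K, y)
```

In this sketch, only the expensive kernel mat-vec is performed in half precision; the CG coefficients and the re-orthogonalization are accumulated in single precision, which mirrors the general mixed-precision idea described in the abstract.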
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1306-1316 |
| Number of pages | 11 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 180 |
| State | Published - 2022 |
| Event | 38th Conference on Uncertainty in Artificial Intelligence, UAI 2022, Eindhoven, Netherlands. Duration: Aug 1, 2022 → Aug 5, 2022 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability