AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop

Jing Wang, Yunfei Teng, Anna Choromanska

Research output: Contribution to journalConference articlepeer-review

Abstract

Modern deep learning (DL) architectures are trained using variants of the SGD algorithm and typically rely on the user to manually drop the learning rate when the training curve saturates. In this paper, we develop an algorithm, that we call AutoDrop, that realizes the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems. Specifically, it is motivated by the observation that the angular velocity of the model parameters, i.e., the velocity of the changes of the convergence direction, for a fixed learning rate initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down thus the angular velocity saturation is a good indicator for dropping the learning rate. After the drop, the angular velocity “resets” and follows the pattern described above, increasing again until saturation. AutoDrop is built on this idea and drops the learning rate whenever the angular velocity saturates. The method is simple to implement, computationally cheap, and by design avoids the short-horizon bias problem. We show that AutoDrop achieves favorable performance compared to many different baseline manual and automatic learning rate schedulers, and matches the SOTA performance on all our experiments. On the theoretical front, we claim two contributions: we formulate the learning rate behavior based on the angular velocity and provide general convergence theory for the learning rate schedulers that decrease the learning rate step-wise, rather than continuously as is commonly analyzed.

Original languageEnglish (US)
Pages (from-to)3603-3629
Number of pages27
JournalProceedings of Machine Learning Research
Volume244
StatePublished - 2024
Event40th Conference on Uncertainty in Artificial Intelligence, UAI 2024 - Barcelona, Spain
Duration: Jul 15 2024Jul 19 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop'. Together they form a unique fingerprint.

Cite this