Behavior of Limited Memory BFGS When Applied to Nonsmooth Functions and Their Nesterov Smoothings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The motivation to study the behavior of limited-memory BFGS (L-BFGS) on nonsmooth optimization problems is based on two empirical observations: the widespread success of L-BFGS in solving large-scale smooth optimization problems, and the remarkable effectiveness of the full BFGS method in solving small to medium-sized nonsmooth optimization problems, using a gradient oracle rather than a subgradient oracle. We first summarize our theoretical results on the behavior of the scaled L-BFGS method with one update applied to a simple convex nonsmooth function that is unbounded below, stating conditions under which the method converges to a non-optimal point regardless of the starting point. We then empirically investigate whether the same phenomenon holds more generally, focusing on a difficult problem of Nesterov as well as eigenvalue optimization problems arising in semidefinite programming applications. We find that when applied to a nonsmooth function directly, L-BFGS, especially its scaled variant, often breaks down with a poor approximation to an optimal solution, in sharp contrast to full BFGS. Unscaled L-BFGS is less prone to breakdown, but it requires far more function evaluations per iteration than scaled L-BFGS and is thus slow. Nonetheless, both variants often obtain better results than the provably convergent, but slow, subgradient method. On the other hand, when applied to Nesterov’s smooth approximation of a nonsmooth function, scaled L-BFGS is generally much more efficient than unscaled L-BFGS, often obtaining good results even when the problem is quite ill-conditioned. In summary, we find that although L-BFGS is often a reliable method for minimizing ill-conditioned smooth problems, when the condition number is so large that the function is effectively nonsmooth, L-BFGS frequently fails. This behavior is in sharp contrast to that of full BFGS, which is consistently reliable for nonsmooth optimization problems. We conclude that, for large-scale nonsmooth optimization problems for which full BFGS and other methods for nonsmooth optimization are not practical, it is often better to apply L-BFGS to a smooth approximation of a nonsmooth problem than to apply it directly to the nonsmooth problem.
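As a concrete illustration of the contrast the abstract describes, the sketch below applies SciPy's L-BFGS-B routine (used here only as a stand-in for the scaled and unscaled L-BFGS variants studied in the paper, which it is not) to a nonsmooth least-absolute-deviations objective f(x) = ||Ax - b||_1 and to its Nesterov smoothing. Nesterov's smoothing adds a strongly convex prox term, scaled by a parameter mu, to the max representation of the nonsmooth function; for this objective with a Euclidean prox term, the smoothing reduces to a componentwise Huber function. The problem data, the choice of mu, and all names are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not the authors' code): L-BFGS applied directly to a
# nonsmooth objective versus applied to its Nesterov (Huber) smoothing.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
mu = 1e-3  # smoothing parameter: smaller mu means a closer but more
           # ill-conditioned approximation of the nonsmooth objective

def f_nonsmooth(x):
    # f(x) = ||Ax - b||_1, with a gradient (not subgradient) oracle:
    # A^T sign(r) is the gradient wherever f is differentiable, which
    # is almost everywhere.
    r = A @ x - b
    return np.abs(r).sum(), A.T @ np.sign(r)

def f_smoothed(x):
    # Nesterov smoothing of |t| with a Euclidean prox term gives the Huber
    # function: t^2/(2*mu) if |t| <= mu, else |t| - mu/2. Its derivative is
    # clip(t/mu, -1, 1), so the gradient is A^T clip(r/mu, -1, 1).
    r = A @ x - b
    small = np.abs(r) <= mu
    val = np.where(small, r**2 / (2 * mu), np.abs(r) - mu / 2).sum()
    return val, A.T @ np.clip(r / mu, -1.0, 1.0)

x0 = np.zeros(n)
direct = minimize(f_nonsmooth, x0, jac=True, method="L-BFGS-B",
                  options={"maxcor": 10})  # may stall at a non-optimal point
smooth = minimize(f_smoothed, x0, jac=True, method="L-BFGS-B",
                  options={"maxcor": 10})  # typically reliable for moderate mu
print(direct.fun, smooth.fun)
```

On instances like this, one would expect the smoothed run to terminate normally near an optimal value, while the direct run may stop early at a noticeably worse point; this mirrors, but does not reproduce, the paper's experiments.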

Original language: English (US)
Title of host publication: Numerical Analysis and Optimization, NAOV 2020
Editors: Mehiddin Al-Baali, Anton Purnama, Lucio Grandinetti
Publisher: Springer
Pages: 25-55
Number of pages: 31
ISBN (Print): 9783030720391
State: Published - 2021
Event: 5th International Conference on Numerical Analysis and Optimization: Theory, Methods, Applications and Technology Transfer, NAOV 2020 - Muscat, Oman
Duration: Jan 6 2020 - Jan 9 2020

Publication series

Name: Springer Proceedings in Mathematics and Statistics
Volume: 354
ISSN (Print): 2194-1009
ISSN (Electronic): 2194-1017

Conference

Conference: 5th International Conference on Numerical Analysis and Optimization: Theory, Methods, Applications and Technology Transfer, NAOV 2020
Country/Territory: Oman
City: Muscat
Period: 1/6/20 - 1/9/20

ASJC Scopus subject areas

  • General Mathematics
