Theoretically Grounded Loss Functions and Algorithms for Adversarial Robustness

Pranjal Awasthi, Anqi Mao, Mehryar Mohri, Yutao Zhong

Research output: Contribution to journal › Conference article › peer-review

Abstract

Adversarial robustness is a critical property of classifiers in applications as they are increasingly deployed in complex real-world systems. Yet, achieving accurate adversarial robustness in machine learning remains a persistent challenge, and the choice of the surrogate loss function used for training is a key factor. We present a family of new loss functions for adversarial robustness, smooth adversarial losses, which we show can be derived in a general way from broad families of loss functions used in multi-class classification. We prove strong H-consistency theoretical guarantees for these loss functions, including multi-class H-consistency bounds for sum losses in the adversarial setting. We design new regularized algorithms based on the minimization of these principled smooth adversarial losses (PSAL). We further show through a series of extensive experiments with the CIFAR-10, CIFAR-100 and SVHN datasets that our PSAL algorithm consistently outperforms the current state-of-the-art technique, TRADES, both for robust accuracy against ℓ∞-norm bounded perturbations and, even more significantly, for clean accuracy. Finally, we prove that, unlike PSAL, the TRADES loss in general does not admit an H-consistency property.
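
To make the training objective described above concrete: adversarial robustness is typically formalized by replacing a surrogate loss ℓ(h(x), y) with its worst-case value sup over perturbed inputs x′ with ‖x′ − x‖∞ ≤ γ, and algorithms minimize an approximation of this supremum. The sketch below illustrates that generic recipe, with PGD as the inner maximizer and cross-entropy standing in for a smooth multi-class surrogate. It is an assumption-laden illustration, not the paper's exact PSAL objective; the names pgd_perturb, adversarial_training_step, and reg_coef are hypothetical.

```python
# Illustrative sketch only (PyTorch): generic adversarial surrogate minimization.
# Cross-entropy stands in for the paper's smooth adversarial losses; the exact
# PSAL loss and its regularization are defined in the paper itself.
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Approximate the inner sup over the l_inf ball of radius eps via PGD."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        # Ascend the surrogate, then project back onto the l_inf ball.
        delta = (delta + step * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y, reg_coef=1e-4):
    """One outer step: surrogate loss at adversarial points plus an L2 regularizer
    (a hypothetical stand-in for the paper's principled regularization term)."""
    x_adv = pgd_perturb(model, x, y)
    surrogate = F.cross_entropy(model(x_adv), y)
    reg = reg_coef * sum(p.pow(2).sum() for p in model.parameters())
    loss = surrogate + reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```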

Original language: English (US)
Pages (from-to): 10077-10094
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 206
State: Published - 2023
Event: 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023 - Valencia, Spain
Duration: Apr 25, 2023 – Apr 27, 2023

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
