TY - GEN
T1 - The Effects of Regularization and Data Augmentation are Class Dependent
AU - Balestriero, Randall
AU - Bottou, Leon
AU - LeCun, Yann
N1 - Publisher Copyright:
© 2022 Neural information processing systems foundation. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Regularization is a fundamental technique for improving a model's generalization performance by limiting its complexity. Deep Neural Networks (DNNs), which tend to overfit their training data, rely heavily on regularizers such as Data-Augmentation (DA) or weight decay, with hyper-parameters found through structural risk minimization, i.e., cross-validation. In this study, we demonstrate that the optimal regularization hyper-parameters found by cross-validation over all classes lead to disastrous model performance on a minority of classes. For example, a resnet50 trained on Imagenet sees its “barn spider” test accuracy fall from 68% to 46% merely from introducing random crop DA during training. Even more surprisingly, such an unfair impact of regularization also appears when introducing uninformative regularizers such as weight decay or dropout. These results demonstrate that our search for ever-increasing generalization performance, averaged over all classes and samples, has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks; e.g., an Imagenet pre-trained resnet50 deployed on iNaturalist sees its performance on class #8889 fall from 70% to 30% when random crop DA is introduced during the Imagenet pre-training phase. These results demonstrate that finding a correct measure of a model's complexity without class-dependent preferences remains an open research question.
AB - Regularization is a fundamental technique for improving a model's generalization performance by limiting its complexity. Deep Neural Networks (DNNs), which tend to overfit their training data, rely heavily on regularizers such as Data-Augmentation (DA) or weight decay, with hyper-parameters found through structural risk minimization, i.e., cross-validation. In this study, we demonstrate that the optimal regularization hyper-parameters found by cross-validation over all classes lead to disastrous model performance on a minority of classes. For example, a resnet50 trained on Imagenet sees its “barn spider” test accuracy fall from 68% to 46% merely from introducing random crop DA during training. Even more surprisingly, such an unfair impact of regularization also appears when introducing uninformative regularizers such as weight decay or dropout. These results demonstrate that our search for ever-increasing generalization performance, averaged over all classes and samples, has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks; e.g., an Imagenet pre-trained resnet50 deployed on iNaturalist sees its performance on class #8889 fall from 70% to 30% when random crop DA is introduced during the Imagenet pre-training phase. These results demonstrate that finding a correct measure of a model's complexity without class-dependent preferences remains an open research question.
UR - http://www.scopus.com/inward/record.url?scp=85159904367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159904367&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85159904367
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural information processing systems foundation
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Y2 - 28 November 2022 through 9 December 2022
ER -