TY - JOUR
T1 - Cross Entropy versus Label Smoothing
T2 - A Neural Collapse Perspective
AU - Guo, Li
AU - Andriopoulos, George
AU - Zhao, Zifan
AU - Dong, Zixuan
AU - Ling, Shuyang
AU - Ross, Keith
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025
Y1 - 2025
AB - Label smoothing is a widely adopted technique to mitigate overfitting in deep neural networks. This paper studies label smoothing from the perspective of Neural Collapse (NC), a powerful empirical and theoretical framework that characterizes model behavior during the terminal phase of training. We first show empirically that models trained with label smoothing converge faster to neural collapse solutions and attain a stronger level of neural collapse than those trained with cross-entropy loss. Furthermore, we show that, at the same level of NC1, models trained with label smoothing loss exhibit intensified NC2. These findings provide valuable insights into the impact of label smoothing on model performance and calibration. Then, leveraging the unconstrained feature model, we derive closed-form solutions for the global minimizers under both label smoothing and cross-entropy losses. We show that models trained with label smoothing have a lower condition number and, therefore, theoretically converge faster. Our study, combining empirical evidence and theoretical results, not only provides nuanced insights into the differences between label smoothing and cross-entropy losses, but also serves as an example of how the powerful neural collapse framework can be used to improve our understanding of DNNs.
UR - http://www.scopus.com/inward/record.url?scp=105005999014&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105005999014&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:105005999014
SN - 2835-8856
VL - 2025-May
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -