TY - JOUR
T1 - UnbiasedNets: a dataset diversification framework for robustness bias alleviation in neural networks
AU - Naseer, Mahum
AU - Prabakaran, Bharath Srinivas
AU - Hasan, Osman
AU - Shafique, Muhammad
N1 - Publisher Copyright:
© The Author(s) 2023.
PY - 2024/5
Y1 - 2024/5
N2 - The performance of trained neural network (NN) models, in terms of testing accuracy, has improved remarkably over the past several years, especially with the advent of deep learning. However, even the most accurate NNs can be biased toward a specific output class due to the inherent bias in the available training datasets, which may propagate to real-world implementations. This paper deals with robustness bias, i.e., the bias exhibited by a trained NN that shows significantly larger robustness to noise for a certain output class than for the remaining output classes. This bias is shown to result from imbalanced datasets, i.e., datasets in which the output classes are not equally represented. To address this, we propose the UnbiasedNets framework, which leverages K-means clustering and the NN’s noise tolerance to diversify the given training dataset, even when the available dataset is relatively small. This generates balanced datasets and reduces the bias within the datasets themselves. To the best of our knowledge, this is the first framework catering to the robustness bias problem in NNs. We use real-world datasets to demonstrate the efficacy of UnbiasedNets for data diversification, for both binary and multi-label classifiers. The results are compared to well-known tools aimed at generating balanced datasets, and illustrate how existing works have limited success in addressing the robustness bias. In contrast, UnbiasedNets provides a notable improvement over existing works, in some cases even reducing the robustness bias significantly, as observed by comparing the NNs trained on the diversified and original datasets.
KW - Bias
KW - Data-centric bias alleviation
KW - K-means clustering
KW - Neural networks
KW - Noise tolerance
UR - http://www.scopus.com/inward/record.url?scp=85149015666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149015666&partnerID=8YFLogxK
U2 - 10.1007/s10994-023-06314-z
DO - 10.1007/s10994-023-06314-z
M3 - Article
AN - SCOPUS:85149015666
SN - 0885-6125
VL - 113
SP - 2499
EP - 2526
JO - Machine Learning
JF - Machine Learning
IS - 5
ER -