Abstract
Face recognition systems have made significant strides thanks to data-heavy deep learning models, but these models rely on large privacy-sensitive datasets. Recent work in facial analysis and recognition has thus started making use of synthetic datasets generated from GANs and diffusion-based generative models. These models, however, lack fairness in terms of demographic representation and can introduce the same biases into the downstream tasks trained on their output. This can have serious societal and security implications. To address this issue, we propose a methodology that generates unbiased data from a biased generative model using an evolutionary algorithm. We show results for a StyleGAN2 model trained on the Flickr-Faces-HQ (FFHQ) dataset, generating data for single demographic attributes and for combinations of them, such as Black and Woman. We generate a large racially balanced dataset of 13.5 million images, and show that it boosts the performance of facial recognition and analysis systems whilst reducing their biases. We have made our codebase public to allow researchers to reproduce our work.
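The abstract does not describe the evolutionary algorithm in detail, so the following is only a minimal illustrative sketch of the general idea: searching a generator's latent space for samples that score highly on a target attribute. It is not the authors' method. The function `toy_attribute_score` is a hypothetical stand-in for a real generator followed by an attribute classifier, and every parameter (`dim`, `pop_size`, `elite`, `sigma`, `generations`) is an assumption chosen for illustration.

```python
import random


def toy_attribute_score(z):
    """Hypothetical stand-in for 'generate an image from z, then classify it'.

    Returns a score in (0, 1]; higher means the latent vector lies closer
    to an assumed target-attribute region centred at (1, 1, ..., 1).
    """
    dist2 = sum((x - 1.0) ** 2 for x in z)
    return 1.0 / (1.0 + dist2)


def evolve_latents(score_fn, dim=8, pop_size=64, elite=16,
                   sigma=0.3, generations=30, seed=0):
    """Simple elitist evolutionary search over latent vectors.

    Each generation keeps the `elite` best latents and refills the
    population with Gaussian-mutated copies of randomly chosen elites.
    Returns the final population and the mean score per generation.
    """
    rng = random.Random(seed)
    # Start from the generator's usual prior: standard normal latents.
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        history.append(sum(score_fn(z) for z in population) / pop_size)
        # Selection: rank by attribute score and keep the best latents.
        parents = sorted(population, key=score_fn, reverse=True)[:elite]
        # Mutation: perturb randomly chosen parents to create children.
        children = []
        while len(children) < pop_size - elite:
            parent = rng.choice(parents)
            children.append([x + rng.gauss(0.0, sigma) for x in parent])
        population = parents + children
    history.append(sum(score_fn(z) for z in population) / pop_size)
    return population, history


if __name__ == "__main__":
    final_population, mean_scores = evolve_latents(toy_attribute_score)
    print(f"mean score before: {mean_scores[0]:.3f}")
    print(f"mean score after:  {mean_scores[-1]:.3f}")
```

In a real pipeline, the score function would presumably wrap a pretrained generator and a demographic attribute classifier, and the surviving latents would be decoded into images for the under-represented group; those components are omitted here because the abstract does not specify them.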
Original language | English (US) |
---|---|
Pages (from-to) | 1 |
Number of pages | 1 |
Journal | IEEE Transactions on Biometrics, Behavior, and Identity Science |
State | Accepted/In press - 2024 |
Keywords
- Biological system modeling
- Data models
- Data privacy
- Face recognition
- Lighting
- Synthetic data
- Training
ASJC Scopus subject areas
- Instrumentation
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Artificial Intelligence