TY - JOUR
T1 - Exposing the probabilistic causal structure of discrimination
AU - Bonchi, Francesco
AU - Hajian, Sara
AU - Mishra, Bud
AU - Ramazzotti, Daniele
N1 - Funding Information:
The research leading to these results has received funding from the European Union’s Horizon 2020 Innovation Action Program under grant agreement No 653449 (TYPES project), the Catalonia Trade and Investment Agency (Agència per la competitivitat de l’empresa, ACCIÓ) and CMU grant No 15-00314-SUB-000.
Publisher Copyright:
© 2017, Springer International Publishing Switzerland.
PY - 2017/2/1
Y1 - 2017/2/1
AB - Discrimination discovery from data is an important data mining task, whose goal is to identify patterns of illegal and unethical discriminatory activities against protected-by-law groups, e.g., ethnic minorities. While any legally valid proof of discrimination requires evidence of causality, the state-of-the-art methods are essentially correlation based, albeit, as it is well known, correlation does not imply causation. In this paper, we take a principled causal approach to discrimination detection following Suppes’ probabilistic causation theory. In particular, we define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which we dub Suppes-Bayes causal network (SBCN). Next, we develop a toolkit of methods based on random walks on top of the SBCN, addressing different anti-discrimination legal concepts, such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism. Our experiments on real-world datasets confirm the inferential power of our approach in all these different tasks.
KW - Algorithmic discrimination
KW - Discrimination discovery
KW - Random walks
KW - Constrained Bayesian network
KW - Probabilistic causation
UR - http://www.scopus.com/inward/record.url?scp=85029045105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029045105&partnerID=8YFLogxK
U2 - 10.1007/s41060-016-0040-z
DO - 10.1007/s41060-016-0040-z
M3 - Article
AN - SCOPUS:85029045105
SN - 2364-415X
VL - 3
SP - 1
EP - 21
JO - International Journal of Data Science and Analytics
JF - International Journal of Data Science and Analytics
IS - 1
ER -