TY - GEN
T1 - Understanding Disparities in Post Hoc Machine Learning Explanation
AU - Mhasawade, Vishwali
AU - Rahman, Salman
AU - Haskell-Craig, Zoé
AU - Chunara, Rumi
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/6/3
Y1 - 2024/6/3
N2 - Previous work has highlighted that existing post hoc explanation methods exhibit disparities in explanation fidelity (across "race" and "gender" as sensitive attributes), and while a large body of work focuses on mitigating these issues at the explanation metric level, the role of the data generating process and the black-box model in relation to explanation disparities remains largely unexplored. Accordingly, through both simulations and experiments on a real-world dataset, we specifically assess challenges to explanation disparities that originate from properties of the data (limited sample size, covariate shift, concept shift, and omitted variable bias) and from properties of the model (inclusion of the sensitive attribute and appropriate functional form). Through controlled simulation analyses, our study demonstrates that increased covariate shift, concept shift, and omission of covariates increase explanation disparities, with the effect more pronounced for neural network models, which better capture the underlying functional form, than for linear models. We also observe consistent findings regarding the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, the results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.
AB - Previous work has highlighted that existing post hoc explanation methods exhibit disparities in explanation fidelity (across "race" and "gender" as sensitive attributes), and while a large body of work focuses on mitigating these issues at the explanation metric level, the role of the data generating process and the black-box model in relation to explanation disparities remains largely unexplored. Accordingly, through both simulations and experiments on a real-world dataset, we specifically assess challenges to explanation disparities that originate from properties of the data (limited sample size, covariate shift, concept shift, and omitted variable bias) and from properties of the model (inclusion of the sensitive attribute and appropriate functional form). Through controlled simulation analyses, our study demonstrates that increased covariate shift, concept shift, and omission of covariates increase explanation disparities, with the effect more pronounced for neural network models, which better capture the underlying functional form, than for linear models. We also observe consistent findings regarding the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, the results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.
KW - explainability
KW - fairness
KW - post hoc explanation methods
UR - http://www.scopus.com/inward/record.url?scp=85196618428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196618428&partnerID=8YFLogxK
U2 - 10.1145/3630106.3659043
DO - 10.1145/3630106.3659043
M3 - Conference contribution
AN - SCOPUS:85196618428
T3 - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
SP - 2374
EP - 2388
BT - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
PB - Association for Computing Machinery, Inc
T2 - 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Y2 - 3 June 2024 through 6 June 2024
ER -