TY - GEN
T1 - Fides
T2 - 29th International Conference on Scientific and Statistical Database Management, SSDBM 2017
AU - Stoyanovich, Julia
AU - Howe, Bill
AU - Abiteboul, Serge
AU - Miklau, Gerome
AU - Sahuguet, Arnaud
AU - Weikum, Gerhard
N1 - Publisher Copyright: © 2017 Association for Computing Machinery.
PY - 2017/6/27
Y1 - 2017/6/27
N2 - Issues of responsible data analysis and use are coming to the forefront of the discourse in data science research and practice, with most significant efforts to date on the part of the data mining, machine learning, and security and privacy communities. In these fields, the research has been focused on analyzing the fairness, accountability and transparency (FAT) properties of specific algorithms and their outputs. Although these issues are most apparent in the social sciences where fairness is interpreted in terms of the distribution of resources across protected groups, management of bias in source data affects a variety of fields. Consider climate change studies that require representative data from geographically diverse regions, or supply chain analyses that require data that represents the diversity of products and customers. Any domain that involves sparse or sampled data has exposure to potential bias. In this vision paper, we argue that FAT properties must be considered as database system issues, further upstream in the data science lifecycle: bias in source data goes unnoticed, and bias may be introduced during pre-processing (fairness), spurious correlations lead to reproducibility problems (accountability), and assumptions made during pre-processing have invisible but significant effects on decisions (transparency). As machine learning methods continue to be applied broadly by non-experts, the potential for misuse increases. We see a need for a data sharing and collaborative analytics platform with features to encourage (and in some cases, enforce) best practices at all stages of the data science lifecycle. We describe features of such a platform, which we term Fides, in the context of urban analytics, outlining a systems research agenda in responsible data science.
KW - Accountability
KW - Data, Responsibly
KW - Data ethics
KW - Data science for social good
KW - Fairness
KW - Transparency
UR - http://www.scopus.com/inward/record.url?scp=85025672959&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85025672959&partnerID=8YFLogxK
U2 - 10.1145/3085504.3085530
DO - 10.1145/3085504.3085530
M3 - Conference contribution
AN - SCOPUS:85025672959
T3 - ACM International Conference Proceeding Series
BT - SSDBM 2017
PB - Association for Computing Machinery
Y2 - 27 June 2017 through 29 June 2017
ER -