TY - GEN
T1 - Online set selection with fairness and diversity constraints
AU - Stoyanovich, Julia
AU - Yang, Ke
AU - Jagadish, H. V.
N1 - Publisher Copyright:
© 2018 Copyright held by the owner/author(s)
PY - 2018
Y1 - 2018
N2 - Selection algorithms usually score individual items in isolation, and then select the top scoring items. However, often there is an additional diversity objective. Since diversity is a group property, it does not easily jibe with individual item scoring. In this paper, we study set selection queries subject to diversity and group fairness constraints. We develop algorithms for several problem settings with streaming data, where an online decision must be made on each item as it is presented. We show through experiments with real and synthetic data that fairness and diversity can be achieved, usually with modest costs in terms of quality. Our experimental evaluation leads to several important insights in online set selection. We demonstrate that theoretical guarantees on solution quality are conservative in real datasets, and that tuning the length of the score estimation phase leads to an interesting accuracy-efficiency trade-off. Further, we show that if a difference in scores is expected between groups, then these groups must be treated separately during processing. Otherwise, a solution may be derived that meets diversity constraints, but that selects lower-scoring members of disadvantaged groups.
AB - Selection algorithms usually score individual items in isolation, and then select the top scoring items. However, often there is an additional diversity objective. Since diversity is a group property, it does not easily jibe with individual item scoring. In this paper, we study set selection queries subject to diversity and group fairness constraints. We develop algorithms for several problem settings with streaming data, where an online decision must be made on each item as it is presented. We show through experiments with real and synthetic data that fairness and diversity can be achieved, usually with modest costs in terms of quality. Our experimental evaluation leads to several important insights in online set selection. We demonstrate that theoretical guarantees on solution quality are conservative in real datasets, and that tuning the length of the score estimation phase leads to an interesting accuracy-efficiency trade-off. Further, we show that if a difference in scores is expected between groups, then these groups must be treated separately during processing. Otherwise, a solution may be derived that meets diversity constraints, but that selects lower-scoring members of disadvantaged groups.
UR - http://www.scopus.com/inward/record.url?scp=85058352110&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058352110&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2018.22
DO - 10.5441/002/edbt.2018.22
M3 - Conference contribution
AN - SCOPUS:85058352110
T3 - Advances in Database Technology - EDBT
SP - 241
EP - 252
BT - Advances in Database Technology - EDBT 2018
A2 - Bohlen, Michael
A2 - Pichler, Reinhard
A2 - May, Norman
A2 - Rahm, Erhard
A2 - Wu, Shan-Hung
A2 - Hose, Katja
PB - OpenProceedings.org
T2 - 21st International Conference on Extending Database Technology, EDBT 2018
Y2 - 26 March 2018 through 29 March 2018
ER -