TY - JOUR
T1 - Identifying unreliable and adversarial workers in crowdsourced labeling tasks
AU - Jagabathula, Srikanth
AU - Subramanian, Lakshminarayanan
AU - Venkataraman, Ashwin
N1 - Funding Information:
Ashwin Venkataraman is partially supported by NSF CAREER grant CMMI-1454310. We gratefully acknowledge the thoughtful and detailed feedback of the anonymous referees, which led to a substantial improvement in the quality, organization, and completeness of the paper. The authors would also like to thank Sewoong Oh and Kyunghyun Cho for insightful discussions that helped improve the paper.
Publisher Copyright:
© 2017 Srikanth Jagabathula, Lakshminarayanan Subramanian and Ashwin Venkataraman.
PY - 2017/9/1
Y1 - 2017/9/1
N2 - We study the problem of identifying unreliable and adversarial workers in crowdsourcing systems where workers (or users) provide labels for tasks (or items). Most existing studies assume that worker responses follow specific probabilistic models; however, recent evidence shows the presence of workers adopting non-random or even malicious strategies. To account for such workers, we suppose that workers comprise a mixture of honest and adversarial workers. Honest workers may be reliable or unreliable, and they provide labels according to an unknown but explicit probabilistic model. Adversaries adopt labeling strategies different from those of honest workers, whether probabilistic or not. We propose two reputation algorithms to identify unreliable honest workers and adversarial workers from only their responses. Our algorithms assume that honest workers are in the majority, and they classify workers with outlier label patterns as adversaries. Theoretically, we show that our algorithms successfully identify unreliable honest workers, workers adopting deterministic strategies, and worst-case sophisticated adversaries who can adopt arbitrary labeling strategies to degrade the accuracy of the inferred task labels. Empirically, we show that filtering out outliers using our algorithms can significantly improve the accuracy of several state-of-the-art label aggregation algorithms in real-world crowdsourcing datasets.
KW - Adversary
KW - Crowdsourcing
KW - Outliers
KW - Reputation
UR - http://www.scopus.com/inward/record.url?scp=85032972309&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032972309&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85032972309
SN - 1532-4435
VL - 18
SP - 1
EP - 67
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -