TY - JOUR
T1 - Producing knowledge by admitting ignorance
T2 - Enhancing data quality through an “I don’t know” option in citizen science
AU - Torre, Marina
AU - Nakayama, Shinnosuke
AU - Tolbert, Tyrone J.
AU - Porfiri, Maurizio
N1 - Funding Information:
This work was supported by: MP, CMMI-1644828, National Science Foundation, https://www.nsf.gov/awardsearch/showAward?AWD_ID=1644828; and MP, CBET-1547864, National Science Foundation, https://www.nsf.gov/awardsearch/showAward?AWD_ID=1547864. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank the Dynamical Systems Laboratory at New York University Tandon School of Engineering as a whole for providing useful insight during the analysis of the data.
Publisher Copyright:
© 2019 Torre et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2019/2
Y1 - 2019/2
N2 - The “noisy labeler problem” in crowdsourced data has attracted great attention in recent years, with important ramifications in citizen science, where non-experts must produce high-quality data. Particularly relevant to citizen science is dynamic task allocation, in which the level of agreement among labelers can be progressively updated through the information-theoretic notion of entropy. Under dynamic task allocation, we hypothesized that providing volunteers with an “I don’t know” option would contribute to enhancing data quality, by introducing further, useful information about the level of agreement among volunteers. We investigated the influence of an “I don’t know” option on the data quality in a citizen science project that entailed classifying the image of a highly polluted canal into “threat” or “no threat” to the environment. Our results show that an “I don’t know” option can enhance accuracy, compared to the case without the option; such an improvement mostly affects the true negative rather than the true positive rate. In an information-theoretic sense, these seemingly meaningless blank votes constitute a meaningful piece of information to help enhance accuracy of data in citizen science.
AB - The “noisy labeler problem” in crowdsourced data has attracted great attention in recent years, with important ramifications in citizen science, where non-experts must produce high-quality data. Particularly relevant to citizen science is dynamic task allocation, in which the level of agreement among labelers can be progressively updated through the information-theoretic notion of entropy. Under dynamic task allocation, we hypothesized that providing volunteers with an “I don’t know” option would contribute to enhancing data quality, by introducing further, useful information about the level of agreement among volunteers. We investigated the influence of an “I don’t know” option on the data quality in a citizen science project that entailed classifying the image of a highly polluted canal into “threat” or “no threat” to the environment. Our results show that an “I don’t know” option can enhance accuracy, compared to the case without the option; such an improvement mostly affects the true negative rather than the true positive rate. In an information-theoretic sense, these seemingly meaningless blank votes constitute a meaningful piece of information to help enhance accuracy of data in citizen science.
UR - http://www.scopus.com/inward/record.url?scp=85062166548&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062166548&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0211907
DO - 10.1371/journal.pone.0211907
M3 - Article
C2 - 30811452
AN - SCOPUS:85062166548
SN - 1932-6203
VL - 14
JO - PLoS ONE
JF - PLoS ONE
IS - 2
M1 - e0211907
ER -