TY - JOUR
T1 - Using demographics toward efficient data classification in citizen science
T2 - A Bayesian approach
AU - De Lellis, Pietro
AU - Nakayama, Shinnosuke
AU - Porfiri, Maurizio
N1 - Funding Information:
We would like to thank Tyrone J. Tolbert for developing the experimental platform, Marina Torre for collecting the data, and the three Reviewers for their constructive feedback that has helped improve the work and its presentation. P.D. wishes to thank the Dynamical Systems Laboratory at New York University for hosting him during the design of the research. This work was supported by the National Science Foundation CMMI 1644828. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2019 De Lellis et al.
PY - 2019
Y1 - 2019
AB - Public participation in scientific activities, often called citizen science, offers the possibility of collecting and analyzing an unprecedentedly large amount of data. However, the diversity of volunteers makes it challenging to obtain accurate information when these data are aggregated. To overcome this problem, we propose a classification algorithm using Bayesian inference that harnesses the diversity of volunteers to improve data accuracy. In the algorithm, each volunteer is assigned to a distinct class based on a survey of either their level of education or their motivation for participating in citizen science. The behavior of each class was inferred from a training set and then used as prior information to estimate the performance of new volunteers. By applying this approach to an existing citizen science dataset in which images are classified into categories, we demonstrate an improvement in data accuracy compared to traditional majority voting. Our algorithm offers a simple, yet powerful, way to improve data accuracy under limited volunteer effort by predicting the behavior of a class of individuals, rather than attempting a granular description of each of them.
KW - Algorithms
KW - Bayesian estimation
KW - Citizen science
KW - Data classification
UR - http://www.scopus.com/inward/record.url?scp=85077282863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077282863&partnerID=8YFLogxK
U2 - 10.7717/peerj-cs.239
DO - 10.7717/peerj-cs.239
M3 - Article
AN - SCOPUS:85077282863
SN - 2376-5992
VL - 5
JO - PeerJ Computer Science
JF - PeerJ Computer Science
M1 - e239
ER -