TY - JOUR
T1 - Finding the experts in the crowd: Validity and reliability of crowdsourced measures of children’s gradient speech contrasts
AU - Harel, Daphna
AU - Hitchcock, Elaine Russo
AU - Szeredi, Daniel
AU - Ortiz, José
AU - McAllister Byun, Tara
N1 - Publisher Copyright: © 2017 Taylor & Francis.
PY - 2017/1/2
Y1 - 2017/1/2
AB - Perceptual ratings aggregated across multiple nonexpert listeners can be used to measure covert contrast in child speech. Online crowdsourcing provides access to a large pool of raters, but for practical purposes, researchers may wish to use smaller samples. The ratings obtained from these smaller samples may not maintain the high levels of validity seen in larger samples. This study aims to measure the validity and reliability of crowdsourced continuous ratings of child speech, obtained through Visual Analog Scaling, and to identify ways to improve these measurements. We first assess overall validity and interrater reliability for measurements obtained from a large set of raters. Second, we investigate two rater-level measures of quality, individual validity and intrarater reliability, and examine the relationship between them. Third, we show that these estimates may be used to establish guidelines for the inclusion of raters, thus impacting the quality of results obtained when smaller samples are used.
KW - Child speech ratings
KW - covert contrasts
KW - reliability
KW - validity
UR - http://www.scopus.com/inward/record.url?scp=84973626421&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973626421&partnerID=8YFLogxK
U2 - 10.3109/02699206.2016.1174306
DO - 10.3109/02699206.2016.1174306
M3 - Article
C2 - 27267258
AN - SCOPUS:84973626421
SN - 0269-9206
VL - 31
SP - 104
EP - 117
JO - Clinical Linguistics and Phonetics
JF - Clinical Linguistics and Phonetics
IS - 1
ER -