TY - JOUR
T1 - Crowdsourced perceptual ratings of voice quality in people with Parkinson’s disease before and after intensive voice and articulation therapies
T2 - Secondary outcome of a randomized controlled trial
AU - McAllister, Tara
AU - Nightingale, Christopher
AU - Moya-Galé, Gemma
AU - Kawamura, Ava
AU - Ramig, Lorraine Olson
N1 - Publisher Copyright:
© 2023, American Speech-Language-Hearing Association. All rights reserved.
PY - 2023/5
Y1 - 2023/5
AB - Purpose: Limited research has examined the suitability of crowdsourced ratings to measure treatment effects in speakers with Parkinson’s disease (PD), particularly for constructs such as voice quality. This study obtained measures of reliability and validity for crowdsourced listeners’ ratings of voice quality in speech samples from a published study. We also investigated whether aggregated listener ratings would replicate the original study’s findings of treatment effects based on the Acoustic Voice Quality Index (AVQI) measure. Method: This study reports a secondary outcome measure of a randomized controlled trial with speakers with dysarthria associated with PD, including two active comparators (Lee Silverman Voice Treatment [LSVT LOUD] and LSVT ARTIC), an inactive comparator (untreated PD), and a healthy control group. Speech samples from three time points (pretreatment, posttreatment, and 6-month follow-up) were presented in random order for rating as “typical” or “atypical” with respect to voice quality. Untrained listeners were recruited through the Amazon Mechanical Turk crowdsourcing platform until each sample had at least 25 ratings. Results: Intrarater reliability for tokens presented repeatedly was substantial (Cohen’s κ = .65–.70), and interrater agreement significantly exceeded chance level. There was a significant correlation of moderate magnitude between the AVQI and the proportion of listeners classifying a given sample as “typical.” Consistent with the original study, we found a significant interaction between group and time point, with the LSVT LOUD group alone showing significantly higher perceptually rated voice quality at posttreatment and follow-up relative to the pretreatment time point. Conclusions: These results suggest that crowdsourcing can be a valid means to evaluate clinical speech samples, even for less familiar constructs such as voice quality. The findings also replicate the results of the study by Moya-Galé et al. (2022) and support their functional relevance by demonstrating that the effects of treatment measured acoustically in that study are perceptually apparent to everyday listeners.
UR - http://www.scopus.com/inward/record.url?scp=85159727228&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159727228&partnerID=8YFLogxK
U2 - 10.1044/2023_JSLHR-22-00694
DO - 10.1044/2023_JSLHR-22-00694
M3 - Article
C2 - 37059078
AN - SCOPUS:85159727228
SN - 1092-4388
VL - 66
SP - 1541
EP - 1562
JO - Journal of Speech, Language, and Hearing Research
JF - Journal of Speech, Language, and Hearing Research
IS - 5
ER -