TY - JOUR
T1 - Design sensitivity and statistical power in acceptability judgment experiments
AU - Sprouse, Jon
AU - Almeida, Diogo
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under grant no. BCS-0843896 to JS. We would like to thank two anonymous reviewers for helpful comments on an earlier draft. We would also like to thank audiences at the École Normale Supérieure, Michigan State University, the University of Michigan, Princeton University, Harvard University, and Johns Hopkins University. We would also like to thank Carson T. Schütze for assistance in creating the test materials used in this study. All errors remain our own.
Publisher Copyright:
© 2017 The Author(s).
PY - 2017
Y1 - 2017
N2 - Previous investigations into the validity of acceptability judgment data have focused almost exclusively on type I errors (or false positives) because of the consequences of such errors for syntactic theories (Sprouse & Almeida 2012; Sprouse et al. 2013). The current study complements these previous studies by systematically investigating the type II error rate (false negatives), or equivalently, the statistical power, of a wide cross-section of possible acceptability judgment experiments. Though type II errors have historically been assumed to be less costly than type I errors, the dynamics of scientific publishing mean that high type II error rates (i.e., studies with low statistical power) can lead to increases in type I error rates in a given field of study. We present a set of experiments and resampling simulations to estimate statistical power for four tasks (forced-choice, Likert scale, magnitude estimation, and yes-no), 50 effect sizes instantiated by real phenomena, sample sizes from 5 to 100 participants, and two approaches to statistical analysis (null hypothesis and Bayesian). Our goals are twofold: (i) to provide a fuller picture of the status of acceptability judgment data in syntax, and (ii) to provide detailed information that syntacticians can use to design and evaluate the sensitivity of acceptability judgment experiments in their own research.
AB - Previous investigations into the validity of acceptability judgment data have focused almost exclusively on type I errors (or false positives) because of the consequences of such errors for syntactic theories (Sprouse & Almeida 2012; Sprouse et al. 2013). The current study complements these previous studies by systematically investigating the type II error rate (false negatives), or equivalently, the statistical power, of a wide cross-section of possible acceptability judgment experiments. Though type II errors have historically been assumed to be less costly than type I errors, the dynamics of scientific publishing mean that high type II error rates (i.e., studies with low statistical power) can lead to increases in type I error rates in a given field of study. We present a set of experiments and resampling simulations to estimate statistical power for four tasks (forced-choice, Likert scale, magnitude estimation, and yes-no), 50 effect sizes instantiated by real phenomena, sample sizes from 5 to 100 participants, and two approaches to statistical analysis (null hypothesis and Bayesian). Our goals are twofold: (i) to provide a fuller picture of the status of acceptability judgment data in syntax, and (ii) to provide detailed information that syntacticians can use to design and evaluate the sensitivity of acceptability judgment experiments in their own research.
KW - Acceptability judgments
KW - Experimental syntax
KW - Linguistic methodology
KW - Statistical power
UR - http://www.scopus.com/inward/record.url?scp=85061592244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061592244&partnerID=8YFLogxK
U2 - 10.5334/gjgl.236
DO - 10.5334/gjgl.236
M3 - Article
VL - 2
SP - 1
JO - Glossa: a journal of general linguistics
JF - Glossa: a journal of general linguistics
IS - 1
M1 - 14
ER -