Abstract
Objective: We aimed to evaluate interrater agreement statistics (IRAS) for use in research on low-base-rate clinical diagnoses or observed behaviors. Establishing and reporting sufficient interrater agreement is essential in such studies. Yet the most commonly applied agreement statistic, Cohen's κ, has a well-known sensitivity to base rates that substantially penalizes interrater agreement when behaviors or diagnoses are very uncommon, a common and frustrating problem in this line of research.

Method: We performed Monte Carlo simulations to evaluate the performance of five alternatives to κ (Van Eerdewegh's V, Yule's Y, Holley and Guilford's G, Scott's π, and Gwet's AC1), alongside κ itself. The simulations investigated the robustness of these IRAS to conditions common in clinical research, varying the behavior or diagnosis base rate, rater bias, observed interrater agreement, and sample size.

Results: When the base rate was 0.5, all IRAS provided similar estimates, particularly with unbiased raters. G was the least sensitive of the IRAS to base rates.

Conclusions: The results encourage the use of the G statistic for its consistent performance across the simulation conditions. We recommend separately reporting the rates of agreement on the presence and absence of a behavior or diagnosis alongside G as an index of chance-corrected overall agreement.
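For readers who want to see how these indices differ in practice, the sketch below shows, in plain Python, how several of the statistics named above are computed from a 2×2 cross-tabulation of two raters' present/absent judgments, together with the separate agreement rates on presence and absence that the abstract recommends reporting alongside G. This is not the authors' simulation code; the function name and example counts are illustrative, the standard textbook formulas are used, and edge cases (e.g., empty cells) are not handled.

```python
import math


def agreement_stats(a: int, b: int, c: int, d: int) -> dict:
    """Chance-corrected agreement indices for a 2x2 rater table.

    a = both raters say "present", d = both say "absent",
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed (raw) proportion of agreement

    # Cohen's kappa: chance agreement from each rater's own marginals.
    p_e_kappa = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

    # Scott's pi: chance agreement from marginals averaged over raters.
    q = ((a + b) / n + (a + c) / n) / 2  # mean "present" rate
    p_e_pi = q**2 + (1 - q) ** 2
    pi = (p_o - p_e_pi) / (1 - p_e_pi)

    # Gwet's AC1: chance agreement 2q(1 - q), which shrinks as the
    # base rate becomes extreme.
    p_e_ac1 = 2 * q * (1 - q)
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

    # Yule's Y (coefficient of colligation), from the cross-products.
    y = (math.sqrt(a * d) - math.sqrt(b * c)) / (
        math.sqrt(a * d) + math.sqrt(b * c)
    )

    # Holley and Guilford's G: chance agreement fixed at 0.5, so
    # G = 2 * p_o - 1 and is insensitive to the base rate.
    g = 2 * p_o - 1

    # Specific agreement on presence and absence, the two rates the
    # abstract recommends reporting alongside G.
    pos_agreement = 2 * a / (2 * a + b + c)
    neg_agreement = 2 * d / (2 * d + b + c)

    return {
        "p_o": p_o,
        "kappa": kappa,
        "pi": pi,
        "AC1": ac1,
        "Y": y,
        "G": g,
        "positive_agreement": pos_agreement,
        "negative_agreement": neg_agreement,
    }


# Illustrative low-base-rate example: 5 agreed-present, 90 agreed-absent,
# 5 disagreements. Raw agreement is 0.95 and G is 0.90, while kappa is
# pulled down (≈0.64) by the skewed marginals.
print(agreement_stats(a=5, b=3, c=2, d=90))
```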
Original language | English (US) |
---|---|
Pages (from-to) | 1219-1227 |
Number of pages | 9 |
Journal | Journal of Consulting and Clinical Psychology |
Volume | 82 |
Issue number | 6 |
DOIs | |
State | Published - 2014 |
Keywords
- Behavior observation
- Diagnosis
- Interrater agreement
- Low base rate
- Skew
ASJC Scopus subject areas
- Clinical Psychology
- Psychiatry and Mental Health