Interrater agreement statistics with skewed data: Evaluation of alternatives to Cohen's kappa

Shu Xu, Michael F. Lorber

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: In this study, we aimed to evaluate interrater agreement statistics (IRAS) for use in research on low base rate clinical diagnoses or observed behaviors. Establishing and reporting sufficient interrater agreement is essential in such studies. Yet the most commonly applied agreement statistic, Cohen's κ, has a well-known sensitivity to base rates that results in a substantial penalization of interrater agreement when behaviors or diagnoses are very uncommon, a prevalent and frustrating concern in such studies. Method: We performed Monte Carlo simulations to evaluate the performance of 5 of κ's alternatives (Van Eerdewegh's V, Yule's Y, Holley and Guilford's G, Scott's π, and Gwet's AC1), alongside κ itself. The simulations investigated the robustness of these IRAS to conditions that are common in clinical research, with varying levels of behavior or diagnosis base rate, rater bias, observed interrater agreement, and sample size. Results: When the base rate was 0.5, each IRAS provided similar estimates, particularly with unbiased raters. G was the least sensitive of the IRAS to base rates. Conclusions: The results encourage the use of the G statistic for its consistent performance across the simulation conditions. We recommend separately reporting the rates of agreement on the presence and absence of a behavior or diagnosis alongside G as an index of chance-corrected overall agreement.
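The abstract names several chance-corrected agreement statistics without giving their formulas. As a minimal sketch, assuming the standard 2x2 (present/absent) formulations of these coefficients (omitting Van Eerdewegh's V), the following Python function illustrates how the statistics differ in how they estimate chance agreement; the function name and the example counts are hypothetical and are not taken from the paper's simulation code.

```python
import math

def agreement_stats(a, b, c, d):
    """Chance-corrected agreement statistics for a 2x2 rater table.

    a: both raters code "present"
    b: rater 1 "present", rater 2 "absent"
    c: rater 1 "absent", rater 2 "present"
    d: both raters code "absent"
    (Assumes a nondegenerate table; zero cells or marginals need guarding.)
    """
    n = a + b + c + d
    po = (a + d) / n                      # observed proportion agreement

    # Cohen's kappa: chance agreement from each rater's own marginals
    pe_kappa = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (po - pe_kappa) / (1 - pe_kappa)

    # Scott's pi: chance agreement from the marginals averaged across raters
    p1 = ((a + b) / n + (a + c) / n) / 2  # mean "present" marginal
    pe_pi = p1**2 + (1 - p1)**2
    pi = (po - pe_pi) / (1 - pe_pi)

    # Gwet's AC1 (two categories): chance agreement is 2*q*(1-q)
    pe_ac1 = 2 * p1 * (1 - p1)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    # Holley and Guilford's G: fixed 0.5 chance level, so G = 2*po - 1
    g = 2 * po - 1

    # Yule's Y: odds-ratio-based coefficient of colligation
    y = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))

    return {"kappa": kappa, "pi": pi, "AC1": ac1, "G": g, "Y": y}

# Hypothetical skewed example: base rate near 0.05, raw agreement 0.94
print(agreement_stats(a=4, b=3, c=3, d=90))
```

With these illustrative counts, κ and π come out near 0.54 despite 94% raw agreement, because the low base rate inflates their chance-agreement terms, while G (0.88) and AC1 (about 0.93) remain high. This is the base rate sensitivity the simulations probe.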

Original language: English (US)
Pages (from-to): 1219-1227
Number of pages: 9
Journal: Journal of Consulting and Clinical Psychology
Volume: 82
Issue number: 6
State: Published - 2014

Keywords

  • Behavior observation
  • Diagnosis
  • Interrater agreement
  • Low base rate
  • Skew

ASJC Scopus subject areas

  • Clinical Psychology
  • Psychiatry and Mental health
