Large-scale benchmark yields no evidence that language model surprisal explains syntactic disambiguation difficulty

Kuan Jung Huang, Suhas Arehalli, Mari Kugemoto, Christian Muxica, Grusha Prasad, Brian Dillon, Tal Linzen

    Research output: Contribution to journal › Article › peer-review


    Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can processing difficulty in syntactically complex sentences – one of the major concerns of psycholinguistics – be explained by predictability, as estimated using computational language models and operationalized as surprisal (negative log probability)? A precise, quantitative test of this question requires a much larger-scale data collection effort than has been undertaken in the past. We present the Syntactic Ambiguity Processing Benchmark, a dataset of self-paced reading times from 2000 participants, who read a diverse set of complex English sentences. This dataset makes it possible to measure processing difficulty associated with individual syntactic constructions, and even individual sentences, precisely enough to rigorously test the predictions of computational models of language comprehension. By estimating the function that relates surprisal to reading times from filler items included in the experiment, we find that the predictions of language models with two different architectures sharply diverge from the empirical reading time data: they dramatically underpredict processing difficulty, fail to predict relative difficulty among different syntactically ambiguous constructions, and only partially explain item-wise variability. These findings suggest that next-word prediction is most likely insufficient on its own to explain human syntactic processing.
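
The logic described in the abstract (compute surprisal as negative log probability, estimate the surprisal-to-reading-time function on filler items, then use it to predict the size of a disambiguation effect) can be illustrated with a minimal sketch. This is not the authors' analysis: it assumes a simple linear linking function fitted by ordinary least squares, and every number, variable name, and helper function below is hypothetical and chosen only for illustration.

```python
import math


def surprisal(prob):
    """Surprisal in bits: negative log (base 2) probability of a word in context."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical filler-item data (made up, NOT from the benchmark):
# language-model word probabilities and mean reading times in milliseconds.
filler_probs = [0.5, 0.25, 0.125, 0.0625]
filler_rts_ms = [310.0, 320.0, 330.0, 340.0]

# Step 1: estimate the surprisal -> reading time function from fillers only.
filler_surprisals = [surprisal(p) for p in filler_probs]
intercept, slope = fit_line(filler_surprisals, filler_rts_ms)

# Step 2: predict the disambiguation effect for a critical word as the
# slope times the surprisal difference between the ambiguous and the
# unambiguous version of the sentence (probabilities are again made up).
p_ambiguous = 0.01
p_unambiguous = 0.04
predicted_effect_ms = slope * (surprisal(p_ambiguous) - surprisal(p_unambiguous))

print(f"slope: {slope:.1f} ms per bit, intercept: {intercept:.1f} ms")
print(f"predicted disambiguation effect: {predicted_effect_ms:.1f} ms")
```

The predicted effect would then be compared with the empirically measured reading time difference for the same construction; the abstract reports that such predictions fall far short of the observed difficulty.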

    Original language: English (US)
    Article number: 104510
    Journal: Journal of Memory and Language
    State: Published - Aug 2024


    Keywords

    • Language models
    • Prediction
    • Sentence processing
    • Surprisal

    ASJC Scopus subject areas

    • Neuropsychology and Physiological Psychology
    • Language and Linguistics
    • Experimental and Cognitive Psychology
    • Linguistics and Language
    • Artificial Intelligence

