Bigger is not always better: The importance of human-scale language modeling for psycholinguistics

Ethan Gotlieb Wilcox, Michael Y. Hu, Aaron Mueller, Alex Warstadt, Leshem Choshen, Chengxu Zhuang, Adina Williams, Ryan Cotterell, Tal Linzen

    Research output: Contribution to journal › Article › peer-review

    Abstract

    When trained to place high probability on a training corpus, neural network language models can learn a surprising amount about language. Recent work has demonstrated that large performance improvements can arise from simply increasing, i.e., scaling, the size of the corpora they are trained on and the number of parameters in those models. Accordingly, many contemporary systems are trained on trillions of words. While largely beneficial to performance on language applications, scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by the scaling paradigm, as well as the benefits that would result from language models that can learn from human-scale data. In the second half of this paper, we report on findings from a recent effort to bring about human-scale language model pretraining: the first iteration of the BabyLM Challenge, a shared task organized by the authors that invited participants to train a language model on 100 million words or less. The challenge produced several concrete best practices for practitioners interested in small-scale language modeling. For cognitive scientists, the challenge demonstrated that robust linguistic generalizations can be learned by models trained on a human-scale dataset, though this is not yet achieved through cognitively plausible mechanisms. Furthermore, it established a population of “BabyLMs” that are all effective at data-efficient language learning. Studying such models can help us identify hypotheses for the computational mechanisms that underlie human language acquisition.

    Original language: English (US)
    Article number: 104650
    Journal: Journal of Memory and Language
    Volume: 144
    DOIs
    State: Published - Oct 2025

    Keywords

    • Cognitive modeling
    • Connectionist networks
    • Language acquisition
    • Language modeling
    • Psycholinguistics
    • Scaling

    ASJC Scopus subject areas

    • Neuropsychology and Physiological Psychology
    • Language and Linguistics
    • Experimental and Cognitive Psychology
    • Linguistics and Language
    • Artificial Intelligence
