Abstract
When trained to place high probability on a training corpus, neural network language models can learn a surprising amount about language. Recent work has demonstrated that large performance improvements can arise from simply increasing (i.e., scaling) the size of the corpora they are trained on and the number of parameters in those models. Accordingly, many contemporary systems are trained on trillions of words. While largely beneficial to performance on language applications, scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by the scaling paradigm, as well as the benefits that would result from language models that can learn from human-scale data. In the second half of this paper, we report on findings from a recent effort to bring about human-scale language model pretraining: the first iteration of the BabyLM Challenge, a shared task organized by the authors that invited participants to train a language model on 100 million words or fewer. The challenge produced several concrete best practices for practitioners interested in small-scale language modeling. For cognitive scientists, the challenge demonstrated that robust linguistic generalizations can be learned by models trained on a human-scale dataset, though this is not yet achieved through cognitively plausible mechanisms. Furthermore, it established a population of “BabyLMs” that are all effective at data-efficient language learning. Studying such models can help us identify hypotheses for the computational mechanisms that underlie human language acquisition.
| Original language | English (US) |
| --- | --- |
| Article number | 104650 |
| Journal | Journal of Memory and Language |
| Volume | 144 |
| DOIs | |
| State | Published - Oct 2025 |
Keywords
- Cognitive modeling
- Connectionist networks
- Language acquisition
- Language modeling
- Psycholinguistics
- Scaling
ASJC Scopus subject areas
- Neuropsychology and Physiological Psychology
- Language and Linguistics
- Experimental and Cognitive Psychology
- Linguistics and Language
- Artificial Intelligence