Characterizing Verbatim Short-Term Memory in Neural Language Models

Kristijan Armeni, Christopher Honey, Tal Linzen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    When a language model is trained to predict natural language sequences, its prediction at each moment depends on a representation of prior context. What kind of information about the prior context can language models retrieve? We tested whether language models could retrieve the exact words that occurred previously in a text. In our paradigm, language models (transformers and an LSTM) processed English text in which a list of nouns occurred twice. We operationalized retrieval as the reduction in surprisal from the first to the second list. We found that the transformers retrieved both the identity and ordering of nouns from the first list. Further, the transformers' retrieval was markedly enhanced when they were trained on a larger corpus and with greater model depth. Lastly, their ability to index prior tokens was dependent on learned attention patterns. In contrast, the LSTM exhibited less precise retrieval, which was limited to list-initial tokens and to short intervening texts. The LSTM's retrieval was not sensitive to the order of nouns, and it improved when the list was semantically coherent. We conclude that transformers implemented something akin to a working memory system that could flexibly retrieve individual token representations across arbitrary delays; conversely, the LSTM maintained a coarser and more rapidly decaying semantic gist of prior tokens, weighted toward the earliest items.
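
    The retrieval measure described in the abstract (the drop in surprisal between the first and second occurrence of a noun list) can be sketched in a few lines of Python. The snippet below is a minimal illustration assuming a HuggingFace GPT-2 model; the stimulus sentence, the choice of model, and the mean-over-list-tokens aggregation are illustrative assumptions rather than the paper's exact materials or metric.

    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Hypothetical stimulus: the same noun list occurs twice, with intervening text.
    NOUNS = " patience, notion, movie, hall"
    TEXT = (
        "Mary wrote down a list of words:" + NOUNS + ". "
        "After a short walk she read the very same list again:" + NOUNS + "."
    )

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def token_surprisals(input_ids: torch.Tensor) -> torch.Tensor:
        """Surprisal (in bits) of each token given its left context (skips token 0)."""
        with torch.no_grad():
            logits = model(input_ids.unsqueeze(0)).logits[0]
        log_probs = torch.log_softmax(logits[:-1], dim=-1)
        targets = input_ids[1:]
        nll_nats = -log_probs[torch.arange(targets.numel()), targets]
        return nll_nats / math.log(2)

    def find_sublist(seq, sub):
        """Start positions at which the token sequence `sub` occurs inside `seq`."""
        return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i + len(sub)] == sub]

    ids = tokenizer(TEXT, return_tensors="pt").input_ids[0]
    surprisal = token_surprisals(ids)          # aligned with ids[1:]
    noun_ids = tokenizer(NOUNS)["input_ids"]
    starts = find_sublist(ids.tolist(), noun_ids)
    assert len(starts) == 2, "expected the noun list to be tokenized identically twice"

    n = len(noun_ids)
    first = surprisal[starts[0] - 1 : starts[0] - 1 + n].mean().item()
    second = surprisal[starts[1] - 1 : starts[1] - 1 + n].mean().item()
    print(f"first list:  {first:.2f} bits/token")
    print(f"second list: {second:.2f} bits/token")
    print(f"retrieval (surprisal reduction): {first - second:.2f} bits/token")

    Surprisal is reported in bits per token, and retrieval is summarized as the per-token drop between the two occurrences of the list; a reduction near zero would indicate that the first presentation left no usable trace in the model's context representation.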

    Original language: English (US)
    Title of host publication: CoNLL 2022 - 26th Conference on Computational Natural Language Learning, Proceedings of the Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 405-424
    Number of pages: 20
    ISBN (Electronic): 9781959429074
    State: Published - 2022
    Event: 26th Conference on Computational Natural Language Learning, CoNLL 2022, collocated and co-organized with EMNLP 2022 - Abu Dhabi, United Arab Emirates
    Duration: Dec 7, 2022 - Dec 8, 2022

    Publication series

    Name: CoNLL 2022 - 26th Conference on Computational Natural Language Learning, Proceedings of the Conference

    Conference

    Conference: 26th Conference on Computational Natural Language Learning, CoNLL 2022, collocated and co-organized with EMNLP 2022
    Country/Territory: United Arab Emirates
    City: Abu Dhabi
    Period: 12/7/22 - 12/8/22

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Human-Computer Interaction
    • Linguistics and Language
