COGS: A compositional generalization challenge based on semantic interpretation

Najoung Kim, Tal Linzen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Natural language is characterized by compositionality: the meaning of a complex expression is constructed from the meanings of its constituent parts. To facilitate the evaluation of the compositional abilities of language processing architectures, we introduce COGS, a semantic parsing dataset based on a fragment of English. The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization; these include new combinations of familiar syntactic structures, or new combinations of familiar words and familiar structures. In experiments with Transformers and LSTMs, we found that in-distribution accuracy on the COGS test set was near-perfect (96-99%), but generalization accuracy was substantially lower (16-35%) and showed high sensitivity to random seed (±6-8%). These findings indicate that contemporary standard NLP models are limited in their compositional generalization capacity, and position COGS as a good way to measure progress.

    Original language: English (US)
    Title of host publication: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 9087-9105
    Number of pages: 19
    ISBN (Electronic): 9781952148606
    State: Published - 2020
    Event: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online
    Duration: Nov 16 2020 → Nov 20 2020

    Conference

    Conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
    City: Virtual, Online
    Period: 11/16/20 → 11/20/20

    ASJC Scopus subject areas

    • Information Systems
    • Computer Science Applications
    • Computational Theory and Mathematics