How to Plant Trees in LMs: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases

Aaron Mueller, Tal Linzen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features, as opposed to incorrect linear features, when performing tasks after finetuning. We test what aspects of pre-training are important for endowing encoder-decoder Transformers with an inductive bias that favors hierarchical syntactic generalizations. We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus, diagnosing inductive biases using two syntactic transformation tasks: question formation and passivization, both in English. We find that the number of parameters alone does not explain hierarchical generalization: model depth plays a greater role than model width. We also find that pre-training on simpler language, such as child-directed speech, induces a hierarchical bias using an order of magnitude less data than pre-training on more typical datasets based on web text or Wikipedia; this suggests that in cognitively plausible language acquisition settings, neural language models may be more data-efficient than previously thought.

    Original language: English (US)
    Title of host publication: Long Papers
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 11237-11252
    Number of pages: 16
    ISBN (Electronic): 9781959429722
    State: Published - 2023
    Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
    Duration: Jul 9 2023 to Jul 14 2023

    Publication series

    Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
    Volume: 1
    ISSN (Print): 0736-587X

    Conference

    Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
    Country/Territory: Canada
    City: Toronto
    Period: 7/9/23 to 7/14/23

    ASJC Scopus subject areas

    • Computer Science Applications
    • Linguistics and Language
    • Language and Linguistics
