Studying the inductive biases of RNNs with synthetic variations of natural languages

Shauli Ravfogel, Yoav Goldberg, Tal Linzen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    How do typological properties such as word order and morphological case marking affect the ability of neural sequence models to acquire the syntax of a language? Cross-linguistic comparisons of RNNs' syntactic performance (e.g., on subject-verb agreement prediction) are complicated by the fact that any two languages differ in multiple typological properties, as well as by differences in training corpus. We propose a paradigm that addresses these issues: we create synthetic versions of English, which differ from English in one or more typological parameters, and generate corpora for those languages based on a parsed English corpus. We report a series of experiments in which RNNs were trained to predict agreement features for verbs in each of those synthetic languages. Among other findings, (1) performance was higher in subject-verb-object order (as in English) than in subject-object-verb order (as in Japanese), suggesting that RNNs have a recency bias; (2) predicting agreement with both subject and object (polypersonal agreement) improves over predicting each separately, suggesting that underlying syntactic knowledge transfers across the two tasks; and (3) overt morphological case makes agreement prediction significantly easier, regardless of word order.
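The word-order manipulation described in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes a toy dependency-parsed sentence, and the names `Token`, `subtree`, and `to_sov` as well as the `dobj` label are hypothetical choices made for illustration. It shows only one manipulation, moving the direct object in front of its verb to turn subject-verb-object order into subject-object-verb order.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    head: int    # index of the head token, -1 for the root
    deprel: str  # dependency label, e.g. "nsubj" or "dobj" (assumed label set)


def subtree(tokens: List[Token], i: int) -> List[int]:
    """Return the sorted indices of token i and all of its descendants."""
    members = {i}
    changed = True
    while changed:
        changed = False
        for j, tok in enumerate(tokens):
            if j not in members and tok.head in members:
                members.add(j)
                changed = True
    return sorted(members)


def to_sov(tokens: List[Token]) -> List[str]:
    """Move each direct-object subtree to just before its governing verb."""
    order = list(range(len(tokens)))
    for j, tok in enumerate(tokens):
        if tok.deprel == "dobj":
            obj = subtree(tokens, j)
            rest = [k for k in order if k not in obj]
            pos = rest.index(tok.head)  # position of the verb
            order = rest[:pos] + obj + rest[pos:]
    return [tokens[k].word for k in order]


# Toy parse of "the girl sees the boys" (hand-written, not from a real parser).
sentence = [
    Token("the", 1, "det"),
    Token("girl", 2, "nsubj"),
    Token("sees", -1, "root"),
    Token("the", 4, "det"),
    Token("boys", 2, "dobj"),
]

print(" ".join(to_sov(sentence)))
```

Because the reordering operates on whole subtrees of the parse, the object noun phrase stays contiguous, so the synthetic sentence differs from the English original only in the targeted word-order parameter.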

    Original language: English (US)
    Title of host publication: Long and Short Papers
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 3532-3542
    Number of pages: 11
    ISBN (Electronic): 9781950737130
    State: Published - 2019
    Event: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 - Minneapolis, United States
    Duration: Jun 2 2019 to Jun 7 2019

    Publication series

    Name: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
    Volume: 1

    Conference

    Conference: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
    Country: United States
    City: Minneapolis
    Period: 6/2/19 to 6/7/19

    ASJC Scopus subject areas

    • Language and Linguistics
    • Computer Science Applications
    • Linguistics and Language
