Intermediate-task transfer learning with pretrained models for natural language understanding: When and why does it work?

Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings.
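    The intermediate-task training recipe described in the abstract can be sketched as two sequential fine-tuning stages on a pretrained encoder. The snippet below is a minimal illustration using the Hugging Face transformers and datasets libraries, not the authors' actual experimental code; the choice of roberta-base, MNLI as the intermediate task, RTE as the target task, and all hyperparameters are illustrative assumptions.

    ```python
    # Minimal sketch of intermediate-task transfer: fine-tune a pretrained RoBERTa
    # encoder on a data-rich intermediate task, then fine-tune the resulting
    # checkpoint again on the target task. Task choices and hyperparameters are
    # illustrative assumptions, not the paper's configuration.
    from datasets import load_dataset
    from transformers import (
        RobertaForSequenceClassification,
        RobertaTokenizerFast,
        Trainer,
        TrainingArguments,
    )

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

    def tokenize_pairs(batch, key_a, key_b):
        # Encode sentence pairs with a fixed length so the default collator can batch them.
        return tokenizer(batch[key_a], batch[key_b],
                         truncation=True, padding="max_length", max_length=128)

    def finetune(model, train_dataset, output_dir):
        # One fine-tuning stage; the same routine is reused for both stages.
        args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                                 per_device_train_batch_size=16, learning_rate=1e-5)
        Trainer(model=model, args=args, train_dataset=train_dataset).train()
        return model

    # Stage 1: intermediate task (MNLI, a data-rich natural language inference task).
    mnli = load_dataset("glue", "mnli", split="train").map(
        lambda b: tokenize_pairs(b, "premise", "hypothesis"), batched=True)
    model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=3)
    model = finetune(model, mnli, "stage1-intermediate")
    model.save_pretrained("stage1-intermediate")

    # Stage 2: target task (RTE as an example target), reusing the intermediate-trained
    # encoder weights; the classification head is re-initialized for the new label set.
    rte = load_dataset("glue", "rte", split="train").map(
        lambda b: tokenize_pairs(b, "sentence1", "sentence2"), batched=True)
    target_model = RobertaForSequenceClassification.from_pretrained(
        "stage1-intermediate", num_labels=2, ignore_mismatched_sizes=True)
    finetune(target_model, rte, "stage2-target")
    ```

    In the study's setup, such a two-stage run would be compared against a baseline that fine-tunes the pretrained model directly on the target task, to measure whether the intermediate task helps or hurts.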

    Original language: English (US)
    Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 5231-5247
    Number of pages: 17
    ISBN (Electronic): 9781952148255
    State: Published - 2020
    Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
    Duration: Jul 5 2020 - Jul 10 2020

    Publication series

    Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
    ISSN (Print): 0736-587X

    Conference

    Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
    Country/Territory: United States
    City: Virtual, Online
    Period: 7/5/20 - 7/10/20

    ASJC Scopus subject areas

    • Computer Science Applications
    • Linguistics and Language
    • Language and Linguistics
