PREDICTING INDUCTIVE BIASES OF PRE-TRAINED MODELS

Charles Lovering, Rohan Jha, Tal Linzen, Ellie Pavlick

    Research output: Contribution to conference > Paper > peer-review

    Abstract

    Most current NLP systems are based on a pre-train-then-fine-tune paradigm, in which a large neural network is first trained in a self-supervised way designed to encourage the network to extract broadly useful linguistic features, and then fine-tuned for a specific task of interest. Recent work attempts to understand why this recipe works and explain when it fails. Currently, such analyses have produced two sets of apparently contradictory results. Work that analyzes the representations that result from pre-training (via “probing classifiers”) finds evidence that rich features of linguistic structure can be decoded with high accuracy, but work that analyzes model behavior after fine-tuning (via “challenge sets”) indicates that decisions are often not based on such structure but rather on spurious heuristics specific to the training set. In this work, we test the hypothesis that the extent to which a feature influences a model's decisions can be predicted using a combination of two factors: the feature's extractability after pre-training (measured using information-theoretic probing techniques), and the evidence available during fine-tuning (defined as the feature's co-occurrence rate with the label). In experiments with both synthetic and naturalistic data, we find strong evidence (statistically significant correlations) supporting this hypothesis.

    Original language: English (US)
    State: Published - 2021
    Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
    Duration: May 3 2021 to May 7 2021

    Conference

    Conference: 9th International Conference on Learning Representations, ICLR 2021
    City: Virtual, Online
    Period: 5/3/21 to 5/7/21

    ASJC Scopus subject areas

    • Language and Linguistics
    • Computer Science Applications
    • Education
    • Linguistics and Language
