TY - CONF
T1 - Predicting Inductive Biases of Pre-Trained Models
AU - Lovering, Charles
AU - Jha, Rohan
AU - Linzen, Tal
AU - Pavlick, Ellie
N1 - Funding Information:
We would like to thank Michael Littman for helpful suggestions on how to better present our findings and Ian Tenney for insightful comments on a previous draft of this work. We also want to thank our reviewers for their detailed and helpful comments. This work is supported by DARPA under grant number HR00111990064. This research was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.
Publisher Copyright:
© 2021 ICLR 2021 - 9th International Conference on Learning Representations. All rights reserved.
PY - 2021
Y1 - 2021
AB - Most current NLP systems are based on a pre-train-then-fine-tune paradigm, in which a large neural network is first trained in a self-supervised way designed to encourage the network to extract broadly-useful linguistic features, and then fine-tuned for a specific task of interest. Recent work attempts to understand why this recipe works and explain when it fails. Currently, such analyses have produced two sets of apparently-contradictory results. Work that analyzes the representations that result from pre-training (via “probing classifiers”) finds evidence that rich features of linguistic structure can be decoded with high accuracy, but work that analyzes model behavior after fine-tuning (via “challenge sets”) indicates that decisions are often not based on such structure but rather on spurious heuristics specific to the training set. In this work, we test the hypothesis that the extent to which a feature influences a model's decisions can be predicted using a combination of two factors: The feature's extractability after pre-training (measured using information-theoretic probing techniques), and the evidence available during fine-tuning (defined as the feature's co-occurrence rate with the label). In experiments with both synthetic and naturalistic data, we find strong evidence (statistically significant correlations) supporting this hypothesis.
UR - http://www.scopus.com/inward/record.url?scp=85150305572&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150305572&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85150305572
T2 - 9th International Conference on Learning Representations, ICLR 2021
Y2 - 3 May 2021 through 7 May 2021
ER -