TY - CONF
T1 - What do you learn from context? Probing for sentence structure in contextualized word representations
AU - Tenney, Ian
AU - Xia, Patrick
AU - Chen, Berlin
AU - Wang, Alex
AU - Poliak, Adam
AU - McCoy, R. Thomas
AU - Kim, Najoung
AU - Van Durme, Benjamin
AU - Bowman, Samuel R.
AU - Das, Dipanjan
AU - Pavlick, Ellie
N1 - Funding Information:
This work was conducted in part at the 2018 Frederick Jelinek Memorial Summer Workshop on Speech and Language Technologies, and supported by Johns Hopkins University with unrestricted gifts from Amazon, Facebook, Google, Microsoft, and Mitsubishi Electric Research Laboratories, as well as a team-specific donation of computing resources from Google. PX, AP, and BVD were supported by DARPA AIDA and LORELEI. Special thanks to Jacob Devlin for providing checkpoints of the GPT model trained on the BWB corpus, and to the members of the Google AI Language team for many productive discussions.
Publisher Copyright:
© 7th International Conference on Learning Representations, ICLR 2019. All Rights Reserved.
PY - 2019/1/1
Y1 - 2019/1/1
AB - Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.
UR - http://www.scopus.com/inward/record.url?scp=85071171622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071171622&partnerID=8YFLogxK
M3 - Paper
T2 - 7th International Conference on Learning Representations, ICLR 2019
Y2 - 6 May 2019 through 9 May 2019
ER -