TY - GEN
T1 - The RepEval 2017 Shared Task
T2 - 2nd Workshop on Evaluating Vector-Space Representations for NLP, RepEval 2017
AU - Nangia, Nikita
AU - Williams, Adina
AU - Lazaridou, Angeliki
AU - Bowman, Samuel R.
N1 - Funding Information:
This work was made possible by a Google Faculty Research Award to Sam Bowman and Angeliki Lazaridou, and was also supported by a gift from Tencent Holdings. Allyson Ettinger contributed the supplementary probe sentences. We also thank George Dahl and the organizers of the RepEval 2016 and RepEval 2017 workshops for their help and advice.
Publisher Copyright:
© 2017 Association for Computational Linguistics.
PY - 2017
Y1 - 2017
N2 - This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017). All five participating teams beat the bidirectional LSTM (BiLSTM) and continuous bag of words baselines reported in Williams et al. (2017). The best single model used stacked BiLSTMs with residual connections to extract sentence features and reached 74.5% accuracy on the genre-matched test set. Surprisingly, the results of the competition were fairly consistent across the genre-matched and genre-mismatched test sets, and across subsets of the test data representing a variety of linguistic phenomena, suggesting that all of the submitted systems learned reasonably domain-independent representations for sentence meaning.
AB - This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017). All five participating teams beat the bidirectional LSTM (BiLSTM) and continuous bag of words baselines reported in Williams et al. (2017). The best single model used stacked BiLSTMs with residual connections to extract sentence features and reached 74.5% accuracy on the genre-matched test set. Surprisingly, the results of the competition were fairly consistent across the genre-matched and genre-mismatched test sets, and across subsets of the test data representing a variety of linguistic phenomena, suggesting that all of the submitted systems learned reasonably domain-independent representations for sentence meaning.
UR - http://www.scopus.com/inward/record.url?scp=85122940522&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122940522&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85122940522
T3 - RepEval 2017 - 2nd Workshop on Evaluating Vector-Space Representations for NLP, Proceedings of the Workshop
SP - 1
EP - 10
BT - RepEval 2017 - 2nd Workshop on Evaluating Vector-Space Representations for NLP, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
Y2 - 8 September 2017
ER -