TY - GEN
T1 - New protocols and negative results for textual entailment data collection
AU - Bowman, Samuel R.
AU - Palomaki, Jennimaria
AU - Soares, Livio Baldini
AU - Pitler, Emily
N1 - Publisher Copyright:
© 2020 Association for Computational Linguistics.
PY - 2020
Y1 - 2020
N2 - Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a fifth baseline protocol, we collect and compare five new 8.5k-example training sets. In evaluations focused on transfer learning applications, our results are solidly negative, with models trained on our baseline dataset yielding good transfer performance to downstream tasks, but none of our four new methods (nor the recent ANLI) showing any improvements over that baseline. In a small silver lining, we observe that all four new protocols, especially those where annotators edit pre-filled text boxes, reduce previously observed issues with annotation artifacts.
AB - Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a fifth baseline protocol, we collect and compare five new 8.5k-example training sets. In evaluations focused on transfer learning applications, our results are solidly negative, with models trained on our baseline dataset yielding good transfer performance to downstream tasks, but none of our four new methods (nor the recent ANLI) showing any improvements over that baseline. In a small silver lining, we observe that all four new protocols, especially those where annotators edit pre-filled text boxes, reduce previously observed issues with annotation artifacts.
UR - http://www.scopus.com/inward/record.url?scp=85102674686&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102674686&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85102674686
T3 - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 8203
EP - 8214
BT - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
Y2 - 16 November 2020 through 20 November 2020
ER -