TY - GEN
T1 - Finding Generalizable Evidence by Learning to Convince Q&A Models
AU - Perez, Ethan
AU - Karamcheti, Siddharth
AU - Fergus, Rob
AU - Weston, Jason
AU - Kiela, Douwe
AU - Cho, Kyunghyun
N1 - Funding Information:
EP was supported by the NSF Graduate Research Fellowship and ONR grant N00014-16-1-2698. KC thanks support from eBay and NVIDIA. We thank Adam Gleave, David Krueger, Geoffrey Irving, Katharina Kann, Nikita Nangia, and Sam Bowman for helpful conversations and feedback. We thank Jack Urbanek, Jason Lee, Ilia Kulikov, Ivanka Perez, Ivy Perez, and our Mechanical Turk workers for help with human evaluations.
Publisher Copyright:
© 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
AB - We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer, if the QA model received those sentences instead of the full passage. Rather than finding evidence that convinces one model alone, we find that agents select evidence that generalizes; agent-chosen evidence increases the plausibility of the supported answer, as judged by other QA models and humans. Given its general nature, this approach improves QA in a robust manner: using agent-selected evidence (i) humans can correctly answer questions with only ~20% of the full passage and (ii) QA models can generalize to longer passages and harder questions.
UR - http://www.scopus.com/inward/record.url?scp=85081555163&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081555163&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85081555163
T3 - EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
SP - 2402
EP - 2411
BT - EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics
T2 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019
Y2 - 3 November 2019 through 7 November 2019
ER -