TY - GEN
T1 - (QA)2: Question Answering with Questionable Assumptions
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Kim, Najoung
AU - Htut, Phu Mon
AU - Bowman, Samuel R.
AU - Petty, Jackson
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Naturally occurring information-seeking questions often contain questionable assumptions: assumptions that are false or unverifiable. Questions containing questionable assumptions are challenging because they require a distinct answer strategy that deviates from typical answers for information-seeking questions. For instance, the question When did Marie Curie discover Uranium? cannot be answered as a typical when question without addressing the false assumption Marie Curie discovered Uranium. In this work, we propose (QA)2 (Question Answering with Questionable Assumptions), an open-domain evaluation dataset consisting of naturally occurring search engine queries that may or may not contain questionable assumptions. To be successful on (QA)2, systems must be able to detect questionable assumptions and also be able to produce adequate responses for both typical information-seeking questions and ones with questionable assumptions. Through human rater acceptability on end-to-end QA with (QA)2, we find that current models do struggle with handling questionable assumptions, leaving substantial headroom for progress.
AB - Naturally occurring information-seeking questions often contain questionable assumptions: assumptions that are false or unverifiable. Questions containing questionable assumptions are challenging because they require a distinct answer strategy that deviates from typical answers for information-seeking questions. For instance, the question When did Marie Curie discover Uranium? cannot be answered as a typical when question without addressing the false assumption Marie Curie discovered Uranium. In this work, we propose (QA)2 (Question Answering with Questionable Assumptions), an open-domain evaluation dataset consisting of naturally occurring search engine queries that may or may not contain questionable assumptions. To be successful on (QA)2, systems must be able to detect questionable assumptions and also be able to produce adequate responses for both typical information-seeking questions and ones with questionable assumptions. Through human rater acceptability on end-to-end QA with (QA)2, we find that current models do struggle with handling questionable assumptions, leaving substantial headroom for progress.
UR - http://www.scopus.com/inward/record.url?scp=85174400975&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174400975&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.acl-long.472
DO - 10.18653/v1/2023.acl-long.472
M3 - Conference contribution
AN - SCOPUS:85174400975
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 8466
EP - 8487
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -