Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions

Alicia Parrish, Harsh Trivedi, Ethan Perez, Angelica Chen, Nikita Nangia, Jason Phang, Samuel R. Bowman

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Current QA systems can generate reasonable-sounding yet false answers without explanation or evidence for the generated answer, which is especially problematic when humans cannot readily check the model's answers. This presents a challenge for building trust in machine learning systems. We take inspiration from real-world situations where difficult questions are answered by considering opposing sides (see Irving et al., 2018). For multiple-choice QA examples, we build a dataset of single arguments for both a correct and an incorrect answer option in a debate-style set-up as an initial step in training models to produce explanations for two candidate answers. We use long contexts: humans familiar with the context write convincing explanations for preselected correct and incorrect answers, and we test whether those explanations allow humans who have not read the full context to more accurately determine the correct answer. We do not find that explanations in our set-up improve human accuracy, but a baseline condition shows that providing human-selected text snippets does improve accuracy. We use these findings to suggest ways of improving the debate set-up for future data collection efforts.

    Original language: English (US)
    Title of host publication: LNLS 2022 - 1st Workshop on Learning with Natural Language Supervision, Proceedings of the Workshop
    Editors: Jacob Andreas, Karthik Narasimhan, Aida Nematzadeh
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 17-28
    Number of pages: 12
    ISBN (Electronic): 9781955917452
    State: Published - 2022
    Event: 1st Workshop on Learning with Natural Language Supervision, LNLS 2022 - Dublin, Ireland
    Duration: May 26 2022 → …

    Publication series

    Name: LNLS 2022 - 1st Workshop on Learning with Natural Language Supervision, Proceedings of the Workshop

    Conference

    Conference: 1st Workshop on Learning with Natural Language Supervision, LNLS 2022
    Country/Territory: Ireland
    City: Dublin
    Period: 5/26/22 → …

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Computer Science Applications
    • Software
