BBQ: A Hand-Built Bias Benchmark for Question Answering

Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Samuel R. Bowman

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. Our task evaluates model responses at two levels: (i) given an under-informative context, we test how strongly responses reflect social biases, and (ii) given an adequately informative context, we test whether the model's biases override a correct answer choice. We find that models often rely on stereotypes when the context is under-informative, meaning the model's outputs consistently reproduce harmful biases in this setting. Though models are more accurate when the context provides an informative answer, they still rely on stereotypes and average up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts, with this difference widening to over 5 points on examples targeting gender for most models tested.
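
The two evaluation levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' scoring code: the `Record` fields, the function names, and the toy data are all assumptions made for illustration, and the benchmark's actual bias metrics may be defined differently. The sketch only shows (a) the share of stereotype-consistent answers on under-informative contexts and (b) the accuracy gap, in percentage points, between informative examples whose correct answer aligns with a bias and those where it conflicts.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """One model response (hypothetical schema, not the BBQ file format)."""
    informative: bool        # True if the context states the answer
    correct: bool            # model chose the gold answer
    gold_matches_bias: bool  # gold answer agrees with the stereotype
    picked_biased: bool      # model chose the stereotype-consistent option


def _accuracy(records: List[Record]) -> float:
    if not records:
        raise ValueError("no records in this subset")
    return sum(r.correct for r in records) / len(records)


def accuracy_gap_points(records: Iterable[Record]) -> float:
    """Accuracy on bias-aligned minus bias-conflicting informative examples,
    expressed in percentage points."""
    informative = [r for r in records if r.informative]
    aligned = [r for r in informative if r.gold_matches_bias]
    conflicting = [r for r in informative if not r.gold_matches_bias]
    return 100.0 * (_accuracy(aligned) - _accuracy(conflicting))


def stereotype_rate(records: Iterable[Record]) -> float:
    """Fraction of under-informative examples answered with the stereotype."""
    under = [r for r in records if not r.informative]
    if not under:
        raise ValueError("no under-informative records")
    return sum(r.picked_biased for r in under) / len(under)


# Toy data, invented purely to exercise the functions.
toy = (
    [Record(True, True, True, True)] * 3       # aligned, answered correctly
    + [Record(True, False, True, False)] * 1   # aligned, answered wrongly
    + [Record(True, True, False, False)] * 2   # conflicting, answered correctly
    + [Record(True, False, False, True)] * 2   # conflicting, stereotype chosen
    + [Record(False, False, False, True)] * 1  # under-informative, stereotype chosen
    + [Record(False, True, False, False)] * 3  # under-informative, answered correctly
)

gap = accuracy_gap_points(toy)
rate = stereotype_rate(toy)
```

On this toy data, a positive `gap` means the model is more accurate when the correct answer agrees with the stereotype, which is the pattern the abstract reports for informative contexts.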

    Original language: English (US)
    Title of host publication: ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
    Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 2086-2105
    Number of pages: 20
    ISBN (Electronic): 9781955917254
    State: Published - 2022
    Event: 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
    Duration: May 22 2022 to May 27 2022

    Publication series

    Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
    ISSN (Print): 0736-587X

    Conference

    Conference: 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
    Country/Territory: Ireland
    City: Dublin
    Period: 5/22/22 to 5/27/22

    ASJC Scopus subject areas

    • Computer Science Applications
    • Linguistics and Language
    • Language and Linguistics
