Careless Whisper: Speech-to-Text Hallucination Harms

Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, Mona Sloane

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service that, as of 2023, outperforms industry competitors. While many of Whisper's transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences that did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations, a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.

    Original language: English (US)
    Title of host publication: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
    Publisher: Association for Computing Machinery, Inc
    Pages: 1672-1681
    Number of pages: 10
    ISBN (Electronic): 9798400704505
    State: Published - Jun 3 2024
    Event: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024 - Rio de Janeiro, Brazil
    Duration: Jun 3 2024 to Jun 6 2024

    Publication series

    Name: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024

    Conference

    Conference: 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
    Country/Territory: Brazil
    City: Rio de Janeiro
    Period: 6/3/24 to 6/6/24

    Keywords

    • Algorithmic Fairness
    • Automated Speech Recognition
    • Generative AI
    • Thematic Coding

    ASJC Scopus subject areas

    • General Business, Management and Accounting
