Breaking the closed-world assumption in stylometric authorship attribution

Ariel Stolerman, Rebekah Overdorf, Sadia Afroz, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the author of the document is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional classification methods are ineffective. This paper proposes the “classify-verify” method, which augments classification with a binary verification step, and evaluates it on stylometric datasets. This method, which can be generalized to any domain, significantly outperforms traditional classifiers in open-world settings and yields an F1-score of 0.87, comparable to traditional classifiers in closed-world settings. Moreover, the method successfully detects adversarial documents where authors deliberately change their styles, a problem on which closed-world classifiers fail.
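    The idea of following classification with a binary verification step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: documents are already reduced to numeric feature vectors, each author profile is the centroid of that author's training vectors, the closed-world classifier is a nearest-centroid rule, and verification is a plain distance threshold. The names `centroid`, `classify`, `verify`, and `classify_verify`, the sample data, and the threshold value are all hypothetical.

```python
import math

def centroid(vectors):
    """Average the feature vectors of one author into a single profile."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(profiles, doc):
    """Closed-world step: always returns the nearest known author."""
    return min(profiles, key=lambda author: math.dist(profiles[author], doc))

def verify(profile, doc, threshold):
    """Binary verification step (assumed here to be a distance threshold)."""
    return math.dist(profile, doc) <= threshold

def classify_verify(profiles, doc, threshold):
    """Return the predicted author, or None when verification rejects it."""
    author = classify(profiles, doc)
    return author if verify(profiles[author], doc, threshold) else None

# Hypothetical two-feature training data for two suspects.
training = {
    "alice": [[0.10, 0.30], [0.12, 0.28]],
    "bob": [[0.40, 0.05], [0.38, 0.07]],
}
profiles = {author: centroid(vecs) for author, vecs in training.items()}

in_set = classify_verify(profiles, [0.11, 0.31], threshold=0.1)
out_of_set = classify_verify(profiles, [0.90, 0.90], threshold=0.1)
```

    In this sketch the closed-world classifier alone would still name one of the suspects for the second document, because it has no way to abstain; the verification step is what allows an "unknown author" answer (`None`). The threshold trades false accepts against false rejects and would need to be tuned on held-out data.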

    Original language: English (US)
    Title of host publication: Advances in Digital Forensics X - 10th IFIP WG 11.9 International Conference, Revised Selected Papers
    Editors: Gilbert Peterson, Sujeet Shenoi
    Publisher: Springer New York LLC
    Pages: 185-205
    Number of pages: 21
    ISBN (Electronic): 9783662449516
    State: Published - 2014
    Event: 10th IFIP WG 11.9 International Conference on Digital Forensics - Vienna, Austria
    Duration: Jan 8 2014 to Jan 10 2014

    Publication series

    Name: IFIP Advances in Information and Communication Technology
    Volume: 433
    ISSN (Print): 1868-4238

    Conference

    Conference: 10th IFIP WG 11.9 International Conference on Digital Forensics
    Country/Territory: Austria
    City: Vienna
    Period: 1/8/14 to 1/10/14

    Keywords

    • Authorship attribution
    • Authorship verification
    • Forensic stylometry

    ASJC Scopus subject areas

    • Information Systems
    • Computer Networks and Communications
    • Information Systems and Management
