Detecting hoaxes, frauds, and deception in writing style online

Sadia Afroz, Michael Brennan, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometry techniques can identify authors with high accuracy in non-adversarial scenarios, their accuracy is reduced to random guessing when faced with authors who intentionally obfuscate their writing style or attempt to imitate that of another author. While these results are good for privacy, they raise concerns about fraud. We argue that some linguistic features change when people hide their writing style and by identifying those features, stylistic deception can be recognized. The major contribution of this work is a method for detecting stylistic deception in written documents. We show that using a large feature set, it is possible to distinguish regular documents from deceptive documents with 96.6% accuracy (F-measure). We also present an analysis of linguistic features that can be modified to hide writing style.

    Original language: English (US)
    Title of host publication: Proceedings - 2012 IEEE Symposium on Security and Privacy, S and P 2012
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 461-475
    Number of pages: 15
    ISBN (Print): 9780769546810
    State: Published - 2012
    Event: 33rd IEEE Symposium on Security and Privacy, S and P 2012 - San Francisco, CA, United States
    Duration: May 21 2012 - May 23 2012

    Publication series

    Name: Proceedings - IEEE Symposium on Security and Privacy
    ISSN (Print): 1081-6011

    Other

    Other: 33rd IEEE Symposium on Security and Privacy, S and P 2012
    Country: United States
    City: San Francisco, CA
    Period: 5/21/12 - 5/23/12

    Keywords

    • deception
    • machine learning
    • privacy
    • stylometry

    ASJC Scopus subject areas

    • Safety, Risk, Reliability and Quality
    • Software
    • Computer Networks and Communications
