Feature vector difference based authorship verification for open-world settings

Janith Weerasinghe, Rhia Singh, Rachel Greenstadt

    Research output: Contribution to journal › Conference article › peer-review

    Abstract

    This paper describes the approach we took to create a machine learning model for the PAN 2021 Authorship Verification Task. The goal of this task is to predict whether a given pair of documents was written by the same author. For each document pair, we extracted stylometric features from both documents and used the absolute difference between the two feature vectors as input to our classifier. Our new model is similar to our model from last year, with minor improvements to the feature set and the classifier. We trained two models, one on the small dataset and one on the large dataset, which achieved AUCs of 0.967 and 0.972, respectively, in the final evaluations.
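    The difference-vector idea in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual stylometric features or the classifier, so everything here is an assumption for illustration only: the hypothetical `char_ngram_features` uses character trigram relative frequencies as a stand-in feature set, and a plain logistic regression trained by stochastic gradient descent stands in for the classifier. Only the core step, taking the element-wise absolute difference of the two documents' feature vectors, comes from the abstract.

```python
from collections import Counter
import math


def char_ngram_features(text, vocab, n=3):
    """Relative frequency of each character n-gram in `vocab` within `text`.

    Hypothetical stand-in for a stylometric feature extractor.
    """
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1  # avoid division by zero for very short texts
    return [grams[g] / total for g in vocab]


def pair_vector(doc_a, doc_b, vocab, n=3):
    """Element-wise absolute difference of the two documents' feature vectors."""
    fa = char_ngram_features(doc_a, vocab, n)
    fb = char_ngram_features(doc_b, vocab, n)
    return [abs(x - y) for x, y in zip(fa, fb)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression classifier with plain stochastic gradient descent.

    X: list of difference vectors; y: 1 = same author, 0 = different authors.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = predict_same_author(w, b, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_same_author(w, b, x):
    """Probability that the pair behind difference vector x shares an author."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

    In use, one would choose a vocabulary (for example, frequent trigrams in the training texts), build `X = [pair_vector(a, b, vocab) for a, b in pairs]`, and train on the same-author labels. A side benefit of the absolute difference is symmetry: the pairs (a, b) and (b, a) map to the same input vector, so the prediction does not depend on document order.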

    Original language: English (US)
    Pages (from-to): 2201-2207
    Number of pages: 7
    Journal: CEUR Workshop Proceedings
    Volume: 2936
    State: Published - 2021
    Event: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania
    Duration: Sep 21 2021 to Sep 24 2021

    Keywords

    • Authorship verification
    • Machine learning
    • Natural language processing
    • Stylometry

    ASJC Scopus subject areas

    • General Computer Science
