Writing style change detection on multi-author documents

Rhia Singh, Janith Weerasinghe, Rachel Greenstadt

    Research output: Contribution to journal › Conference article › peer-review

    Abstract

    This paper describes the approach we took to create a machine learning model for the PAN 2021 Style Change Detection Task. We approached this task by transforming it to an authorship verification task and applying a slightly modified version of our previous authorship verification approach. We extracted stylometric features from each paragraph in each document and used the absolute differences between the feature vectors corresponding to pairs of paragraphs as input to a Logistic Regression classifier, together with the labels indicating if the two paragraphs were written by the same author or not. We then used this model to make predictions for the three style change detection tasks. The model achieved F1 scores of 0.634 on Task 1, 0.657 on Task 2, and 0.432 on Task 3 on the final evaluations.
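The pipeline outlined in the abstract (per-paragraph features, absolute differences between paragraph pairs, a logistic regression classifier over those differences) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy feature set, the `FUNCTION_WORDS` list, the hand-written gradient-descent trainer, the function names, and the label convention (1 = same author, 0 = different authors). The authors' actual features and training setup are not given in the abstract.

```python
import math
import string

# Hypothetical, deliberately tiny function-word list (not the paper's feature set).
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "is")


def features(paragraph):
    """Toy stylometric vector: mean word length, punctuation rate, function-word rates."""
    words = paragraph.lower().split()
    n_words = max(len(words), 1)
    n_chars = max(len(paragraph), 1)
    stripped = [w.strip(string.punctuation) for w in words]
    avg_word_len = sum(len(w) for w in stripped) / n_words
    punct_rate = sum(ch in string.punctuation for ch in paragraph) / n_chars
    fw_rates = [stripped.count(fw) / n_words for fw in FUNCTION_WORDS]
    return [avg_word_len, punct_rate] + fw_rates


def pair_vector(p1, p2):
    """Absolute element-wise difference between two paragraph feature vectors."""
    return [abs(a - b) for a, b in zip(features(p1), features(p2))]


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow on extreme scores.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def same_author_probability(model, p1, p2):
    """Estimated probability that p1 and p2 share an author (label 1 = same author)."""
    w, b = model
    x = pair_vector(p1, p2)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In this sketch, training rows would be built by calling `pair_vector` on labeled paragraph pairs and passing them to `train_logreg`. One plausible way to turn the verifier into a style change detector is to flag a change between consecutive paragraphs when `same_author_probability` falls below a chosen threshold; the abstract does not specify how the predictions are mapped to the three tasks, so that step is an assumption as well.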

    Original language: English (US)
    Pages (from-to): 2137-2145
    Number of pages: 9
    Journal: CEUR Workshop Proceedings
    Volume: 2936
    State: Published - 2021
    Event: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania
    Duration: Sep 21 2021 - Sep 24 2021

    Keywords

    • Machine learning
    • Natural language processing
    • Style change detection
    • Stylometry

    ASJC Scopus subject areas

    • General Computer Science
