Abstract
This paper describes our machine learning approach to the PAN 2021 Style Change Detection Task. We reframed the task as authorship verification and applied a slightly modified version of our previous authorship verification approach. We extracted stylometric features from each paragraph of each document, then trained a Logistic Regression classifier on the absolute differences between the feature vectors of paragraph pairs, with labels indicating whether the two paragraphs were written by the same author. We then used this model to make predictions for the three style change detection tasks. In the final evaluation, the model achieved F1 scores of 0.634 on Task 1, 0.657 on Task 2, and 0.432 on Task 3.
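The pairwise setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four toy features in `features` are hypothetical and are not the paper's feature set, the label convention (1 = same author) is assumed, and the hand-written gradient-descent `train_logreg` is only a dependency-free stand-in for a library Logistic Regression classifier.

```python
import math
import re

def features(paragraph):
    """Toy stylometric features (hypothetical; not the paper's feature set)."""
    words = re.findall(r"[A-Za-z']+", paragraph)
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,          # mean word length
        len(words) / max(len(sentences), 1),           # mean sentence length in words
        sum(c in ",;:" for c in paragraph) / n_words,  # punctuation marks per word
        len({w.lower() for w in words}) / n_words,     # type-token ratio
    ]

def pair_vector(par_a, par_b):
    """Absolute element-wise difference between two paragraphs' feature vectors."""
    return [abs(a - b) for a, b in zip(features(par_a), features(par_b))]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent for logistic regression (stand-in for a library classifier)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_same_author(model, par_a, par_b):
    """Estimated probability that two paragraphs share an author (assumed label 1)."""
    w, b = model
    x = pair_vector(par_a, par_b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In this sketch, a training set would be built by calling `pair_vector` on paragraph pairs with known authorship, and a style change between consecutive paragraphs could then be flagged when `predict_same_author` falls below a chosen threshold. How the three tasks map onto such predictions is not specified in the abstract.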
Original language | English (US) |
---|---|
Pages (from-to) | 2137-2145 |
Number of pages | 9 |
Journal | CEUR Workshop Proceedings |
Volume | 2936 |
State | Published - 2021 |
Event | 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania; Duration: Sep 21 2021 → Sep 24 2021 |
Keywords
- Machine learning
- Natural language processing
- Style change detection
- Stylometry
ASJC Scopus subject areas
- General Computer Science