TY - JOUR
T1 - Writing style change detection on multi-author documents
AU - Singh, Rhia
AU - Weerasinghe, Janith
AU - Greenstadt, Rachel
N1 - Funding Information:
We thank PAN2021 organizers for organizing the shared task and helping us through the submission process. Our work was supported by the National Science Foundation under grant 1931005 and the McNulty Foundation.
Publisher Copyright:
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2021
Y1 - 2021
AB - This paper describes the approach we took to create a machine learning model for the PAN 2021 Style Change Detection Task. We approached this task by transforming it to an authorship verification task and applying a slightly modified version of our previous authorship verification approach. We extracted stylometric features from each paragraph in each document and used the absolute differences between the feature vectors corresponding to pairs of paragraphs as input to a Logistic Regression classifier, together with the labels indicating if the two paragraphs were written by the same author or not. We then used this model to make predictions for the three style change detection tasks. The model achieved F1 scores of 0.634 on Task 1, 0.657 on Task 2, and 0.432 on Task 3 on the final evaluations.
KW - Machine learning
KW - Natural language processing
KW - Style change detection
KW - Stylometry
UR - http://www.scopus.com/inward/record.url?scp=85113499782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113499782&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85113499782
SN - 1613-0073
VL - 2936
SP - 2137
EP - 2145
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021
Y2 - 21 September 2021 through 24 September 2021
ER -