This paper describes our approach to building machine learning models for the PAN 2020 Authorship Verification Task. For each document pair, we extracted stylometric features from both documents and used the absolute difference between the two feature vectors as input to our classifier. We created two models: a logistic regression model trained on the small dataset, and a neural-network-based model trained on the large dataset. These models achieved AUCs of 0.939 and 0.953 on the small and large datasets, respectively, making them the second-best models submitted to the shared task on both datasets.
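The pair representation described in the abstract can be illustrated with a minimal sketch. The abstract does not list the stylometric features that were used, so the feature set here (mean word length, relative frequencies of a few function words and punctuation marks) and the names `stylometric_features` and `pair_features` are assumptions chosen only for illustration. The sketch shows the one step the abstract does specify: taking the element-wise absolute difference of the two documents' feature vectors.

```python
import re
from collections import Counter

# Hypothetical feature inventory; the features actually used in the paper
# are not given in the abstract.
FUNCTION_WORDS = ("the", "and", "of", "to", "in")
PUNCTUATION = (",", ".", ";", "!", "?")


def stylometric_features(text):
    """Map a document to a fixed-length vector of simple style statistics."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = max(len(tokens), 1)  # guard against empty documents
    n_chars = max(len(text), 1)
    counts = Counter(tokens)

    features = [sum(len(t) for t in tokens) / n_tokens]          # mean word length
    features += [counts[w] / n_tokens for w in FUNCTION_WORDS]   # function-word rates
    features += [text.count(p) / n_chars for p in PUNCTUATION]   # punctuation rates
    return features


def pair_features(doc_a, doc_b):
    """Element-wise absolute difference of the two documents' feature vectors."""
    vec_a = stylometric_features(doc_a)
    vec_b = stylometric_features(doc_b)
    return [abs(a - b) for a, b in zip(vec_a, vec_b)]
```

Because the absolute difference is symmetric, the resulting vector does not depend on the order of the two documents, and a pair of identical documents maps to the zero vector. A vector built this way could then be passed to any binary classifier, such as a logistic regression or a neural network.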
|Original language||English (US)|
|Journal||CEUR Workshop Proceedings|
|State||Published - 2020|
|Event||11th Conference and Labs of the Evaluation Forum, CLEF 2020 - Thessaloniki, Greece|
Duration: Sep 22 2020 → Sep 25 2020
ASJC Scopus subject areas
- Computer Science(all)