TY - GEN
T1 - Breaking the closed-world assumption in stylometric authorship attribution
AU - Stolerman, Ariel
AU - Overdorf, Rebekah
AU - Afroz, Sadia
AU - Greenstadt, Rachel
N1 - Publisher Copyright: © IFIP International Federation for Information Processing 2014.
PY - 2014
Y1 - 2014
N2 - Stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the author of the document is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional classification methods are ineffective. This paper proposes the “classify-verify” method that augments classification with a binary verification step evaluated on stylometric datasets. This method, which can be generalized to any domain, significantly outperforms traditional classifiers in open-world settings and yields an F1-score of 0.87, comparable to traditional classifiers in closed-world settings. Moreover, the method successfully detects adversarial documents where authors deliberately change their styles, a problem for which closed-world classifiers fail.
KW - Authorship attribution
KW - Authorship verification
KW - Forensic stylometry
UR - http://www.scopus.com/inward/record.url?scp=84911046003&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84911046003&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84911046003
T3 - IFIP Advances in Information and Communication Technology
SP - 185
EP - 205
BT - Advances in Digital Forensics X - 10th IFIP WG 11.9 International Conference, Revised Selected Papers
A2 - Peterson, Gilbert
A2 - Shenoi, Sujeet
PB - Springer New York LLC
T2 - 10th IFIP WG 11.9 International Conference on Digital Forensics
Y2 - 8 January 2014 through 10 January 2014
ER -