TY - GEN
T1 - Doppelgänger finder
T2 - 35th IEEE Symposium on Security and Privacy, SP 2014
AU - Afroz, Sadia
AU - Caliskan-Islam, Aylin
AU - Stolerman, Ariel
AU - Greenstadt, Rachel
AU - McCoy, Damon
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/11/13
Y1 - 2014/11/13
N2 - Stylometry is a method for identifying anonymous authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and the size and scope of the cybercrime underworld. Previous analyses have been accomplished primarily through analysis of limited structured metadata and painstaking manual analysis. However, the key challenge is to automate this process, since this labor intensive manual approach clearly does not scale. We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is standard, supervised stylometry problem made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, you want to feed a forum into an analysis engine and have it output possible doppelgangers, or users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to data sets created by artificially splitting authors into multiple identities. For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users. We demonstrate the utility of our approach with a case study that includes applying our technique to the Carders forum and manual analysis to validate the results, enabling the discovery of previously undetected doppelganger accounts.
AB - Stylometry is a method for identifying anonymous authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and the size and scope of the cybercrime underworld. Previous analyses have been accomplished primarily through analysis of limited structured metadata and painstaking manual analysis. However, the key challenge is to automate this process, since this labor intensive manual approach clearly does not scale. We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is standard, supervised stylometry problem made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, you want to feed a forum into an analysis engine and have it output possible doppelgangers, or users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to data sets created by artificially splitting authors into multiple identities. For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users. We demonstrate the utility of our approach with a case study that includes applying our technique to the Carders forum and manual analysis to validate the results, enabling the discovery of previously undetected doppelganger accounts.
KW - Stylometry
KW - cybercrime
KW - underground forum
UR - http://www.scopus.com/inward/record.url?scp=84914140103&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84914140103&partnerID=8YFLogxK
U2 - 10.1109/SP.2014.21
DO - 10.1109/SP.2014.21
M3 - Conference contribution
AN - SCOPUS:84914140103
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 212
EP - 226
BT - Proceedings - IEEE Symposium on Security and Privacy
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 May 2014 through 21 May 2014
ER -