TY - GEN
T1 - Translate once, translate twice, translate thrice and attribute
T2 - 6th IEEE International Conference on Semantic Computing, ICSC 2012
AU - Caliskan, Aylin
AU - Greenstadt, Rachel
N1 - Funding Information:
The authors are thankful to Professor A. Ahmed, Vice-Chancellor, Jamia Hamdard for providing necessary facilities and encouragement. The authors are also thankful to the Council of Scientific and Industrial Research, New Delhi for providing the research grant.
PY - 2012
Y1 - 2012
N2 - In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13% and 91.54% accuracy on average, respectively. We claim that the features leading to highest accuracy in translator attribution are translator-dependent features and that even though translator-effect-heavy features are present in translated text, we can still succeed in authorship attribution. These findings demonstrate that stylometric features of the original text are preserved at some level despite multiple consequent translations and the introduction of translator-dependent features. The main contribution of our work is the discovery of a feature set used to accurately perform both translator and authorship attribution on a corpus of diverse topics from the twenty-first century, which has been consequently translated multiple times using machine translation tools.
KW - anonymity
KW - authorship attribution
KW - machine learning
KW - machine translation
KW - privacy
UR - http://www.scopus.com/inward/record.url?scp=84870664825&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870664825&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2012.46
DO - 10.1109/ICSC.2012.46
M3 - Conference contribution
AN - SCOPUS:84870664825
SN - 9780769548593
T3 - Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012
SP - 121
EP - 125
BT - Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012
Y2 - 19 September 2012 through 21 September 2012
ER -