From Language to Family and Back: Native Language and Language Family Identification from English Text

Ariel Stolerman, Aylin Caliskan Islam, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Revealing an anonymous author’s traits from text is a well-researched area. In this paper we aim to identify the native language and language family of a non-native English author, given his/her English writings. We extract features from the text based on prior work, and extend or modify it to construct different feature sets, and use support vector machines for classification. We show that native language identification accuracy can be improved by up to 6.43% for a 9-class task, depending on the feature set, by introducing a novel method to incorporate language family information. In addition we show that introducing grammar-based features improves accuracy of both native language and language family identification.

    Original language: English (US)
    Title of host publication: Proceedings of the 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics
    Subtitle of host publication: Human Language Technologies, NAACL-HLT 2013 - Student Research Workshop
    Editors: Annie Louis, Richard Socher, Julia Hockenmaier, Eric K. Ringger
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 32-39
    Number of pages: 8
    ISBN (Electronic): 9781937284473
    State: Published - 2013
    Event: 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Atlanta, United States
    Duration: Jun 9 2013 → Jun 14 2013

    Publication series

    Name: Proceedings of the 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Student Research Workshop

    Conference

    Conference: 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013
    Country/Territory: United States
    City: Atlanta
    Period: 6/9/13 → 6/14/13

    ASJC Scopus subject areas

    • Linguistics and Language
    • Language and Linguistics
    • Computer Science Applications
