A gold standard dependency corpus for English

Natalia Silveira, Timothy Dozat, Marie Catherine De Marneffe, Samuel R. Bowman, Miriam Connor, John Bauer, Christopher D. Manning

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Abstract

    We present a gold standard annotation of syntactic dependencies in the English Web Treebank (EWT) corpus using the Stanford Dependencies (SD) standard. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for informal genres of English text. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating dependency parsers like the one included as part of the Stanford Parser. We show that training a dependency parser on a mix of newswire and web data improves performance on web data without greatly hurting performance on newswire text; gold standard annotations for non-canonical text can therefore be valuable for parsing in general. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser's dependency converter: in response to the challenges encountered by annotators in the EWT corpus, we revised and extended the SD standard and improved the converter.

    Original language: English (US)
    Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
    Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
    Publisher: European Language Resources Association (ELRA)
    Pages: 2897-2904
    Number of pages: 8
    ISBN (Electronic): 9782951740884
    State: Published, 2014
    Event: 9th International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland
    Duration: May 26, 2014 to May 31, 2014

    Publication series

    Name: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

    Other

    Other: 9th International Conference on Language Resources and Evaluation, LREC 2014
    Country: Iceland
    City: Reykjavik
    Period: 5/26/14 to 5/31/14

    Keywords

    • Dependency grammar
    • Stanford dependencies
    • Web treebank

    ASJC Scopus subject areas

    • Linguistics and Language
    • Library and Information Sciences
    • Education
    • Language and Linguistics
