Extracting structured scholarly information from the machine translation literature

Eunsol Choi, Matic Horvat, Jonathan May, Kevin Knight, Daniel Marcu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Understanding the experimental results of a scientific paper is crucial to understanding its contribution and to comparing it with related work. We introduce a structured, queryable representation for experimental results and a baseline system that automatically populates this representation. The representation can answer compositional questions such as: "Which are the best published results reported on the NIST 09 Chinese to English dataset?" and "What are the most important methods for speeding up phrase-based decoding?" Answering such questions usually involves lengthy literature surveys. Current machine reading for academic papers does not usually consider the actual experiments, but mostly focuses on understanding abstracts. We describe annotation work to create an initial (scientific paper; experimental results representation) corpus. The corpus is composed of 67 papers which were manually annotated with a structured representation of experimental results by domain experts. Additionally, we present a baseline algorithm that characterizes the difficulty of the inference task.

Original languageEnglish (US)
Title of host publicationProceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
EditorsNicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
PublisherEuropean Language Resources Association (ELRA)
Pages421-425
Number of pages5
ISBN (Electronic)9782951740891
StatePublished - 2016
Event10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: May 23 2016May 28 2016

Publication series

NameProceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016

Other

Other10th International Conference on Language Resources and Evaluation, LREC 2016
Country/TerritorySlovenia
CityPortoroz
Period5/23/165/28/16

Keywords

  • Information extraction
  • Scientific literature
  • Structured prediction

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Fingerprint

Dive into the research topics of 'Extracting structured scholarly information from the machine translation literature'. Together they form a unique fingerprint.

Cite this