The impact of task and corpus on event extraction systems

Ralph Grishman

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The term "event extraction" covers a wide range of information extraction tasks, and methods developed and evaluated for one task may prove quite unsuitable for another. Understanding these task differences is essential to making broad progress in event extraction. We look back at the MUC and ACE tasks in terms of one characteristic, the breadth of the scenario - how wide a range of information is subsumed in a single extraction task. We examine how this affects strategies for collecting information and methods for semi-supervised training of new extractors. We also consider the heterogeneity of corpora - how varied the topics of documents in a corpus are. Extraction systems may be intended in principle for general news but are typically evaluated on topic-focused corpora, and this evaluation context may affect system design. As one case study, we examine the task of identifying physical attack events in news corpora, observing the effect on system performance of shifting from an attack-event-rich corpus to a more varied corpus and considering how the impact of this shift may be mitigated.

Original languageEnglish (US)
Title of host publicationProceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
EditorsDaniel Tapias, Irene Russo, Olivier Hamon, Stelios Piperidis, Nicoletta Calzolari, Khalid Choukri, Joseph Mariani, Helene Mazo, Bente Maegaard, Jan Odijk, Mike Rosner
PublisherEuropean Language Resources Association (ELRA)
Pages2928-2931
Number of pages4
ISBN (Electronic)2951740867, 9782951740860
StatePublished - 2010
Event7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: May 17 2010May 23 2010

Publication series

NameProceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

Other

Other7th International Conference on Language Resources and Evaluation, LREC 2010
Country/TerritoryMalta
CityValletta
Period5/17/105/23/10

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'The impact of task and corpus on event extraction systems'. Together they form a unique fingerprint.

Cite this