Abstract
Event extraction is a particularly challenging type of information extraction (IE) that may require inferences from the whole article. However, most current event extraction systems rely on local information at the phrase or sentence level and do not consider the article as a whole, which limits extraction performance. Moreover, most annotated corpora are artificially enriched to include enough positive samples of the events of interest; event identification on a more balanced collection, such as unfiltered newswire, may perform much worse. In this paper, we investigate the use of unsupervised topic models to extract topic features for improving event extraction, both on test data similar to the training data and on more balanced collections. We compare this unsupervised approach to a supervised multi-label text classifier and show that unsupervised topic modeling achieves better results on both collections, especially on the more balanced one. We show that the unsupervised topic model can improve trigger, argument, and role labeling by 3.5%, 6.9%, and 6% respectively on a pre-selected corpus, and by 16.8%, 12.5%, and 12.7% on a balanced corpus.
Original language | English (US) |
---|---|
Title of host publication | International Conference Recent Advances in Natural Language Processing, RANLP |
Pages | 9-16 |
Number of pages | 8 |
State | Published - 2011 |
Event | 8th International Conference on Recent Advances in Natural Language Processing, RANLP 2011, Hissar, Bulgaria. Duration: Sep 12 2011 → Sep 14 2011 |
Other
Other | 8th International Conference on Recent Advances in Natural Language Processing, RANLP 2011 |
---|---|
Country/Territory | Bulgaria |
City | Hissar |
Period | 9/12/11 → 9/14/11 |
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Software
- Electrical and Electronic Engineering