TY - GEN
T1 - Can document selection help semi-supervised learning? A case study on event extraction
AU - Liao, Shasha
AU - Grishman, Ralph
PY - 2011
Y1 - 2011
N2 - Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.
UR - http://www.scopus.com/inward/record.url?scp=84859062203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859062203&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84859062203
SN - 9781932432886
T3 - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
SP - 260
EP - 265
BT - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics
T2 - 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Y2 - 19 June 2011 through 24 June 2011
ER -