TY - CONF
T1 - Compensating for annotation errors in training a relation extractor
AU - Min, Bonan
AU - Grishman, Ralph
N1 - Funding Information:
Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL) contract number FA8650-10-C-7058. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government.
Publisher Copyright:
© 2012 Association for Computational Linguistics.
PY - 2012
Y1 - 2012
N2 - Well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is to do independent double annotation followed by adjudication. This takes significantly more human effort than annotation done by a single annotator. We do a detailed analysis on a snapshot of the ACE 2005 annotation files to understand the differences between single-pass annotation and the more expensive nearly three-pass process, and then propose an algorithm that learns from the much cheaper single-pass annotation and achieves performance on a par with the extractor trained on multi-pass annotated data. Furthermore, we show that given the same amount of human labor, the better way to do relation annotation is not to annotate with high-cost quality assurance, but to annotate more.
AB - Well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is to do independent double annotation followed by adjudication. This takes significantly more human effort than annotation done by a single annotator. We do a detailed analysis on a snapshot of the ACE 2005 annotation files to understand the differences between single-pass annotation and the more expensive nearly three-pass process, and then propose an algorithm that learns from the much cheaper single-pass annotation and achieves performance on a par with the extractor trained on multi-pass annotated data. Furthermore, we show that given the same amount of human labor, the better way to do relation annotation is not to annotate with high-cost quality assurance, but to annotate more.
UR - http://www.scopus.com/inward/record.url?scp=85035354202&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85035354202&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85035354202
T3 - EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
SP - 194
EP - 203
BT - EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2012
Y2 - 23 April 2012 through 27 April 2012
ER -