TY - GEN
T1 - Is this NE tagger getting old?
AU - Mota, Cristina
AU - Grishman, Ralph
N1 - Funding Information:
The first author’s research work was funded by Fundação para a Ciência e a Tecnologia through a doctoral scholarship (ref.: SFRH/BD/3237/2000). We are also grateful to Adam Kilgarriff for his prompt support when we first attempted to implement his method.
PY - 2008
Y1 - 2008
N2 - This paper focuses on the influence of changing the text time frame on the performance of a named entity tagger. We followed a twofold approach to investigate this subject: on the one hand, we analyzed a corpus that spans 8 years, and, on the other hand, we assessed the performance of a name tagger trained and tested on that corpus. We created 8 samples from the corpus, each drawn from the articles for a particular year. In terms of corpus analysis, we calculated the corpus similarity and names shared between samples. To see the effect on tagger performance, we implemented a semi-supervised name tagger based on co-training; then, we trained and tested our tagger on those samples. We observed that corpus similarity, names shared between samples, and tagger performance all decay as the time gap between the samples increases. Furthermore, we observed that the corpus similarity and names shared correlate with the tagger F-measure. These results show that named entity recognition systems may become obsolete in a short period of time.
AB - This paper focuses on the influence of changing the text time frame on the performance of a named entity tagger. We followed a twofold approach to investigate this subject: on the one hand, we analyzed a corpus that spans 8 years, and, on the other hand, we assessed the performance of a name tagger trained and tested on that corpus. We created 8 samples from the corpus, each drawn from the articles for a particular year. In terms of corpus analysis, we calculated the corpus similarity and names shared between samples. To see the effect on tagger performance, we implemented a semi-supervised name tagger based on co-training; then, we trained and tested our tagger on those samples. We observed that corpus similarity, names shared between samples, and tagger performance all decay as the time gap between the samples increases. Furthermore, we observed that the corpus similarity and names shared correlate with the tagger F-measure. These results show that named entity recognition systems may become obsolete in a short period of time.
UR - http://www.scopus.com/inward/record.url?scp=85008581443&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85008581443&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85008581443
T3 - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
SP - 1196
EP - 1202
BT - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
PB - European Language Resources Association (ELRA)
T2 - 6th International Conference on Language Resources and Evaluation, LREC 2008
Y2 - 28 May 2008 through 30 May 2008
ER -