TY - GEN
T1 - Identification of naturally occurring numerical expressions in Arabic
AU - Habash, Nizar
AU - Roth, Ryan
N1 - Funding Information: This work was funded under the DARPA GALE program, contract HR0011-06-C-0023.
PY - 2008
Y1 - 2008
AB - In this paper, we define the task of Number Identification in natural context. We present and validate a language-independent semiautomatic approach to quickly building a gold standard for evaluating number identification systems by exploiting hand-aligned parallel data. We also present and extensively evaluate a robust rule-based system for number identification in natural context for Arabic for a variety of number formats and types. The system is shown to have strong performance, achieving, on a blind test, a 94.8% F-score for the task of correctly identifying number expression spans in natural text, and a 92.1% F-score for the task of correctly determining the core numerical value.
UR - http://www.scopus.com/inward/record.url?scp=84255201790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84255201790&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84255201790
T3 - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
SP - 3330
EP - 3336
BT - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
PB - European Language Resources Association (ELRA)
T2 - 6th International Conference on Language Resources and Evaluation, LREC 2008
Y2 - 28 May 2008 through 30 May 2008
ER -