TY - GEN
T1 - Tharwa
T2 - 9th International Conference on Language Resources and Evaluation, LREC 2014
AU - Diab, Mona
AU - Al-Badrashiny, Mohamed
AU - Aminian, Maryam
AU - Attia, Mohammed
AU - Dasigi, Pradeep
AU - Elfardy, Heba
AU - Eskander, Ramy
AU - Habash, Nizar
AU - Hawwari, Abdelati
AU - Salloum, Wael
N1 - Funding Information:
We would like to thank Owen Rambow for helpful discussions and feedback. We would like to acknowledge the numerous annotators who helped build Tharwa. This work was supported by the Defense Advanced Research Projects Agency (DARPA) Contract No. HR0011-12-C-0014. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of DARPA.
PY - 2014
Y1 - 2014
AB - We introduce an electronic three-way lexicon, Tharwa, comprising Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with various elements of linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is based on a compilation of information from existing monolingual and bilingual resources such as paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both Theoretical Linguistics and Computational Linguistics research.
KW - Arabic dialects
KW - Arabic lexicon
KW - Arabic morphology
KW - Egyptian Arabic dictionary
UR - http://www.scopus.com/inward/record.url?scp=85026887071&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026887071&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85026887071
T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
SP - 3782
EP - 3789
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Declerck, Thierry
A2 - Mariani, Joseph
A2 - Maegaard, Bente
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Mazo, Helene
A2 - Piperidis, Stelios
A2 - Loftsson, Hrafn
PB - European Language Resources Association (ELRA)
Y2 - 26 May 2014 through 31 May 2014
ER -