TY - GEN
T1 - CAMELMORPH MSA
T2 - Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
AU - Khairallah, Christian
AU - Khalifa, Salam
AU - Marzouk, Reham
AU - Nassar, Mayar
AU - Habash, Nizar
N1 - Publisher Copyright:
© 2024 ELRA Language Resource Association: CC BY-NC 4.0.
PY - 2024
Y1 - 2024
AB - We present CAMELMORPH MSA, the largest open-source Modern Standard Arabic morphological analyzer and generator. CAMELMORPH MSA has over 100K lemmas and includes rarely modeled morphological features of Modern Standard Arabic that have Classical Arabic origins. CAMELMORPH MSA can produce ∼1.45B analyses and ∼535M unique diacritizations, almost an order of magnitude more than SAMA (Maamouri et al., 2010c), and it has a ∼36% lower OOV rate than SAMA on a 10B-word corpus. Furthermore, CAMELMORPH MSA fills the gaps in many lemma paradigms by modeling linguistic phenomena consistently. CAMELMORPH MSA integrates seamlessly with the Camel Tools Python toolkit (Obeid et al., 2020), ensuring ease of use and accessibility.
KW - Arabic
KW - Morphology
KW - Open-Source
UR - http://www.scopus.com/inward/record.url?scp=85195933677&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195933677&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85195933677
T3 - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
SP - 2683
EP - 2691
BT - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Kan, Min-Yen
A2 - Hoste, Veronique
A2 - Lenci, Alessandro
A2 - Sakti, Sakriani
A2 - Xue, Nianwen
PB - European Language Resources Association (ELRA)
Y2 - 20 May 2024 through 25 May 2024
ER -