We present a large-scale, 26,000-lemma leveled readability lexicon for Modern Standard Arabic. The lexicon was manually annotated in triplicate by language professionals from three regions of the Arab world. The annotations show a high degree of agreement, and major differences were limited to regional variations. Comparing lemma readability levels with their frequencies provided useful insights into the benefits and pitfalls of frequency-based readability approaches. The lexicon will be publicly available.
|Original language||English (US)|
|Title of host publication||LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings|
|Publisher||European Language Resources Association (ELRA)|
|Number of pages||10|
|State||Published - 2020|