TY - GEN
T1 - Text readability for Arabic as a foreign language
AU - Saddiki, Hind
AU - Bouzoubaa, Karim
AU - Cavalli-Sforza, Violetta
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/7/7
Y1 - 2016/7/7
AB - In this study, we evaluate the informativeness of lexical, morphological and semantic features in determining the readability of texts geared towards learners of Arabic as a foreign language. We have gathered low-complexity features with the purpose of establishing a baseline for future research in readability assessment, using freely available natural language processing (NLP) and machine learning (ML) tools on a publicly accessible corpus. We tested common classification algorithms, as well as random forests (an ensemble learning method), and report on their results using several evaluation measures for comparability with similar work. Our results suggest that a small set of easily computed features can be indicative of the reading level of a text. Moreover, our findings will serve as a common ground for ourselves and others to evaluate and compare the performance of more elaborate techniques and feature sets.
KW - Arabic
KW - foreign language learning
KW - machine learning
KW - natural language processing
KW - text readability
UR - http://www.scopus.com/inward/record.url?scp=84980395642&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84980395642&partnerID=8YFLogxK
U2 - 10.1109/AICCSA.2015.7507232
DO - 10.1109/AICCSA.2015.7507232
M3 - Conference contribution
AN - SCOPUS:84980395642
T3 - Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
BT - 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications, AICCSA 2015
PB - IEEE Computer Society
T2 - 12th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2015
Y2 - 17 November 2015 through 20 November 2015
ER -