TY - GEN
T1 - A multidialectal parallel corpus of Arabic
AU - Bouamor, Houda
AU - Habash, Nizar
AU - Oflazer, Kemal
N1 - Funding Information:
The first and third authors were supported by grant NPRP-09-1140-1-177 from the Qatar National Research Fund (QNRF), a member of the Qatar Foundation. The second author was supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-12-C-0014. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of QNRF or DARPA.
PY - 2014
Y1 - 2014
N2 - The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic. There are many Arabic dialects across the Arab World and within other Arabic-speaking communities. These dialects vary widely from region to region and to a lesser extent from city to city in each region. The dialects are not standardized, they are not taught, and they do not have official status. However, they are the primary vehicles of communication (face-to-face and recently, online) and have a large presence in the arts as well. In this paper, we present the first multidialectal Arabic parallel corpus, a collection of 2,000 sentences in Standard Arabic, Egyptian, Tunisian, Jordanian, Palestinian and Syrian Arabic, in addition to English. Such parallel data does not exist naturally, which makes this corpus a very valuable resource that has many potential applications such as Arabic dialect identification and machine translation.
AB - The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic. There are many Arabic dialects across the Arab World and within other Arabic-speaking communities. These dialects vary widely from region to region and to a lesser extent from city to city in each region. The dialects are not standardized, they are not taught, and they do not have official status. However, they are the primary vehicles of communication (face-to-face and recently, online) and have a large presence in the arts as well. In this paper, we present the first multidialectal Arabic parallel corpus, a collection of 2,000 sentences in Standard Arabic, Egyptian, Tunisian, Jordanian, Palestinian and Syrian Arabic, in addition to English. Such parallel data does not exist naturally, which makes this corpus a very valuable resource that has many potential applications such as Arabic dialect identification and machine translation.
KW - Arabic
KW - Dialects
KW - Parallel Corpus
UR - http://www.scopus.com/inward/record.url?scp=85026863473&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026863473&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85026863473
T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
SP - 1240
EP - 1245
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Declerck, Thierry
A2 - Mariani, Joseph
A2 - Maegaard, Bente
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Mazo, Helene
A2 - Piperidis, Stelios
A2 - Loftsson, Hrafn
PB - European Language Resources Association (ELRA)
T2 - 9th International Conference on Language Resources and Evaluation, LREC 2014
Y2 - 26 May 2014 through 31 May 2014
ER -