TY - GEN
T1 - Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment
AU - Carpuat, Marine
AU - Marton, Yuval
AU - Habash, Nizar
PY - 2010
Y1 - 2010
N2 - We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.
AB - We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.
UR - http://www.scopus.com/inward/record.url?scp=84859945878&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859945878&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84859945878
SN - 9781617388088
T3 - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 178
EP - 183
BT - ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
T2 - 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Y2 - 11 July 2010 through 16 July 2010
ER -