Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment

Marine Carpuat, Yuval Marton, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.
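The idea of reordering "for word alignment only" can be illustrated with a small sketch. It assumes a dependency parse given as head indices, relation labels, and POS tags; the labels `SBJ` and `VRB` are hypothetical placeholders, and the helper names (`subtree`, `find_vs`, `reorder_vs_to_sv`, `project_alignment`) are invented for illustration. It also assumes, as "for alignment only" suggests, that alignment links computed on the SV-reordered Arabic are mapped back onto the original word order before the rest of the pipeline runs. This is not the authors' implementation, and the toy sentence uses English glosses in place of Arabic tokens.

```python
from typing import List, Sequence, Set, Tuple


def subtree(heads: Sequence[int], root: int) -> Set[int]:
    """Collect indices of all tokens dominated by `root` (root included).

    heads[i] is the index of token i's head, or -1 for the sentence root.
    """
    members = {root}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in members and i not in members:
                members.add(i)
                changed = True
    return members


def find_vs(heads: Sequence[int], labels: Sequence[str], pos: Sequence[str],
            subj_label: str = "SBJ", verb_pos: str = "VRB") -> List[Tuple[int, int]]:
    """Return (verb, subject_head) pairs where the subject follows its verb.

    The label/tag values are hypothetical defaults, not a fixed tag set.
    """
    pairs = []
    for i, (h, lab) in enumerate(zip(heads, labels)):
        if lab == subj_label and h >= 0 and pos[h] == verb_pos and h < i:
            pairs.append((h, i))
    return pairs


def reorder_vs_to_sv(heads: Sequence[int], labels: Sequence[str],
                     pos: Sequence[str]) -> List[int]:
    """Return a permutation `order`: order[k] is the original index of the
    token placed at position k of the SV-reordered sentence."""
    order = list(range(len(heads)))
    for verb, subj in find_vs(heads, labels, pos):
        span = subtree(heads, subj)  # subject boundaries come from the (possibly noisy) parse
        moved = [t for t in order if t in span]
        rest = [t for t in order if t not in span]
        cut = rest.index(verb)
        order = rest[:cut] + moved + rest[cut:]  # place the subject just before its verb
    return order


def project_alignment(links: Sequence[Tuple[int, int]],
                      order: Sequence[int]) -> List[Tuple[int, int]]:
    """Map links (reordered_source_index, target_index) back to original source positions."""
    return sorted((order[i], j) for i, j in links)


if __name__ == "__main__":
    # Toy glossed VS sentence: "said the-president the-American that ..."
    tokens = ["said", "the-president", "the-American", "that"]
    heads = [-1, 0, 1, 0]
    labels = ["ROOT", "SBJ", "MOD", "OBJ"]
    pos = ["VRB", "NOM", "NOM", "PRT"]
    order = reorder_vs_to_sv(heads, labels, pos)
    print([tokens[k] for k in order])
    # ['the-president', 'the-American', 'said', 'that']

    # Hypothetical links from aligning the reordered source to
    # "the American president said that"
    links = [(0, 0), (0, 2), (1, 1), (2, 3), (3, 4)]
    print(project_alignment(links, order))
    # [(0, 3), (1, 0), (1, 2), (2, 1), (3, 4)]
```

Keeping the permutation, rather than only the reordered string, is what makes the reordering reversible: the aligner sees SV order, while anything downstream can still work on the original Arabic word order. The sketch processes VS pairs sequentially and makes no attempt to correct parse errors, which the abstract notes are common at VS boundaries.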

Original language: English (US)
Title of host publication: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 178-183
Number of pages: 6
State: Published - 2010
Event: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010 - Uppsala, Sweden
Duration: Jul 11 2010 → Jul 16 2010

Publication series

Name: ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Other

Other: 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010
Country: Sweden
City: Uppsala
Period: 7/11/10 → 7/16/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language



    Carpuat, M., Marton, Y., & Habash, N. (2010). Improving Arabic-to-English Statistical Machine Translation by reordering post-verbal subjects for alignment. In ACL 2010 - 48th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 178-183).