Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment

Marine Carpuat, Yuval Marton, Nizar Habash

Research output: Contribution to journal › Article › peer-review

Abstract

We study challenges raised by the order of Arabic verbs and their subjects in statistical machine translation (SMT). We show that the boundaries of post-verbal subjects (VS) are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. In addition, VS constructions have highly ambiguous reordering patterns when translated to English, and these patterns are very different for matrix (main clause) VS and non-matrix (subordinate clause) VS. Based on this analysis, we propose a novel method for leveraging VS information in SMT: we reorder VS constructions into pre-verbal (SV) order for word alignment. Unlike previous approaches to source-side reordering, phrase extraction and decoding are performed using the original Arabic word order. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline. Limiting reordering to matrix VS yields further improvements.
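The reorder-for-alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the verb position and the subject span are already known (for example, from a dependency parser), and that some word aligner has produced links over the reordered source. The function names `reorder_vs_to_sv` and `project_alignment`, the placeholder tokens, and the hand-written alignment links are all hypothetical.

```python
from typing import List, Set, Tuple


def reorder_vs_to_sv(
    tokens: List[str], verb_idx: int, subj_start: int, subj_end: int
) -> Tuple[List[str], List[int]]:
    """Move the subject span [subj_start, subj_end) in front of the verb.

    Returns the reordered tokens and a permutation `perm` such that
    perm[new_position] == original_position.
    Assumption: the subject follows the verb (VS order).
    """
    if not (0 <= verb_idx < subj_start <= subj_end <= len(tokens)):
        raise ValueError("expected verb before subject span (VS order)")
    perm = (
        list(range(0, verb_idx))            # material before the verb
        + list(range(subj_start, subj_end))  # subject, now pre-verbal
        + list(range(verb_idx, subj_start))  # verb and anything up to the subject
        + list(range(subj_end, len(tokens)))  # remainder of the sentence
    )
    return [tokens[i] for i in perm], perm


def project_alignment(
    links: Set[Tuple[int, int]], perm: List[int]
) -> Set[Tuple[int, int]]:
    """Map (reordered_source_pos, target_pos) links back to original source positions."""
    return {(perm[s], t) for s, t in links}


# Placeholder tokens stand in for an Arabic VS clause: verb, two-word subject, object.
source = ["verb", "subj1", "subj2", "obj"]
reordered, perm = reorder_vs_to_sv(source, verb_idx=0, subj_start=1, subj_end=3)

# Hand-written monotone links over the reordered source (stand-in for aligner output).
links_on_reordered = {(0, 0), (1, 1), (2, 2), (3, 3)}
links_on_original = project_alignment(links_on_reordered, perm)
```

Because the links are projected back onto the original positions, downstream steps such as phrase extraction and decoding can operate on the unmodified Arabic word order, which is the property the abstract emphasizes.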

Original language: English (US)
Pages (from-to): 105-120
Number of pages: 16
Journal: Machine Translation
Volume: 26
Issue number: 1-2
State: Published - Mar 2012

Keywords

  • Dependency parsing
  • Matrix subject
  • Post-verbal subjects
  • Reordering
  • Statistical machine translation
  • Subject detection
  • VS
  • Word alignment

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence
