Symbolic-to-statistical hybridization: Extending generation-heavy machine translation

Nizar Habash, Bonnie Dorr, Christof Monz

Research output: Contribution to journal › Article › peer-review

Abstract

The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT's statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic-English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT, a primarily symbolic system extended with monolingual and bilingual statistical components, has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.

Original language: English (US)
Pages (from-to): 23-63
Number of pages: 41
Journal: Machine Translation
Volume: 23
Issue number: 1
State: Published - Feb 2009

Keywords

  • Arabic-English machine translation
  • Generation-heavy machine translation
  • Hybrid machine translation
  • Statistical machine translation

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence
