Handling translation divergences: Combining statistical and symbolic techniques in generation-heavy machine translation

Nizar Habash, Bonnie Dorr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper describes a novel approach to handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The translation divergence problem is usually reserved for Transfer and Interlingual MT because it requires a large combination of complex lexical and structural mappings. A major requirement of these approaches is the accessibility of large amounts of explicit symmetric knowledge for both source and target languages. This limitation renders Transfer and Interlingual approaches ineffective in the face of structurally-divergent language pairs with asymmetric resources. GHMT addresses the more common form of this problem, source-poor/target-rich, by fully exploiting symbolic and statistical target-language resources. This non-interlingual non-transfer approach is accomplished by using target-language lexical semantics, categorial variations and subcategorization frames to overgenerate multiple lexico-structural variations from a target-glossed syntactic dependency of the source-language sentence. The symbolic overgeneration, which accounts for different possible translation divergences, is constrained by a statistical target-language model.
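The overgenerate-then-rank idea in the abstract can be illustrated with a toy sketch. This is not the authors' GHMT implementation: the function names, the slot representation, the tiny corpus, and the add-one smoothed bigram model are all assumptions made for illustration. The example glosses the Spanish "tengo hambre" (literally "I have hunger"), where a categorial variation (hunger, hungry) and a verb alternative (have, am) are combined freely and a target-language model picks the most fluent candidate.

```python
import itertools
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def overgenerate(slots):
    """Yield every combination of the alternatives listed for each slot."""
    for choice in itertools.product(*slots):
        yield " ".join(choice)


def best_candidate(slots, unigrams, bigrams):
    """Return the overgenerated candidate the language model scores highest."""
    return max(overgenerate(slots), key=lambda s: log_prob(s, unigrams, bigrams))


# Hypothetical target-language (English) corpus standing in for a real model.
corpus = ["I am hungry", "I am tired", "we are hungry", "I have a book"]
unigrams, bigrams = train_bigram_model(corpus)

# Hypothetical target-glossed alternatives for "tengo hambre".
slots = [["I"], ["have", "am"], ["hunger", "hungry"]]
candidates = list(overgenerate(slots))
best = best_candidate(slots, unigrams, bigrams)
print(best)
```

In this sketch the symbolic side is reduced to a hand-written list of alternatives per slot; in the approach the abstract describes, those alternatives would instead come from target-language lexical semantics, categorial variations, and subcategorization frames applied to a dependency structure.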

Original language: English (US)
Title of host publication: Machine Translation
Subtitle of host publication: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings
Editors: Stephen D. Richardson
Publisher: Springer Verlag
Pages: 84-93
Number of pages: 10
ISBN (Print): 3540442820, 9783540442820
State: Published - 2002
Event: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 - Tiburon, United States
Duration: Oct 8 2002 → Oct 12 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2499
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002
Country: United States
City: Tiburon
Period: 10/8/02 → 10/12/02

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
