DUSTer: A method for unraveling cross-language divergences for statistical word-level alignment

Bonnie J. Dorr, Lisa Pearl, Rebecca Hwa, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.

Original language: English (US)
Title of host publication: Machine Translation
Subtitle of host publication: From Research to Real Users - 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002, Proceedings
Editors: Stephen D. Richardson
Publisher: Springer Verlag
Pages: 31-43
Number of pages: 13
ISBN (Print): 3540442820, 9783540442820
State: Published - 2002
Event: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002 - Tiburon, United States
Duration: Oct 8 2002 → Oct 12 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2499
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th Conference of the Association for Machine Translation in the Americas, AMTA 2002
Country/Territory: United States
City: Tiburon
Period: 10/8/02 → 10/12/02

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
