Lost and found in translation: The impact of machine translated results on translingual information retrieval

Kristen Parton, Nizar Habash, Kathleen McKeown

Research output: Contribution to conference, Paper, peer-review

Abstract

In an ideal cross-lingual information retrieval (CLIR) system, a user query would generate a search over documents in a different language and the relevant results would be presented in the user's language. In practice, CLIR systems are typically evaluated by judging result relevance in the document language, to factor out the effects of translating the results using machine translation (MT). In this paper, we investigate the influence of four different approaches for integrating MT and CLIR on both retrieval accuracy and user judgments of relevance. We create a corpus with relevance judgments for both human and machine translated results, and use it to quantify the effect that MT quality has on end-to-end relevance. We find that MT errors result in a 16-39% decrease in mean average precision relative to the ground truth system that uses human translations. MT errors also cause relevant sentences to appear irrelevant: 5-19% of sentences were relevant in human translation but were judged irrelevant in MT. To counter this degradation, we present two hybrid retrieval models and two automatic MT post-editing techniques and show that these approaches substantially mitigate the errors and improve the end-to-end relevance.
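For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision (MAP) and a relative decrease in MAP can be computed. It is not the authors' evaluation code: the function names, the query and sentence identifiers, and the toy judgments are all hypothetical, and the abstract does not specify the exact evaluation protocol. The sketch assumes the same ranked list is scored twice, once against judgments made on human translations and once against judgments made on MT output.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query: precision@k summed over the ranks
    that hold a relevant item, divided by the number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item_id in enumerate(ranked, start=1):
        if item_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranked, judgments.get(query, set()))
        for query, ranked in runs.items()
    )
    return total / len(runs)


def relative_decrease(baseline: float, degraded: float) -> float:
    """Fractional drop of `degraded` relative to a non-zero `baseline`."""
    return (baseline - degraded) / baseline


# Hypothetical toy data: one query, four ranked sentences.
runs = {"q1": ["s1", "s2", "s3", "s4"]}
# Judgments made on human translations (assumed ground truth).
human_judgments = {"q1": {"s1", "s3"}}
# Judgments made on MT output: s1 now appears irrelevant to the judge.
mt_judgments = {"q1": {"s3"}}

map_human = mean_average_precision(runs, human_judgments)
map_mt = mean_average_precision(runs, mt_judgments)
print(f"MAP (human): {map_human:.3f}")
print(f"MAP (MT):    {map_mt:.3f}")
print(f"Relative decrease: {relative_decrease(map_human, map_mt):.1%}")
```

The toy numbers are chosen only to show the mechanics; they bear no relation to the 16-39% range reported in the abstract.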

Original language: English (US)
State: Published - 2012
Event: 10th Conference of the Association for Machine Translation in the Americas, AMTA 2012 - San Diego, United States
Duration: Oct 28 2012 to Nov 1 2012

Other

Other: 10th Conference of the Association for Machine Translation in the Americas, AMTA 2012
Country/Territory: United States
City: San Diego
Period: 10/28/12 to 11/1/12

ASJC Scopus subject areas

  • Language and Linguistics
  • Software
  • Human-Computer Interaction
