Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation

Yuval Marton, Ahmed El Kholy, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Paraphrases are useful for statistical machine translation (SMT) and natural language processing tasks. Distributional paraphrase generation is independent of parallel texts and syntactic parses, and hence is suitable also for resource-poor languages, but tends to erroneously rank antonyms, trend-contrasting, and polarity-dissimilar candidates as good paraphrases. We present here a novel method for improving distributional paraphrasing by filtering out such candidates. We evaluate it in simulated low and mid-resourced SMT tasks, translating from English to two quite different languages. We show statistically significant gains in English-to-Chinese translation quality, up to 1 BLEU from non-filtered paraphrase-augmented models (1.6 BLEU from baseline). We also show that yielding gains in translation to Arabic, a morphologically rich language, is not straightforward.

Original language: English (US)
Title of host publication: WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 237-249
Number of pages: 13
ISBN (Electronic): 9781937284121
State: Published - 2011
Event: 6th Workshop on Statistical Machine Translation, WMT 2011 - Edinburgh, United Kingdom
Duration: Jul 30 2011 - Jul 31 2011

Publication series

Name: WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop

Conference

Conference: 6th Workshop on Statistical Machine Translation, WMT 2011
Country/Territory: United Kingdom
City: Edinburgh
Period: 7/30/11 - 7/31/11

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Language and Linguistics
  • Software
