TY - GEN
T1 - Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation
AU - Marton, Yuval
AU - El Kholy, Ahmed
AU - Habash, Nizar
N1 - Publisher Copyright: © 2011 Association for Computational Linguistics
PY - 2011
Y1 - 2011
N2 - Paraphrases are useful for statistical machine translation (SMT) and natural language processing tasks. Distributional paraphrase generation is independent of parallel texts and syntactic parses, and hence is suitable also for resource-poor languages, but tends to erroneously rank antonyms, trend-contrasting, and polarity-dissimilar candidates as good paraphrases. We present here a novel method for improving distributional paraphrasing by filtering out such candidates. We evaluate it in simulated low and mid-resourced SMT tasks, translating from English to two quite different languages. We show statistically significant gains in English-to-Chinese translation quality, up to 1 BLEU from non-filtered paraphrase-augmented models (1.6 BLEU from baseline). We also show that yielding gains in translation to Arabic, a morphologically rich language, is not straightforward.
UR - http://www.scopus.com/inward/record.url?scp=84881119447&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881119447&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84881119447
T3 - WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop
SP - 237
EP - 249
BT - WMT 2011 - 6th Workshop on Statistical Machine Translation, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 6th Workshop on Statistical Machine Translation, WMT 2011
Y2 - 30 July 2011 through 31 July 2011
ER -