Amortized Noisy Channel Neural Machine Translation

Richard Yuanzhe Pang, He He, Kyunghyun Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like “beam search and rerank” (BSR) incur significant computational overhead during inference, making real-world application infeasible. We aim to study whether it is possible to build an amortized noisy channel NMT model such that, when we do greedy decoding during inference, the translation accuracy matches that of BSR in terms of reward (based on the source-to-target log probability and the target-to-source log probability) and quality (based on BLEU and BLEURT). We attempt three approaches to train the new model: knowledge distillation, 1-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, and the latter two approaches aim to optimize toward a noisy-channel MT reward directly. For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but the translation quality approximated by BLEU and BLEURT is similar to the quality of BSR-produced translations. Additionally, all three approaches speed up inference by 1–2 orders of magnitude.
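As a rough illustration of the reward the abstract describes, the following sketch combines a direct (source-to-target) log probability log p(y|x) with a channel (target-to-source) log probability log p(x|y) and uses the combined score to rerank candidate translations. The function name, the interpolation weights, and the toy candidate scores are illustrative assumptions, not values or code from the paper.

```python
# Hedged sketch of a noisy-channel MT reward: a weighted combination of the
# direct model score log p(y|x) and the channel model score log p(x|y).
# The weights below are illustrative defaults, not the paper's settings.

def noisy_channel_reward(log_p_y_given_x: float,
                         log_p_x_given_y: float,
                         lam_direct: float = 1.0,
                         lam_channel: float = 1.0) -> float:
    """Combine direct and channel log probabilities into a single reward."""
    return lam_direct * log_p_y_given_x + lam_channel * log_p_x_given_y

# Toy reranking example: the channel term can overturn the direct model's
# preference, which is the point of "beam search and rerank" (BSR).
candidates = [
    {"text": "hyp_a", "log_p_y_given_x": -2.0, "log_p_x_given_y": -5.0},
    {"text": "hyp_b", "log_p_y_given_x": -2.5, "log_p_x_given_y": -3.0},
]
best = max(candidates,
           key=lambda c: noisy_channel_reward(c["log_p_y_given_x"],
                                              c["log_p_x_given_y"]))
print(best["text"])  # prints "hyp_b": the channel term favors it
```

An amortized model, by contrast, would be trained so that plain greedy decoding already scores well under this reward, avoiding the candidate generation and reranking at inference time.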

Original language: English (US)
Title of host publication: 15th International Natural Language Generation Conference, INLG 2022
Editors: Samira Shaikh, Thiago Castro Ferreira, Amanda Stent
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 13
ISBN (Electronic): 9781955917575
State: Published - 2022
Event: 15th International Natural Language Generation Conference, INLG 2022 - Hybrid, Waterville, United States
Duration: Jul 18 2022 - Jul 22 2022

Publication series

Name: 15th International Natural Language Generation Conference, INLG 2022


Conference: 15th International Natural Language Generation Conference, INLG 2022
Country/Territory: United States
City: Hybrid, Waterville

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Software

