TY - CPAPER
T1 - Improving Faithfulness by Augmenting Negative Summaries from Fake Documents
AU - Wang, Tianshu
AU - Ladhak, Faisal
AU - Durmus, Esin
AU - He, He
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
AB - Current abstractive summarization systems tend to hallucinate content that is unfaithful to the source document, posing a risk of misinformation. To mitigate hallucination, we must teach the model to distinguish hallucinated summaries from faithful ones. However, the commonly used maximum likelihood training does not disentangle factual errors from other model errors. To address this issue, we propose a back-translation-style approach to augment negative samples that mimic factual errors made by the model. Specifically, we train an elaboration model that generates hallucinated documents given the reference summaries, and then generates negative summaries from the fake documents. We incorporate the negative samples into training through a controlled generator, which produces faithful/unfaithful summaries conditioned on the control codes. Additionally, we find that adding textual entailment data through multitasking further boosts the performance. Experiments on three datasets (XSum, GigaWord, and WikiHow) show that our method consistently improves faithfulness without sacrificing informativeness according to both human and automatic evaluation.
UR - http://www.scopus.com/inward/record.url?scp=85149442691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149442691&partnerID=8YFLogxK
DO - 10.18653/v1/2022.emnlp-main.816
M3 - Conference contribution
AN - SCOPUS:85149442691
T3 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
SP - 11913
EP - 11921
BT - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
A2 - Goldberg, Yoav
A2 - Kozareva, Zornitsa
A2 - Zhang, Yue
PB - Association for Computational Linguistics (ACL)
T2 - 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Y2 - 7 December 2022 through 11 December 2022
ER -