TY - GEN
T1 - Analyzing the forgetting problem in pretrain-finetuning of open-domain dialogue response models
AU - He, Tianxing
AU - Ott, Myle
AU - Liu, Bing
AU - Liu, Jun
AU - Glass, James
AU - Cho, Kyunghyun
AU - Peng, Fuchun
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
AB - In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We demonstrate the forgetting phenomenon through a set of detailed behavior analyses from the perspectives of knowledge transfer, context sensitivity, and function space projection. As a preliminary attempt to alleviate the forgetting problem, we propose an intuitive finetuning strategy named “mix-review”. We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.
UR - http://www.scopus.com/inward/record.url?scp=85106166678&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106166678&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85106166678
T3 - EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 1121
EP - 1133
BT - EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021
Y2 - 19 April 2021 through 23 April 2021
ER -