TY - JOUR
T1 - Propagating Knowledge Updates to LMs Through Distillation
AU - Padmanabhan, Shankar
AU - Onoe, Yasumasa
AU - Zhang, Michael J.Q.
AU - Durrett, Greg
AU - Choi, Eunsol
N1 - Publisher Copyright:
© 2023 Neural information processing systems foundation. All rights reserved.
PY - 2023
Y1 - 2023
AB - Modern language models have the capacity to store and use immense amounts of knowledge about real-world entities, but it remains unclear how to update such knowledge stored in model parameters. While prior methods for updating knowledge in LMs successfully inject atomic facts, updated LMs fail to make inferences based on injected facts. In this work, we demonstrate that a context distillation-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences. Our approach consists of two stages: transfer set generation and distillation on the transfer set. We first generate a transfer set by prompting a language model to generate continuations from the entity definition. Then, we update the model parameters so that the distribution of the LM (the 'student') matches the distribution of the LM conditioned on the definition (the 'teacher') on the transfer set. Our experiments demonstrate that this approach is more effective at propagating knowledge updates than finetuning and other gradient-based knowledge-editing methods. Moreover, it does not compromise performance in other contexts, even when injecting the definitions of up to 150 entities at once.
UR - http://www.scopus.com/inward/record.url?scp=85196575417&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196575417&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85196575417
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -