TY - GEN
T1 - Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
T2 - 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
AU - Li, Jianwei
AU - Lei, Qi
AU - Cheng, Wei
AU - Xu, Dongkuan
N1 - Publisher Copyright:
©2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity, and they require a retraining process. As we step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared with other state-of-the-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on the SST2, IMDB, and AGNews datasets, marking a significant stride towards robust pruning in language models.
AB - The pruning objective has recently extended beyond accuracy and sparsity to robustness in language models. Despite this, existing methods struggle to enhance robustness against adversarial attacks when continually increasing model sparsity, and they require a retraining process. As we step into the era of large language models, these issues become increasingly prominent. This paper proposes that the robustness of language models is proportional to the extent of pre-trained knowledge they encompass. Accordingly, we introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models, aiming to conserve more pre-trained knowledge during the pruning process. In this setup, each layer's reconstruction error not only originates from itself but also includes cumulative error from preceding layers, followed by an adaptive rectification. Compared with other state-of-the-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on the SST2, IMDB, and AGNews datasets, marking a significant stride towards robust pruning in language models.
UR - http://www.scopus.com/inward/record.url?scp=85184810717&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184810717&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85184810717
T3 - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 1229
EP - 1247
BT - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Bouamor, Houda
A2 - Pino, Juan
A2 - Bali, Kalika
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -