TY - JOUR
T1 - SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
AU - Lingam, Vijay
AU - Tejaswi, Atula
AU - Vavre, Aditya
AU - Shetty, Aneesh
AU - Gudur, Gautham Krishna
AU - Ghosh, Joydeep
AU - Dimakis, Alex
AU - Choi, Eunsol
AU - Bojchevski, Aleksandar
AU - Sanghavi, Sujay
N1 - Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights W and inject learnable matrices ∆W. These ∆W matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically exhibit a performance gap compared to full fine-tuning. While recent PEFT methods have narrowed this gap, they do so at the expense of additional learnable parameters. We propose SVFT, a simple approach that structures ∆W based on the specific weight matrix W. SVFT updates W as a sparse combination M of outer products of its singular vectors, training only the coefficients of these combinations. Crucially, we make additional off-diagonal elements in M learnable, enabling a smooth trade-off between trainable parameters and expressivity, an aspect that distinctly sets our approach apart from previous works leveraging singular values. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006% to 0.25% of parameters, outperforming existing methods that achieve only up to 85% performance with 0.03% to 0.8% of the trainable parameter budget.
AB - Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights W and inject learnable matrices ∆W. These ∆W matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically exhibit a performance gap compared to full fine-tuning. While recent PEFT methods have narrowed this gap, they do so at the expense of additional learnable parameters. We propose SVFT, a simple approach that structures ∆W based on the specific weight matrix W. SVFT updates W as a sparse combination M of outer products of its singular vectors, training only the coefficients of these combinations. Crucially, we make additional off-diagonal elements in M learnable, enabling a smooth trade-off between trainable parameters and expressivity, an aspect that distinctly sets our approach apart from previous works leveraging singular values. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006% to 0.25% of parameters, outperforming existing methods that achieve only up to 85% performance with 0.03% to 0.8% of the trainable parameter budget.
UR - http://www.scopus.com/inward/record.url?scp=105000472679&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105000472679&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:105000472679
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 9 December 2024 through 15 December 2024
ER -