TY - GEN
T1 - DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
AU - Saadati, Nastaran
AU - Pham, Minh
AU - Saleem, Nasla
AU - Waite, Joshua R.
AU - Balu, Aditya
AU - Jiang, Zhanhong
AU - Hegde, Chinmay
AU - Sarkar, Soumik
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pretrained models. However, a pivotal prerequisite for achieving this level of competitiveness is the significant communication and computation overhead incurred when updating these models, which prohibits their application to real-world scenarios. To address this issue, drawing inspiration from advanced model merging techniques that require no additional training, we introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm, a novel decentralized deep learning framework. Within DIMAT, each agent is trained on its local data and periodically merged with its neighboring agents using advanced model merging techniques such as activation matching until convergence is achieved. DIMAT provably converges with the best available rate for non-convex functions with various first-order methods, while yielding tighter error bounds compared to popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks sourced from multiple datasets. Empirical results validate our theoretical claims by showing that DIMAT attains a faster and higher initial gain in accuracy with independent and identically distributed (IID) and non-IID data while incurring lower communication overhead. This DIMAT paradigm presents a new opportunity for future decentralized learning, enhancing its adaptability to the real world with sparse and lightweight communication and computation.
AB - Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pretrained models. However, a pivotal prerequisite for achieving this level of competitiveness is the significant communication and computation overhead incurred when updating these models, which prohibits their application to real-world scenarios. To address this issue, drawing inspiration from advanced model merging techniques that require no additional training, we introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm, a novel decentralized deep learning framework. Within DIMAT, each agent is trained on its local data and periodically merged with its neighboring agents using advanced model merging techniques such as activation matching until convergence is achieved. DIMAT provably converges with the best available rate for non-convex functions with various first-order methods, while yielding tighter error bounds compared to popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks sourced from multiple datasets. Empirical results validate our theoretical claims by showing that DIMAT attains a faster and higher initial gain in accuracy with independent and identically distributed (IID) and non-IID data while incurring lower communication overhead. This DIMAT paradigm presents a new opportunity for future decentralized learning, enhancing its adaptability to the real world with sparse and lightweight communication and computation.
KW - Convergence
KW - Decentralized Learning
KW - Deep Learning
KW - Model Merging
KW - Non-Convex Optimization
UR - http://www.scopus.com/inward/record.url?scp=85207278189&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85207278189&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.02598
DO - 10.1109/CVPR52733.2024.02598
M3 - Conference contribution
AN - SCOPUS:85207278189
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 27507
EP - 27517
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -