Abstract
Large-scale machine learning training suffers from two prior challenges, specifically for nuclear-norm constrained problems with distributed systems: the synchronization slowdown due to the straggling workers, and high communication costs. In this work, we propose an asynchronous Stochastic Frank Wolfe (SFW-asyn) method, which, for the first time, solves the two problems simultaneously, while successfully maintaining the same convergence rate as the vanilla SFW. We implement our algorithm in python (with MPI) to run on Amazon EC2, and demonstrate that SFW-asyn yields speed-ups almost linear to the number of machines compared to the vanilla SFW.
Original language | English (US) |
---|---|
Pages (from-to) | 1464-1474 |
Number of pages | 11 |
Journal | Proceedings of Machine Learning Research |
Volume | 108 |
State | Published - 2020 |
Event | 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online Duration: Aug 26 2020 → Aug 28 2020 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability