TY - JOUR
T1 - Blockwise Self-Supervised Learning at Scale
AU - Siddiqui, Shoaib Ahmed
AU - Krueger, David
AU - LeCun, Yann
AU - Deny, Stéphane
N1 - Publisher Copyright:
© 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
AB - Current state-of-the-art deep networks are all powered by backpropagation. However, the long backpropagation paths found in end-to-end training are biologically implausible and inefficient in terms of energy consumption. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure, consisting of independently training the 4 main blocks of layers of a ResNet-50 with the Barlow Twins loss function at each block, performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building a comprehensive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience. Code to reproduce our experiments is available at: https://github.com/shoaibahmed/blockwise_ssl.
UR - http://www.scopus.com/inward/record.url?scp=85216821178&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85216821178&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85216821178
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -