TY - GEN
T1 - Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours
T2 - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
AU - Pinto, Lerrel
AU - Gupta, Abhinav
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/6/8
Y1 - 2016/6/8
AB - Current model-free, learning-based robot grasping approaches exploit human-labeled datasets to train their models. However, there are two problems with such a methodology: (a) since each object can be grasped in multiple ways, manually labeling grasp locations is not a trivial task; (b) human labeling is biased by semantics. While there have been attempts to train robots using trial-and-error experiments, the amount of data used in such experiments remains substantially low, which makes the learner prone to overfitting. In this paper, we take the leap of increasing the available training data to 40 times more than prior work, leading to a dataset of 50K data points collected over 700 hours of robot grasping attempts. This allows us to train a Convolutional Neural Network (CNN) for the task of predicting grasp locations without severe overfitting. In our formulation, we recast the regression problem as an 18-way binary classification over image patches. We also present a multi-stage learning approach where a CNN trained in one stage is used to collect hard negatives in subsequent stages. Our experiments clearly show the benefit of using large-scale datasets (and multi-stage training) for the task of grasping. We also compare against several baselines and show state-of-the-art generalization to unseen objects for grasping.
UR - http://www.scopus.com/inward/record.url?scp=84977599666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977599666&partnerID=8YFLogxK
U2 - 10.1109/ICRA.2016.7487517
DO - 10.1109/ICRA.2016.7487517
M3 - Conference contribution
AN - SCOPUS:84977599666
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 3406
EP - 3413
BT - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 May 2016 through 21 May 2016
ER -