TY - JOUR
T1 - BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Chen, Xinyue
AU - Zhou, Zijian
AU - Wang, Zheng
AU - Wang, Che
AU - Wu, Yanqiu
AU - Ross, Keith
N1 - Funding Information:
This research was partially supported by Nokia Bell Labs.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - There has recently been a surge of research in batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment. We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network with imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes across a large variety of batch datasets. Our experiments show that BAIL’s performance is much higher than that of the other schemes, and that BAIL is also computationally much faster than the batch Q-learning schemes.
UR - http://www.scopus.com/inward/record.url?scp=85108413817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108413817&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108413817
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
SN - 1049-5258
Y2 - 6 December 2020 through 12 December 2020
ER -