BAIL: Best-action imitation learning for batch deep reinforcement learning

Xinyue Chen, Zijian Zhou, Zheng Wang, Che Wang, Yanqiu Wu, Keith Ross

    Research output: Contribution to journal, Conference article, peer-review


    There has recently been a surge in research in batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment. We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network using imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to that of four other batch Q-learning and imitation-learning schemes across a large variety of batch datasets. Our experiments show that BAIL performs much better than the other schemes and is also computationally much faster than the batch Q-learning schemes.
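The three stages named in the abstract (learn a V function, select actions, imitate them) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how V is fit or how actions are selected, so the choices here are placeholder assumptions. Specifically, V is a least-squares line over scalar states, "best" means the top fraction of pairs ranked by return minus V(s), the policy is another least-squares line instead of a network, and the one-step environment and all function names are hypothetical.

```python
import random

def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return-to-go for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var > 0 else 0.0
    return slope, my - slope * mx

def select_best(batch, v, keep_fraction=0.25):
    """ASSUMED rule: keep the pairs whose return most exceeds V(s)."""
    ranked = sorted(batch, key=lambda t: t[2] - v(t[0]), reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

def make_toy_batch(n=400, seed=0):
    """Hypothetical one-step task: reward is highest when a == 2 * s."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = 2.0 * s + rng.uniform(-1.0, 1.0)   # noisy behavior policy
        r = -(a - 2.0 * s) ** 2
        g = monte_carlo_returns([r])[0]        # one-step episode
        batch.append((s, a, g))
    return batch

def bail_style_pipeline(batch, keep_fraction=0.25):
    """Stage 1: fit V. Stage 2: select pairs. Stage 3: clone them."""
    states = [s for s, _, _ in batch]
    returns = [g for _, _, g in batch]
    a_v, b_v = fit_line(states, returns)
    v = lambda s: a_v * s + b_v
    selected = select_best(batch, v, keep_fraction)
    policy = fit_line([s for s, _, _ in selected],
                      [a for _, a, _ in selected])
    return v, selected, policy

batch = make_toy_batch()
v, selected, (w, c) = bail_style_pipeline(batch)
print(f"kept {len(selected)} of {len(batch)} pairs; "
      f"cloned policy: a = {w:.2f} * s {c:+.2f}")
```

Because only the pairs with the highest return relative to V(s) are imitated, the cloned policy in this toy setting should sit closer to the reward-maximizing rule than the noisy behavior policy that generated the batch. No Q function or bootstrapped target appears anywhere in the sketch.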

    Original language: English (US)
    Journal: Advances in Neural Information Processing Systems
    State: Published - 2020
    Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
    Duration: Dec 6 2020 to Dec 12 2020

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Signal Processing

