## Abstract

We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. [22] show an information theoretic lower bound that in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, H. We study imitation learning under the μ-recoverability assumption of [27] which assumes that the difference in the Q-value under the expert policy across different actions in a state do not deviate beyond μ from the maximum. We show that the reduction proposed by [25] is statistically optimal: the resulting algorithm upon interacting with the MDP for N episodes results in a suboptimality bound of e O(μ|S|H/N) which we show is optimal up to log-factors. In contrast, we show that any algorithm which does not interact with the MDP and uses an offline dataset of N expert trajectories must incur suboptimality growing as & |S|H2/N even under the μ- recoverability assumption. This establishes a clear and provable separation of the minimax rates between the active setting and the no-interaction setting. We also study IL with linear function approximation. When the expert plays actions according to a linear classifier of known state-action features, we use the reduction to multi-class classification to show that with high probability, the suboptimality of behavior cloning is eO(dH2/N) given N rollouts from the optimal policy. This is optimal up to log-factors but can be improved to eO(dH/N) if we have a linear expert with parameter-sharing across time steps. In contrast, when the MDP transition structure is known to the learner such as in the case of simulators, we demonstrate fundamental differences compared to the tabular setting in terms of the performance of an optimal algorithm, MIMIC-MD (Rajaraman et al. [22]) when extended to the function approximation setting. Here, we introduce a new problem called confidence set linear classification, that can be used to construct sample-efficient IL algorithms.

Original language | English (US) |
---|---|

Title of host publication | Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |

Editors | Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan |

Publisher | Neural information processing systems foundation |

Pages | 1325-1336 |

Number of pages | 12 |

ISBN (Electronic) | 9781713845393 |

State | Published - 2021 |

Event | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online Duration: Dec 6 2021 → Dec 14 2021 |

### Publication series

Name | Advances in Neural Information Processing Systems |
---|---|

Volume | 2 |

ISSN (Print) | 1049-5258 |

### Conference

Conference | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
---|---|

City | Virtual, Online |

Period | 12/6/21 → 12/14/21 |

## ASJC Scopus subject areas

- Computer Networks and Communications
- Information Systems
- Signal Processing