Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect. In this paper, we focus on ad-hoc settings with no previous coordination between partners. We introduce a “Bayesian Meta-Agent” that maintains a belief distribution over hypotheses of partner policies. The policies that serve as initial hypotheses are generated using MAP-Elites, to ensure behavioral diversity. We evaluate an “Adaptive” version of the agent, which selects a response policy based on the updated belief distribution, and a “Generalist” version, which selects a response based on the uniform prior. In short episodes of 10 games with a consistent partner, the “Adaptive” version outperforms the “Generalist” when the training and evaluation populations are the same. This presents a first step towards an agent that can model its partner and adapt within a time frame that is compatible with human interaction.
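The belief update and response selection described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the three partner hypotheses, the per-action likelihoods, and the payoff table are all hypothetical values chosen for illustration. The sketch assumes that each hypothesis assigns a probability to every observed partner action and that an expected score is available for each response policy against each hypothesis.

```python
def update_belief(belief, likelihoods):
    """One Bayes step: posterior is proportional to prior times likelihood.

    likelihoods[h] is the (assumed) probability that hypothesis h assigns
    to the partner action that was just observed.
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # No hypothesis explains the action; keep the current belief.
        return list(belief)
    return [u / total for u in unnormalized]


def select_response(belief, payoff):
    """Return the index of the response with the highest expected score.

    payoff[r][h] is the (assumed) expected score of response policy r
    when paired with a partner that follows hypothesis h.
    """
    expected = [sum(b * s for b, s in zip(belief, row)) for row in payoff]
    return max(range(len(payoff)), key=lambda r: expected[r])


# Hypothetical payoff table: 3 response policies x 3 partner hypotheses.
payoff = [
    [18.0, 6.0, 9.0],    # specialist against hypothesis 0
    [7.0, 17.0, 8.0],    # specialist against hypothesis 1
    [12.0, 12.0, 12.0],  # safe response that is decent against everyone
]

# "Generalist": choose once under the uniform prior and never update.
prior = [1.0 / 3.0] * 3
generalist_choice = select_response(prior, payoff)

# "Adaptive": update the belief after each observed partner action.
# Each row holds hypothetical likelihoods of one observed action.
observed_likelihoods = [
    [0.7, 0.2, 0.4],
    [0.6, 0.1, 0.3],
    [0.8, 0.2, 0.4],
]
belief = list(prior)
for likelihoods in observed_likelihoods:
    belief = update_belief(belief, likelihoods)
adaptive_choice = select_response(belief, payoff)

print("generalist picks response", generalist_choice)
print("adaptive picks response", adaptive_choice)
```

With these made-up numbers, the uniform prior favors the safe response, while the updated belief concentrates on the first hypothesis and favors its specialist response. That is the qualitative difference between the two versions that the abstract describes.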
Keywords
- Adaptive systems
- Artificial intelligence
ASJC Scopus subject areas
- Control and Systems Engineering
- Artificial Intelligence
- Electrical and Electronic Engineering