Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect. In this article, we focus on ad hoc settings, in which partners have no prior coordination. We introduce a 'Bayesian Meta-Agent' that maintains a belief distribution over hypotheses about its partner's policy. The policies that serve as initial hypotheses are generated with MAP-Elites to ensure behavioral diversity. We evaluate two versions of the agent: an 'Adaptive' version, which selects a response policy based on the updated belief distribution, and a 'Generalist' version, which selects a response based on the uniform prior. In short episodes of ten games with a consistent partner, the 'Adaptive' version outperforms the 'Generalist' when the training and evaluation populations are the same. This is a first step toward an agent that can model its partner and adapt within a time frame compatible with human interaction.
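The abstract does not specify the likelihood model or the rule used to pick a response policy, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes that each partner hypothesis is a policy returning action probabilities for an observation, that a table `payoff[r][h]` of mean scores for each (response, hypothesis) pairing has been precomputed, and that the response with the highest expected score is chosen. The names `update_belief`, `select_response`, `EPS`, `hinter`, `discarder`, and the payoff values are hypothetical, and the toy hypotheses stand in for the MAP-Elites population, which is not reproduced here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface (not from the paper): a partner policy maps an
# observation to a probability distribution over actions.
Policy = Callable[[str], Dict[str, float]]

# Floor on likelihoods so a single surprising action cannot drive a
# hypothesis to exactly zero (an assumed smoothing choice).
EPS = 1e-6


def uniform(n: int) -> List[float]:
    """Uniform prior over n partner hypotheses."""
    return [1.0 / n] * n


def update_belief(belief: Sequence[float], hypotheses: Sequence[Policy],
                  observation: str, action: str) -> List[float]:
    """One Bayes step: P(h | o, a) is proportional to P(a | o, h) * P(h)."""
    unnormalized = [
        b * max(h(observation).get(action, 0.0), EPS)
        for b, h in zip(belief, hypotheses)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def select_response(belief: Sequence[float],
                    payoff: Sequence[Sequence[float]]) -> int:
    """Index of the response policy with the highest expected score.

    payoff[r][h] is an assumed, precomputed mean score of response policy r
    paired with partner hypothesis h.
    """
    expected = [sum(p * b for p, b in zip(row, belief)) for row in payoff]
    return max(range(len(payoff)), key=lambda r: expected[r])


# Toy partner hypotheses (hypothetical): a frequent hinter and a frequent discarder.
def hinter(obs: str) -> Dict[str, float]:
    return {"hint": 0.9, "discard": 0.1}


def discarder(obs: str) -> Dict[str, float]:
    return {"hint": 0.1, "discard": 0.9}


hypotheses = [hinter, discarder]
payoff = [[20.0, 8.0],   # response 0 suits the hinter
          [10.0, 15.0]]  # response 1 suits the discarder

# 'Generalist': choose using the uniform prior only.
generalist_choice = select_response(uniform(2), payoff)

# 'Adaptive': update the belief from observed partner actions, then choose.
belief_after = uniform(2)
for _ in range(3):
    belief_after = update_belief(belief_after, hypotheses, "obs", "discard")
adaptive_choice = select_response(belief_after, payoff)

print(generalist_choice, adaptive_choice)  # prints: 0 1
```

Under the uniform prior, response 0 has the higher expected score (14 vs. 12.5), so the Generalist keeps it. After three observed discards, the belief shifts almost entirely to the discarder hypothesis, and the Adaptive version switches to response 1. In a ten-game episode with the same partner, one would presumably carry the belief over from game to game so that later games benefit from earlier evidence. This is an assumption about the setup, not a detail stated in the abstract.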
Keywords:
- Computational and artificial intelligence
- Evolutionary computation
- Learning (artificial intelligence)
- Naive Bayes methods