Generating and Adapting to Diverse Ad-Hoc Partners in Hanabi

Rodrigo Canaan, Xianbo Gao, Julian Togelius, Andy Nealen, Stefan Menzel

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect. In this paper, we focus on ad-hoc settings with no previous coordination between partners. We introduce a "Bayesian Meta-Agent" that maintains a belief distribution over hypotheses of partner policies. The policies that serve as initial hypotheses are generated using MAP-Elites, to ensure behavioral diversity. We evaluate an "Adaptive" version of the agent, which selects a response policy based on the updated belief distribution, and a "Generalist" version, which selects a response based on the uniform prior. In short episodes of 10 games with a consistent partner, the "Adaptive" version outperforms the "Generalist" when the training and evaluation populations are the same. This presents a first step towards an agent that can model its partner and adapt within a time frame that is compatible with human interaction.
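
    The belief update and response selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`update_belief`, `select_response`), the payoff matrix, and the likelihood values are all hypothetical, chosen only to show how an agent acting on an updated belief might pick a different response than one acting on the uniform prior.

```python
def update_belief(belief, likelihoods):
    """One Bayesian step: posterior is proportional to prior times likelihood.

    likelihoods[h] is the (assumed) probability that partner hypothesis h
    would have produced the action that was just observed.
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        # No hypothesis explains the observation; keep the previous belief.
        return list(belief)
    return [u / total for u in unnormalized]


def select_response(belief, payoff):
    """Return the index of the response with the highest expected score.

    payoff[r][h] is the (assumed) expected score of response policy r
    when paired with partner hypothesis h.
    """
    expected = [sum(b * s for b, s in zip(belief, row)) for row in payoff]
    return max(range(len(payoff)), key=expected.__getitem__)


# Hypothetical numbers: three partner hypotheses, four response policies.
# Responses 0-2 each specialize in one partner; response 3 is a compromise.
payoff = [
    [20, 8, 8],
    [8, 20, 8],
    [8, 8, 20],
    [15, 15, 15],
]

uniform_prior = [1 / 3, 1 / 3, 1 / 3]

# "Generalist": always responds to the uniform prior.
generalist_choice = select_response(uniform_prior, payoff)

# "Adaptive": updates the belief after each observed partner action.
# Here every observation is assumed to be most likely under hypothesis 1.
belief = uniform_prior
for _ in range(3):
    belief = update_belief(belief, [0.1, 0.7, 0.2])
adaptive_choice = select_response(belief, payoff)

print(generalist_choice, adaptive_choice)
```

    With these made-up values, the uniform prior favors the compromise response, while a few consistent observations shift the belief enough for the specialized response to win. How the likelihoods and payoffs are actually obtained is specific to the paper and is not reproduced here.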

    Original language: English (US)
    Journal: IEEE Transactions on Games
    State: Accepted/In press - 2022

    Keywords

    • Adaptive systems
    • Artificial intelligence
    • Color
    • Games
    • Sociology
    • Statistics
    • Training

    ASJC Scopus subject areas

    • Software
    • Control and Systems Engineering
    • Artificial Intelligence
    • Electrical and Electronic Engineering
