TY - GEN
T1 - Behavioral evaluation of Hanabi Rainbow DQN agents and rule-based agents
AU - Canaan, Rodrigo
AU - Gao, Xianbo
AU - Chung, Youjin
AU - Togelius, Julian
AU - Nealen, Andy
AU - Menzel, Stefan
N1 - Funding Information:
Rodrigo Canaan, Andy Nealen and Julian Togelius gratefully acknowledge the financial support from Honda Research Institute Europe (HRI-EU).
Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
N2 - Hanabi is a multiplayer cooperative card game, where only your partners know your cards. All players succeed or fail together. This makes the game an excellent testbed for studying collaboration. Recently, it has been shown that deep neural networks can be trained through self-play to play the game very well. However, such agents generally do not play well with others. In this paper, we investigate the consequences of training Rainbow DQN agents with human-inspired rule-based agents. We analyze with which agents Rainbow agents learn to play well, and how well playing skill transfers to agents they were not trained with. We also analyze patterns of communication between agents to elucidate how collaboration happens. A key finding is that while most agents only learn to play well with partners seen during training, one particular agent leads the Rainbow algorithm towards a much more general policy. The metrics and hypotheses advanced in this paper can be used for further study of collaborative agents.
AB - Hanabi is a multiplayer cooperative card game, where only your partners know your cards. All players succeed or fail together. This makes the game an excellent testbed for studying collaboration. Recently, it has been shown that deep neural networks can be trained through self-play to play the game very well. However, such agents generally do not play well with others. In this paper, we investigate the consequences of training Rainbow DQN agents with human-inspired rule-based agents. We analyze with which agents Rainbow agents learn to play well, and how well playing skill transfers to agents they were not trained with. We also analyze patterns of communication between agents to elucidate how collaboration happens. A key finding is that while most agents only learn to play well with partners seen during training, one particular agent leads the Rainbow algorithm towards a much more general policy. The metrics and hypotheses advanced in this paper can be used for further study of collaborative agents.
UR - http://www.scopus.com/inward/record.url?scp=85102268278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102268278&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85102268278
T3 - Proceedings of the 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2020
SP - 31
EP - 37
BT - Proceedings of the 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2020
A2 - Lelis, Levi
A2 - Thue, David
PB - The AAAI Press
T2 - 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2020
Y2 - 19 October 2020 through 23 October 2020
ER -