Abstract
Deep reinforcement learning agents such as AlphaZero have achieved superhuman strength in complex combinatorial games. By contrast, the cognitive science of planning has mostly focused on simple tasks for experimental and computational tractability. Using a board game that strikes a balance between complexity and tractability, we find that AlphaZero agents improve in value function quality and planning depth through learning, similar to humans in previous modeling work. In addition, these metrics reflect causal contributions to AlphaZero's playing strength. Yet the strongest contributor is policy quality. The decrease in policy entropy also drives the increase in planning depth. The contribution of planning depth to performance lessens late in training. These results contribute to a joint understanding of machine and human planning, providing an interpretable way of understanding the learning and strength of AlphaZero while generating novel hypotheses on human planning.
Original language | English (US) |
---|---|
Pages | 3601-3607 |
Number of pages | 7 |
State | Published - 2022 |
Event | 44th Annual Meeting of the Cognitive Science Society: Cognitive Diversity, CogSci 2022 - Toronto, Canada; Duration: Jul 27 2022 → Jul 30 2022 |
Conference
Conference | 44th Annual Meeting of the Cognitive Science Society: Cognitive Diversity, CogSci 2022 |
---|---|
Country/Territory | Canada |
City | Toronto |
Period | 7/27/22 → 7/30/22 |
Keywords
- Human-DNN comparison
- Interpretable Machine Learning
- Learning
- Planning
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Human-Computer Interaction
- Cognitive Neuroscience