Environmental statistics and the trade-off between model-based and TD learning in humans

Dylan A. Simon, Nathaniel D. Daw

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence - especially in humans - as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rulebased vs. incremental learning.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 24
Subtitle of host publication25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
StatePublished - 2011
Event25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 - Granada, Spain
Duration: Dec 12 2011Dec 14 2011

Publication series

NameAdvances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011

Other

Other25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
Country/TerritorySpain
CityGranada
Period12/12/1112/14/11

ASJC Scopus subject areas

  • Information Systems

Fingerprint

Dive into the research topics of 'Environmental statistics and the trade-off between model-based and TD learning in humans'. Together they form a unique fingerprint.

Cite this