TY - JOUR
T1 - Sources of suboptimality in a minimalistic explore–exploit task
AU - Song, Mingyu
AU - Bnaya, Zahy
AU - Ma, Wei Ji
N1 - Publisher Copyright: © 2019, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2019/4/1
Y1 - 2019/4/1
N2 - People often choose between sticking with an available good option (exploitation) and trying out a new option that is uncertain but potentially more rewarding (exploration)1,2. Laboratory studies on explore–exploit decisions often contain real-world complexities such as non-stationary environments, stochasticity under exploitation and unknown reward distributions3–7. However, such factors might limit the researcher’s ability to understand the essence of people’s explore–exploit decisions. For this reason, we introduce a minimalistic task in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behaviour of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a temporal ratio law. Finally, we find evidence for ‘sequence-level’ variability that is shared across all decisions in the same sequence. Our results emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.
UR - http://www.scopus.com/inward/record.url?scp=85061373253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061373253&partnerID=8YFLogxK
U2 - 10.1038/s41562-018-0526-x
DO - 10.1038/s41562-018-0526-x
M3 - Letter
C2 - 30971784
AN - SCOPUS:85061373253
SN - 2397-3374
VL - 3
SP - 361
EP - 368
JO - Nature Human Behaviour
JF - Nature Human Behaviour
IS - 4
ER -