
Evidence accumulation or reinforcement learning? Modeling sequential decision-making in the “observe or bet” task

Authors
Dr. Beth Baribault
University of California, Berkeley ~ Helen Wills Neuroscience Institute & Department of Psychology
Ms. Manon Ironside
University of California, Berkeley ~ Department of Psychology
Dr. Sheri Johnson
University of California, Berkeley ~ Department of Psychology
Dr. Anne Collins
University of California, Berkeley
Abstract

How do we decide whether to explore or exploit in uncertain environments where feedback is intermittent? In this talk, we compare two approaches to computational modeling of the cognitive process underlying such decisions, using control group data from an ongoing clinical research collaboration. Participants completed multiple blocks of the “observe or bet” task, a dynamic sequential decision-making task. To maximize reward, participants must strike a balance between betting on (but not seeing) which event will occur and observing events in the sequence (thereby forgoing any gain or loss of points). Participants alternated efficiently between observing and betting, observing more at the start of a sequence and betting more toward the end. To better understand these data, we used two classes of hierarchical Bayesian models. First, we implemented nine versions of the “heuristic model” of this task, developed by Navarro, Newell, and Schulze (2016), which posits a cross-trial evidence accumulation process. Second, we implemented eight variants of a modified reinforcement learning (RL) model, a novel adaptation of Q-learning. Across all models, the modified RL model with counterfactual learning and a high fixed value of observing provided the best fit to the observed data. We discuss implications for modeling of this task, and for RL modeling more generally. In particular, the modified RL model’s success challenges a strict conceptualization of RL: it suggests that the same computations responsible for learning from rewards may also subserve learning from outcomes that are not extrinsically (but are potentially intrinsically) rewarding.
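For readers unfamiliar with this model class, the sketch below illustrates the general shape of a Q-learning variant with counterfactual updating and a fixed value of observing. It is a minimal illustration under stated assumptions, not the authors' actual model: the parameter names (alpha, beta, v_observe), the ±1 payoff coding, and the softmax choice rule are choices made here for the sake of a runnable example.

```python
import numpy as np

def softmax(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    z = np.exp(beta * (values - values.max()))
    return z / z.sum()

def simulate_block(outcomes, alpha=0.3, beta=5.0, v_observe=0.8, seed=None):
    """Simulate one agent on one block of the observe-or-bet task.

    outcomes  : sequence of 0/1 hidden events, one per trial
    alpha     : learning rate for the bet-value updates
    beta      : softmax inverse temperature
    v_observe : fixed (non-learned) value of the observe action

    Actions: 0 = bet on event A, 1 = bet on event B, 2 = observe.
    Betting yields points but no feedback; observing reveals the
    event but yields no points. Counterfactual learning: a single
    observation updates the values of BOTH bet actions, because
    seeing the event implies what each bet would have earned.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(2)                  # learned values of the two bet actions
    choices = []
    for y in outcomes:
        probs = softmax(np.array([q[0], q[1], v_observe]), beta)
        a = rng.choice(3, p=probs)
        choices.append(a)
        if a == 2:                   # observe trial: outcome revealed
            implied = np.array([1.0, -1.0]) if y == 0 else np.array([-1.0, 1.0])
            q += alpha * (implied - q)   # counterfactual delta-rule update
        # bet trials (a in {0, 1}) give no feedback, so q is unchanged
    return np.array(choices)

# Example: a sequence where event A dominates early, then switches
events = [0] * 20 + [1] * 20
print(simulate_block(events, seed=1))
```

The key departure from standard Q-learning is that the only learning signal arrives on observe trials, when no points are at stake, which is what motivates the intrinsic-reward interpretation discussed above.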

Keywords

learning
reinforcement learning
decision making
Discussion
Absolute versus relative model fit

Hi Beth, Very interesting talk and cool modelling. I understand your results to mean that the RL model not only does a better job in terms of relative model performance (WAIC), but also in terms of absolute model performance (i.e., it provides the overall best fit). Is this the case across all the posterior predictive summary statistics you have looked…

Dr. Henrik Singmann 1 comment
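The distinction raised here is between relative fit (which model predicts the data best, e.g., via WAIC) and absolute fit (whether the winning model reproduces the data pattern at all, e.g., via posterior predictive summary statistics). A minimal, hypothetical sketch of the relative comparison using ArviZ is below; the pointwise log-likelihood arrays are synthetic stand-ins, not the fitted models from the talk.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)
n_draws, n_obs = 500, 100

# Stand-in pointwise log-likelihoods for two models (in practice these
# would come from the fitted hierarchical Bayesian models).
ll_rl = rng.normal(-0.6, 0.1, size=(1, n_draws, n_obs))
ll_heur = rng.normal(-0.7, 0.1, size=(1, n_draws, n_obs))

idata_rl = az.from_dict(log_likelihood={"y": ll_rl})
idata_heur = az.from_dict(log_likelihood={"y": ll_heur})

# Relative model comparison: az.compare ranks models by expected log
# predictive density (higher is better on the default log scale).
print(az.compare({"rl": idata_rl, "heuristic": idata_heur}, ic="waic"))
```

Absolute fit would then be assessed separately, by checking whether summary statistics simulated from the winning model's posterior predictive distribution (e.g., observe rates by trial position) match those computed from the observed choices.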
Cite this as:

Baribault, B., Ironside, M., Johnson, S. L., & Collins, A. (2021, July). Evidence accumulation or reinforcement learning? Modeling sequential decision-making in the “observe or bet” task. Paper presented at Virtual MathPsych/ICCM 2021. Via mathpsych.org/presentation/556.