Society for Mathematical Psychology

MathPsych/ICCM 2021

...

Authors

UC Berkeley ~ Helen Wills Neuroscience Institute & Dept. of Psychology

University of California, Berkeley ~ Department of Psychology

UC Berkeley, United States of America

Abstract

How do we decide whether to explore or exploit in uncertain environments where feedback is intermittent? In this talk, we compare two approaches to computational modeling of the cognitive process underlying such decisions, using control group data from an ongoing clinical research collaboration. Participants completed multiple blocks of the “observe or bet” task, a dynamic sequential decision-making task. To maximize reward, participants must strike a balance between betting on (but not seeing) which event will occur and observing events in the sequence (thereby forgoing any gain or loss of points). Participants efficiently alternated between observing and betting, observing more at the start of a sequence and betting more towards the end. To better understand these data, we used two classes of hierarchical Bayesian models. First, we implemented nine versions of the “heuristic model” of this task, developed by Navarro, Newell, & Schulze (2016), which posits a cross-trial evidence accumulation process. Second, we implemented eight variants of a modified reinforcement learning (RL) model, a novel adaptation of Q-learning. Across all models, the modified RL model with counterfactual learning and a high fixed value of observing provided the best fit to the observed data. We discuss implications for modeling of this task and for RL modeling more generally. In particular, this result challenges a strict conceptualization of RL: the modified RL model’s success suggests that the same computations responsible for learning from rewards might also subserve learning from outcomes that are not extrinsically rewarding (but may be intrinsically rewarding).
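
To make the modified RL idea concrete, here is a minimal illustrative sketch, not the authors' implementation. It assumes a two-event task with bets paying +1 or -1, a softmax choice rule over three actions, a constant (fixed) value for observing, and a counterfactual update in which each observed event adjusts the Q-values of both bets as if they had been placed. All names and parameter values (simulate_block, observe_value, learning_rate, inverse_temperature, p_event_a) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(values, inverse_temperature):
    """Convert action values to choice probabilities (numerically stable)."""
    peak = max(values)
    exps = [math.exp(inverse_temperature * (v - peak)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_block(n_trials=50, p_event_a=0.75, learning_rate=0.3,
                   inverse_temperature=3.0, observe_value=0.5, seed=0):
    """Simulate one block of an observe-or-bet style task (assumed setup).

    Actions: 0 = observe, 1 = bet on event A, 2 = bet on event B.
    The value of observing is fixed; bet values are learned only on
    observe trials, because bets give no visible feedback.
    """
    rng = random.Random(seed)
    q_bet = [0.0, 0.0]  # learned values for betting on A and on B
    choices = []
    score = 0  # accumulated points, hidden from the agent during the block

    for _ in range(n_trials):
        event = 0 if rng.random() < p_event_a else 1
        probs = softmax([observe_value, q_bet[0], q_bet[1]],
                        inverse_temperature)
        action = rng.choices([0, 1, 2], weights=probs)[0]

        if action == 0:
            # Counterfactual learning (assumption): update both bets as if
            # each had been placed on the event that was just observed.
            for option in (0, 1):
                outcome = 1.0 if option == event else -1.0
                q_bet[option] += learning_rate * (outcome - q_bet[option])
            choices.append("observe")
        else:
            bet = action - 1
            score += 1 if bet == event else -1  # no feedback, so no update
            choices.append("bet_a" if bet == 0 else "bet_b")

    return choices, q_bet, score
```

Calling simulate_block() returns the sequence of choices, the final bet values, and the hidden score for one simulated block. In this sketch, a higher observe_value makes the agent observe longer before its learned bet values become competitive; the hierarchical Bayesian fitting described in the abstract is not shown.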

Evidence accumulation or reinforcement learning? Modeling sequential decision-making in the “observe or bet” task
