
Differences in learning process dynamics when rewards are familiar versus instructed

Dr. Beth Baribault
UC Berkeley, Helen Wills Neuroscience Institute & Dept. of Psychology
Anne Collins
UC Berkeley, United States of America

Most reinforcement learning (RL) experiments use familiar reinforcers, such as food or money, which are objectively rewarding for most people. In everyday life, however, teaching signals are rarely so straightforward: often we must learn from the achievement of subgoals (e.g., high heat must be achieved before cooking), or from feedback that we have been instructed to treat as reinforcement yet is not intrinsically rewarding (e.g., grades). Investigating how similar the dynamics of learning from familiar rewards, which are well studied, are to the dynamics of learning from subgoals and instructed rewards, which are more realistic, can therefore help us understand the ecological validity of laboratory reinforcement learning research.

In this talk, we discuss our recent work investigating these potential similarities using computational modeling, with an emphasis on individual differences. In our experiment, participants completed a probabilistic RL task, comprising multiple interleaved two-armed bandit problems, and an N-back task. Some bandits were learned using points, a familiar reward, while others were learned based on whether their selection led to a “goal image” unique to each trial, an instructed reward. In the instructed condition, participants tended to learn more slowly, and each participant’s performance correlated with their working memory ability. Hierarchical Bayesian model comparison revealed that differences in behavior due to feedback type were best explained by a lower learning rate for instructed rewards, although this effect was reversed or absent for some participants. These strong individual differences suggest that differences in learning dynamics between familiar and instructed rewards may not be universal.
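The "lower learning rate" account above can be illustrated with a minimal sketch. This is not the authors' model (which is hierarchical Bayesian and fit per participant); it assumes a standard delta-rule (Rescorla-Wagner) update with softmax choice, a common instantiation of such RL models, and all parameter values here are hypothetical:

```python
import math
import random

def delta_update(q, reward, alpha):
    """Delta-rule (Rescorla-Wagner) value update with learning rate alpha."""
    return q + alpha * (reward - q)

def simulate_bandit(alpha, n_trials=200, beta=5.0, p_reward=(0.8, 0.2), seed=1):
    """Simulate a learner on one two-armed bandit; return fraction of
    choices of the better arm (arm 0). Hypothetical task parameters."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial value estimates for the two arms
    n_correct = 0
    for _ in range(n_trials):
        # Softmax probability of choosing arm 0
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        a = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] = delta_update(q[a], r, alpha)
        n_correct += (a == 0)
    return n_correct / n_trials
```

Under this sketch, a learner with a smaller alpha (as estimated for instructed rewards) updates values more slowly after each outcome, which typically produces slower acquisition and fewer correct choices early in learning.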



reinforcement learning
working memory
cognitive modeling
model comparison


Cognitive Modeling
Bayesian Modeling

Dear Dr. Beth, thank you very much for your presentation; it was excellent. It is a fascinating, enlightening, and very relevant topic. I have a couple of questions. The first has to do with what you mention about the interactions of different processes. I agree with you that the study of the interaction of cognitive processes (simultaneously) i...

Dr. Alfonso Díaz Furlong
SDT bias

Excellent talk, very interesting. Just curious whether you also considered response bias in the n-back task, e.g. SDT bias, in your model. You mentioned a model using keystrokes, so perhaps a tendency to be a yes- or no-sayer (a liberal vs. conservative bias) in the n-back task would also affect performance in the bandit task.

Prof. Gerit Pfuhl
Cite this as:

Baribault, B., & Collins, A. (2020, July). Differences in learning process dynamics when rewards are familiar versus instructed. Paper presented at Virtual MathPsych/ICCM 2020.