Functional generalization and asymmetric learning in a feature-based bandit task

Authors
Prof. Maarten Speekenbrink
University College London, Experimental Psychology
Abstract

Multi-armed bandits are a useful paradigm for studying how people balance exploration (learning about the value of options) and exploitation (choosing options with known high value). When options are distinguished by features predictive of reward, exploration aids generalization of experience to unknown options. The present study builds on our earlier work on human exploration and generalization in a feature-based bandit task (Stojic et al., 2020). Here, I present results from a new experiment in which novel options are introduced regularly in three different environments: options either only provide rewards (gain), only provide punishments (loss), or can provide both rewards and punishments (mixed). Options were represented by randomly generated tree-like shapes, with features determining the angle and width of the branches, and the value of each option was a nonlinear function of these features. Regardless of the environment, people were quite good at choosing the best option. When a novel option was first encountered, whether it was chosen depended on its value relative to the other options, indicative of successful function generalization. Exploration of novel options was generally greater in the loss environment than in the other environments. Computational modelling provides further insight into these results. I contrast a model that employs function learning through Gaussian process regression with a new model that learns the value of options through a hierarchical Bayesian filter. Both models can employ a Bayesian mechanism that allows for asymmetric learning rates for positive versus negative reward prediction errors. Some evidence for such asymmetric learning is found.
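To make the two modelling ingredients concrete, the following is a minimal Python sketch of the kind of machinery involved. It is purely illustrative: the class and function names are mine, the asymmetry is implemented by simply rescaling the Kalman gain depending on the sign of the prediction error, and the generalization step uses Gaussian process regression with a squared-exponential kernel. None of this is taken from the paper's actual model specification.

# Illustrative sketch only: names, parameter values, and the specific form
# of the asymmetry are assumptions, not the model specification from the paper.
import numpy as np

class AsymmetricKalmanBandit:
    """Tracks each option's value with a Kalman filter; the effective
    learning rate (Kalman gain) is rescaled depending on whether the
    reward prediction error is positive or negative."""

    def __init__(self, n_options, prior_mean=0.0, prior_var=100.0,
                 obs_noise=1.0, gain_pos=1.0, gain_neg=0.5):
        self.m = np.full(n_options, prior_mean)   # posterior means
        self.v = np.full(n_options, prior_var)    # posterior variances
        self.s2 = obs_noise                       # reward noise variance
        self.gain_pos = gain_pos                  # gain scaling when delta > 0
        self.gain_neg = gain_neg                  # gain scaling when delta <= 0

    def update(self, option, reward):
        delta = reward - self.m[option]                      # prediction error
        k = self.v[option] / (self.v[option] + self.s2)      # Kalman gain
        k *= self.gain_pos if delta > 0 else self.gain_neg   # asymmetry
        self.m[option] += k * delta
        self.v[option] *= 1.0 - k

def gp_predict(X, y, x_new, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Gaussian process regression with a squared-exponential kernel:
    predicts the value of a novel option from its features, one way to
    model function generalization to options never chosen before."""
    def kern(a, b):
        sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return sigma_f ** 2 * np.exp(-sq / (2 * length ** 2))

    K = kern(X, X) + sigma_n ** 2 * np.eye(len(X))   # kernel of observed options
    k_star = kern(X, x_new[None, :]).ravel()         # covariance with novel option
    alpha = np.linalg.solve(K, y)
    mean = k_star @ alpha                            # posterior mean at x_new
    var = sigma_f ** 2 - k_star @ np.linalg.solve(K, k_star)
    return mean, var

In such a sketch, the predicted mean and variance from gp_predict for a newly introduced option could feed a choice rule such as softmax or upper-confidence-bound sampling, while AsymmetricKalmanBandit shows how a single rescaling of the Kalman gain yields faster learning from positive than from negative prediction errors (or vice versa).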

Keywords

generalization
exploration
function learning
multi-armed bandit
Gaussian process

Cite this as:

Speekenbrink, M. (2021, July). Functional generalization and asymmetric learning in a feature-based bandit task. Paper presented at Virtual MathPsych/ICCM 2021. Via mathpsych.org/presentation/524.