Observational Learning of Exploration-Exploitation Strategies in Bandit Tasks
Situations that require balancing exploration and exploitation are ubiquitous, and in such situations humans frequently have the opportunity to observe others. Participants performed restless nine-armed bandit tasks, either on their own or while observing the choices of fictitious agents that performed equally well but differed in their tendency to explore. We fitted participants' data with variants of a Bayesian Mean Tracker model. In these models, individual choice probabilities are computed from the expected values of all options using a softmax function, in which random exploration is implemented as a temperature parameter, while directed exploration biases the expected values towards especially uncertain, informative options. We implemented copying in two different ways: the unconditional copying model assumes that participants copy the observed agent with a fixed probability, independent of their subjective value estimates, whereas in the copy-when-uncertain model the probability of copying depends on the entropy of the value estimates across all options. Our results indicate that the copy-when-uncertain model accounts for participants' data better than the unconditional copying model. Participants use observational learning directly, i.e., they imitate specific choices, but they also adjust their individual exploration strategy towards that of the observed agents.
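The abstract specifies the choice rule and the two copying mechanisms only verbally. Below is a minimal sketch of how they could be implemented, assuming a Bayesian Mean Tracker that supplies posterior means and variances per arm, a softmax temperature `tau`, a directed-exploration bonus weighted by `beta`, and hypothetical copying parameters `rho` and `gamma` (none of these names come from the paper); the entropy of the softmax choice distribution is used here as a proxy for the entropy of the value estimates.

```python
import numpy as np

def choice_probabilities(means, variances, beta, tau):
    """Softmax over uncertainty-biased values.

    means, variances: posterior estimates per arm from a Bayesian
    Mean Tracker. beta scales the directed-exploration bonus; tau is
    the softmax temperature (random exploration). Illustrative only.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    values = means + beta * np.sqrt(variances)  # uncertainty bonus
    logits = values / tau
    logits = logits - logits.max()              # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def copy_prob_unconditional(rho):
    # Unconditional copying: imitate the observed agent with a fixed
    # probability rho, independent of the value estimates.
    return rho

def copy_prob_when_uncertain(p_choice, gamma):
    # Copy-when-uncertain: copying probability grows with the entropy
    # of the model's own choice distribution (hypothetical scaling
    # parameter gamma; capped at 1).
    entropy = -np.sum(p_choice * np.log(p_choice + 1e-12))
    max_entropy = np.log(len(p_choice))
    return min(1.0, gamma * entropy / max_entropy)

# Example: with equal means, the more uncertain arm gets a higher
# choice probability, and high entropy raises the copying probability.
p = choice_probabilities(means=[0.0, 0.0, 0.0],
                         variances=[1.0, 1.0, 4.0],
                         beta=0.5, tau=0.2)
print(p, copy_prob_when_uncertain(p, gamma=0.8))
```

In this sketch, copying becomes more likely exactly when the learner's value estimates discriminate poorly between options, which is the qualitative behaviour the copy-when-uncertain model describes.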