Reinforcement Learning
Niek Stevenson
Prof. Birte Forstmann
Prof. Andrew Heathcote
The delta learning rule offers a simple but powerful explanation of feedback-based learning. However, normative theories predict that the learning rate should depend on uncertainty, which would allow for more efficient learning, especially when environments are volatile, such as in the reversal learning paradigm. This paradigm consists of an acquisition phase, in which the participant learns the statistical properties of the environment (e.g., the reward probability of each choice option), followed by a reversal phase, in which those properties are switched. In two datasets, we previously demonstrated that the delta rule fails to capture the speed at which participants adapt to the reversal. A mechanism that allows the learning rate to vary as a function of the volatility of the environment could potentially provide a better account of learning behavior in this paradigm. Here, we studied whether the volatile Kalman filter (Piray and Daw, 2020) better accounts for empirical data in the reversal learning paradigm, and included tests of parameter recovery.
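For reference, a minimal sketch of the two learning rules being contrasted, in their standard textbook forms rather than the authors' exact parameterization: the delta rule updates the value estimate with a fixed learning rate, whereas a Kalman-filter learner scales its update by a gain that grows with the current estimation uncertainty.

$$
\begin{aligned}
\text{Delta rule:}\quad & V_{t+1} = V_t + \alpha\,(r_t - V_t), \qquad \alpha \text{ fixed},\\
\text{Kalman filter:}\quad & V_{t+1} = V_t + k_t\,(r_t - V_t), \qquad k_t = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_r^2},
\end{aligned}
$$

where $\sigma_t^2$ is the prior variance (uncertainty) of the value estimate and $\sigma_r^2$ the observation noise. The volatile Kalman filter of Piray and Daw (2020) additionally tracks the volatility of the environment, which inflates $\sigma_t^2$, and hence the effective learning rate, after changes such as a reversal.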
This is an in-person presentation on July 19, 2023 (11:20 ~ 11:40 UTC).
Dr. Myeong Seop Song
Prof. Min-hwan Oh
Prof. Woo-Young Ahn
Impulsivity has been extensively studied in relation to mental disorders and maladaptive behaviors using self-report questionnaires and behavioral tasks. A persistent issue is that self-report and behavioral measures correlate only weakly with each other, although they are supposed to tap the same construct. To address this problem, we devised a real-time driving task called the “highway task” that allows participants to exhibit impulsive behaviors, such as reckless driving, which may mirror real-life impulsive traits assessed by self-report questionnaires. We hypothesized that the highway task would provide impulsivity measures that are strongly correlated with self-report measures of impulsivity. As hypothesized, statistical evidence supported the correlation between performance in the highway task and a self-report measure of impulsivity (the Barratt Impulsiveness Scale, BIS; r = 0.46). By contrast, measures of impulsivity from two traditional laboratory tasks (delay discounting and go/no-go tasks) did not correlate with the BIS (r = 0.01 and 0.07, respectively). To infer the subjective reward functions that underlie observed real-time behaviors in the highway task, we used an inverse reinforcement learning (IRL) algorithm combined with deep neural networks. The agents trained by IRL produced actions that resembled participants’ behaviors in the highway task. IRL inferred sensible reward functions from participants’ behaviors and revealed real-time changes in rewards around salient events (e.g., overtaking, a collision with a car ahead, etc.). The rewards inferred by IRL suggested that impulsive participants have high subjective reward values for irrational or risky behaviors. Overall, our results suggest that using real-time tasks with IRL may bridge the gap between self-report and behavioral measures of impulsivity, with IRL being a practical modeling framework for multidimensional data from real-time tasks.
This is an in-person presentation on July 19, 2023 (11:40 ~ 12:00 UTC).
Mr. Erik Stuchlý
Sebastian Gluth
Humans are known to be capable of inferring hidden preferences and beliefs of their conspecifics when observing their decisions. While observational learning based on choices has been explored extensively, the question of how response times (RT) impact our learning of others’ social preferences has received little attention. Yet, there is only limited potential for inferring the strength of a preference (i.e., the confidence with which the person has made their choice or how likely they are to make the same choice again) from choices alone, and RT can provide critical information in this regard. Here, we propose an orthogonal design to investigate the role of both choices and RT in learning and inferring others’ social preferences. In our lab study, participants (n = 46) observed other people’s decision processes in a Dictator Game, where the dictators were asked to choose between different monetary allocations. Choice and RT information was either hidden or revealed to participants in a 2-by-2 within-subject design. Behavioral analyses confirmed our hypothesis: trial by trial, observers were able to learn the dictators' social preferences not only when they could observe their choices, but also when they could only observe their RT. To gain mechanistic insights into these observational learning processes, we developed a reinforcement learning model that takes both choices and RT into account to infer the dictator’s social preference. This model closely captured the performance and learning curves of observers in the different conditions. By comparing this model to a Bayes-optimal model, we show that while our participants’ learning is close to optimal when they can observe choices, they substantially deviate from optimality when they can only observe RT, suggesting that the underlying mechanisms are better captured by our approximate reinforcement learning model. Overall, our study proposes an innovative approach to investigating the role of RT in learning and inferring preferences and highlights the importance of considering decision processes when investigating observational learning.
This is an in-person presentation on July 19, 2023 (12:00 ~ 12:20 UTC).
Dr. Ismail Guennouni
In social settings, the consequences of our actions typically depend on the actions of other agents. Successful outcomes then require agents to adapt their behaviour to each other. Planning under such mutual adaptation is a challenging computational problem. Circumventing this complexity, socially ignorant reinforcement learning can, in principle, succeed in optimising behaviour in the long run. But this only works for the isolated case of repeated exposure to the same task with the same other agents. In reality, we have limited exposure to such situations, and are more likely to encounter other agents in the same task, or encounter the same other agent in different tasks. Leveraging prior experience then requires generalization, from the same agent to other settings, and from encountered agents to novel agents. Such generalization can rely on various inferences, such as others' depth of strategic reasoning (e.g. how far to proceed with reasoning such as "you think that I think that you think that I will do...") and their social preferences (e.g. "you want us both to be better off" vs "you want to make sure you are further ahead of me"). Here, I will discuss some of the challenges of such social inference, present evidence that such inferences are indeed made, and provide a new framework (based on hidden Markov models) to navigate planning in social interactions.
This is an in-person presentation on July 19, 2023 (12:20 ~ 12:40 UTC).