In environments where multiple sources of reward are available simultaneously, organisms generally allocate their effort among those sources in proportion to the reward each one yields. This general and robust behavioral finding is known as “the matching law” and has been documented in humans and several non-human species, under a variety of experimental manipulations as well as in observational settings.
In a typical matching preparation, two reward alternatives pay off at different rates. For example, alternative A delivers, on average, two rewards per minute, while alternative B delivers only one. Under these constraints organisms tend to invest, on average, twice as many resources in exploiting alternative A as in exploiting alternative B. In other words, the matching law states that the relative rate of investment is a linear function of the relative rate of reward.
The two parameters of the linear matching model represent, respectively, sensitivity to relative reinforcement rates and bias toward one alternative regardless of its relative richness. The perfect matching relation is the special case of this linear model in which the organism shows equal sensitivity to all alternatives (slope 1) and no bias towards any (intercept 0). Deviations from that equilibrium constitute an active field of research because they may account for suboptimal behavior. Crucial to this endeavor, especially when sample sizes are small, is the proper accounting of statistical uncertainty over inferred parameter values.
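One common way to write the linear model described above is on log ratios, a form often called the generalized matching relation. The notation is an assumption made for illustration and is not taken from the abstract: $B_A, B_B$ are the responses (or effort) allocated to each alternative, $R_A, R_B$ are the obtained reward rates, $s$ is the sensitivity (slope), and $\log b$ is the bias (intercept).

```latex
\log\frac{B_A}{B_B} \;=\; s \,\log\frac{R_A}{R_B} \;+\; \log b
```

Under this parameterization, perfect matching corresponds to $s = 1$ and $\log b = 0$, so that $B_A / B_B = R_A / R_B$. With the example rates of two rewards per minute versus one, this gives $B_A / B_B = 2$.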
In this work we present a novel Bayesian graphical model to quantify matching behavior and show its potential by analyzing previously published datasets.
The key contribution of the model lies in its generative nature: while most published analyses under the matching law framework summarize and collapse data across sessions, subjects, or both, our model is able to generate raw counts of responses directly for each individual under every experimental condition. Furthermore, hierarchical extensions of the model allow the inclusion of differences and effects both at individual and session level, paving the way for explanatory extensions to account for potential sources of optimal or suboptimal behavior.
The Bayesian implementation we propose naturally quantifies the evidence for or against the matching equilibrium, both for each unit and for the hyperparameters that control their hierarchical distribution, without discarding uncertainty. These novel tools may shed new light on a behavioral finding that has been central to the study of animal decision-making over the last half century.
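To make the idea of generating raw response counts concrete, here is a minimal sketch, not the authors' model. It assumes a log-linear matching rule, log(B_A / B_B) = sensitivity * log(R_A / R_B) + log(bias), and a binomial split of a fixed number of responses between two alternatives. The function names, the parameter names, and the binomial likelihood are all hypothetical choices made for illustration.

```python
import math
import random


def choice_probability(reward_a, reward_b, sensitivity=1.0, bias=1.0):
    """Probability of a response on alternative A.

    Assumed rule: log-odds of A = sensitivity * log(R_A / R_B) + log(bias).
    sensitivity=1 and bias=1 give perfect matching.
    """
    log_odds = sensitivity * math.log(reward_a / reward_b) + math.log(bias)
    return 1.0 / (1.0 + math.exp(-log_odds))


def simulate_counts(reward_a, reward_b, n_responses,
                    sensitivity=1.0, bias=1.0, rng=None):
    """Draw raw response counts (count_a, count_b) for one subject and condition."""
    rng = rng or random.Random()
    p_a = choice_probability(reward_a, reward_b, sensitivity, bias)
    # Binomial draw implemented as a sum of independent Bernoulli trials.
    count_a = sum(rng.random() < p_a for _ in range(n_responses))
    return count_a, n_responses - count_a


if __name__ == "__main__":
    # Two rewards per minute on A versus one on B, perfect matching.
    print(choice_probability(2.0, 1.0))
    print(simulate_counts(2.0, 1.0, n_responses=300, rng=random.Random(1)))
```

In a hierarchical version, each individual's sensitivity and bias would themselves be drawn from group-level distributions; that layer is omitted here.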
Ms. Nicole King
Dr. Brandon Turner
Dr. Emily Weichart
In our everyday lives, there are often more aspects of the environment than we can reasonably attend to. As a consequence, we selectively attend to some aspects of the environment, usually those most relevant to our goals, and ignore aspects that we deem irrelevant. It follows, then, that selective attention can limit a learner's impression of an environment, because the information stored in memory is only a biased sample or a partially encoded version of that environment. However, many classic models of category learning make the simplifying assumption that dimensions of information are perfectly encoded. Here, we investigate the merits of this assumption by evaluating categorization and memory performance in a categorization paradigm designed to discern learning strategies and partially encoded representations. We demonstrate how particular learning strategies and corresponding representations can influence generalization to novel stimuli presented in a testing phase. We build upon existing models of categorization to illustrate how partial encoding can account for differences in learning.
During the learning process, the brain processes information from various sources, including sensory and motor input, memory, and attention. However, it can produce only a single output at a time, namely motor commands, which are crucial for actively controlling the learning process.
In real-world scenarios, a reinforcement learning agent may need to balance multiple conflicting objectives. Furthermore, the field of artificial curiosity, in which intrinsic reward is linked to learning progress, has gained much attention. Here, we investigate the use of convergent intrinsic reward functions, which incentivize the agent to pursue learning goals that align with multiple objectives.
Previous work in Multi-Objective Reinforcement Learning (MORL) with intrinsic reward has focused on developing algorithms that can optimize multiple objectives simultaneously while also incorporating intrinsic rewards to encourage exploration and the discovery of new strategies. These algorithms have used multi-objective optimization frameworks such as Pareto optimization and hierarchical RL to achieve this goal.
To investigate the scenario of a learning agent with multiple sensory-motor learning objectives that can take only one action at a time, we developed a hierarchical composition framework in which all possible learners are co-learned, each represented by a neural network that explores unique two-to-one correlations. This adaptable system has a generic structure and can adjust to different input types. The networks’ learning progress is translated into intrinsic rewards, which are fed to a single actor.
Our findings indicate that some networks converge while others do not, which sheds light on which types of correlations are learnable and which are not. Moreover, the single actor eventually develops a policy that produces more successful learning networks than a random policy does. This study has the potential to provide insights into how convergent intrinsic reward functions correlate with mechanisms underlying human cognition and behavior.
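The step from learning progress to intrinsic reward can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: progress is taken to be the drop in each learner's loss between two consecutive steps, negative progress is clipped to zero, and the per-learner values are summed into one scalar for the actor. The function names and the aggregation rule are hypothetical.

```python
def learning_progress(previous_losses, current_losses):
    """Per-learner progress: how much each learner's loss dropped this step."""
    return [prev - curr for prev, curr in zip(previous_losses, current_losses)]


def intrinsic_reward(previous_losses, current_losses):
    """Scalar reward for the actor: total positive progress across learners.

    Clipping at zero (assumption) means a learner that got worse contributes
    nothing rather than a penalty.
    """
    progress = learning_progress(previous_losses, current_losses)
    return sum(max(p, 0.0) for p in progress)


if __name__ == "__main__":
    before = [1.0, 0.5, 0.2]   # losses of three learners before the action
    after = [0.7, 0.6, 0.2]    # losses after the action
    print(learning_progress(before, after))
    print(intrinsic_reward(before, after))
```

Other aggregation rules (a mean, a maximum, or a weighted sum) would fit the same description; the sum is used here only because it is the simplest.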
We propose a dynamic system model for the investigation of the reciprocal relation between practice and success in learning under conditions of free practice, where practice leads to success and success reinforces practice. In free practice, unlike in forced practice, the learner may quit; forced practice has been studied extensively in mathematical learning theory. The forced ‘law of practice’ models studied in mathematical learning theory are the main building blocks of our model. It is shown that the equilibrium behavior of the reciprocal practice-success (RPS) model depends mainly on the choice of the ‘law of practice’ function. For concave practice functions, the resulting dynamics can be characterized as a fold catastrophe. For S-shaped practice functions, the behavior is governed by a cusp catastrophe, in which sudden transitions between optimal and deprived learning states occur. As such, the model offers new explanations for drop-out, the Matthew effect, and the development of expertise. The psychological interpretation of this model, its practical implications, and limitations are discussed.
Probabilistic Knowledge Space Theory (PKST; Doignon & Falmagne, 1999) provides a set-theoretic framework for the assessment of a subject's mastery of items within a knowledge domain while accounting for response errors (i.e., careless errors and lucky guesses). For use in longitudinal contexts, a skill-based extension of PKST has been suggested to incorporate two points of measurement (Anselmi et al., 2017; Stefanutti et al., 2011), where skills may be gained or lost from one point of measurement to the next, and the associated parameters for gaining and losing skills may vary between multiple groups. For some of these models, MATLAB code for maximum likelihood parameter estimation via the expectation-maximization algorithm (ML-EM) is available. A known drawback of ML-EM, the potential inflation of response error probabilities, is dealt with by introducing (arbitrary) upper bounds for these parameters. In the present work, we develop models that extend the Basic Local Independence Model of PKST with parameters for gaining (or losing) item mastery between two points of measurement. We establish ML-EM parameter estimation and, in order to avoid parameter inflation, both a minimum-discrepancy (MD) method that minimizes response errors and a hybrid MDML method (Heller & Wickelmaier, 2013). All estimation methods are implemented in R. Results on parameter recovery and identifiability are presented.
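For readers unfamiliar with the Basic Local Independence Model, its response rule for a single point of measurement can be sketched as below. This is an illustrative Python sketch, not the authors' R implementation, and it does not include the gain and loss parameters or any of the estimation methods. The names `beta` (careless error probability per item), `eta` (lucky guess probability per item), and `structure` (a mapping from knowledge states to their probabilities) are notation chosen here for illustration.

```python
from itertools import chain, combinations


def blim_response_probability(response, state, items, beta, eta):
    """P(response pattern | knowledge state), assuming local independence.

    A mastered item is solved with probability 1 - beta[q] (no careless error);
    a non-mastered item is solved with probability eta[q] (lucky guess).
    """
    prob = 1.0
    for q in items:
        mastered, solved = q in state, q in response
        if mastered:
            prob *= (1.0 - beta[q]) if solved else beta[q]
        else:
            prob *= eta[q] if solved else (1.0 - eta[q])
    return prob


def blim_marginal_probability(response, structure, items, beta, eta):
    """P(response pattern): sum over states K of P(response | K) * P(K)."""
    return sum(
        p_state * blim_response_probability(response, state, items, beta, eta)
        for state, p_state in structure.items()
    )


def all_response_patterns(items):
    """Every subset of the item set, as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


if __name__ == "__main__":
    items = ["a", "b"]
    # Hypothetical knowledge structure with state probabilities summing to 1.
    structure = {frozenset(): 0.2, frozenset({"a"}): 0.3, frozenset({"a", "b"}): 0.5}
    beta = {"a": 0.1, "b": 0.2}
    eta = {"a": 0.05, "b": 0.1}
    for pattern in all_response_patterns(items):
        print(sorted(pattern), blim_marginal_probability(pattern, structure, items, beta, eta))
```

Because the error parameters enter every term of this product, large values of `beta` and `eta` can explain almost any response pattern, which is one way to see why unconstrained likelihood maximization may inflate them.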
Dr. Maarten van der Velde
Dr. Jelmer Borst
Hedderik van Rijn
Adaptive learning systems enable any learner to study at a level that is appropriately challenging to them. The cold start problem occurs whenever an adaptive system has not yet had the opportunity to adapt to its user or content. Using learning data from 140 thousand students, we evaluate several methods for alleviating the cold start problem in an adaptive fact learning system. We show that data-driven prediction of the learning system's adaptive parameter leads to more accurate estimates of learning at the start of a session, particularly when the prediction involves fact-specific difficulty information. The observed improvements are similar in magnitude to those in an earlier lab study, where using the predicted values as starting estimates in a learning session significantly increased posttest retention. We expect that comparable retention gains can be achieved in real-world educational practice.
Mr. Thomas Wilschut
Hedderik van Rijn
Cognitive models of memory retrieval aim to capture human learning and forgetting over time, and have been applied in learning systems that aid in memorizing information by adapting to the needs of individual learners. The effectiveness of such learning systems critically depends on their ability to use behavioral proxies to estimate the extent to which learners have successfully memorized the materials. The present study examines cognitive and meta-cognitive indicators of memory strength that are present in the speech signal recorded while learners study vocabulary items by responding vocally to cues. We demonstrate that meta-cognitive beliefs about memory performance are reflected in variations in pitch and speaking speed, whereas the objective accuracy of a response is mainly reflected in its loudness. The results of this study contribute to a better understanding of the relationship between prosodic speech variations and (meta)memory processes. Furthermore, they can have important implications for the further development of models of memory retrieval that are used in adaptive learning systems. For example, extracting information about a speaker’s confidence from the speech signal in real time may improve predictions of future retrieval success, without the learner having to make explicit confidence judgments after each learning trial.