Not great, not terrible: A reward “landscape” analysis of time-varying decision thresholds
Normative models of perceptual decision-making predict that time-varying decision policies, such as collapsing decision thresholds, are the optimal strategy in certain contexts. Nevertheless, experimental studies often reveal systematic differences between the model-inferred optimal threshold and the thresholds adopted by participants. Malhotra et al. (2018, J. Exp. Psychol. Gen.) computed the reward rate of decision thresholds with different intercepts and gradients (the ‘reward landscape’) and found that the optimal policy in their task was adjacent to policies with extremely low reward rates. They proposed that the observed choice of sub-optimal thresholds results from satisficing: participants explore this landscape and settle for policies that are near-optimal yet far enough from those that yield low reward rates. If this hypothesis holds, then lowering the reward rate of all non-optimal policies, while keeping the optimal policy unchanged, should motivate participants to adopt thresholds closer to the optimal policy. We report findings from the Monte Carlo simulations used to generate the reward landscape, which identified two task parameters that change the reward rate of thresholds around the optimal policy while leaving the optimal policy itself unchanged: the monetary penalty and the inter-trial interval for incorrect decisions. We manipulated these parameters in an experimental task to identify participants’ position on the reward landscape and to examine how sensitive they are to changes in this landscape. By considering a broad range of decision policies in this fashion, we can better understand why and how time-varying decision strategies are used.
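As a rough illustration of how a reward landscape of this kind could be estimated by Monte Carlo simulation, the sketch below evaluates linear thresholds defined by an intercept and a gradient. It is not the simulation used in the study. The evidence model (a ±1 random walk), the parameter values, and the names `simulate_reward_rate` and `reward_landscape` are all assumptions made for illustration.

```python
import random


def simulate_reward_rate(intercept, gradient, p_correct_step=0.6,
                         reward=1.0, penalty=0.0,
                         iti_correct=5.0, iti_error=5.0,
                         n_trials=2000, max_steps=200, seed=0):
    """Monte Carlo estimate of reward per unit time for one linear threshold.

    Assumed toy model: evidence moves +1 (towards the correct option) with
    probability p_correct_step and -1 otherwise; a decision is made once
    |evidence| reaches max(intercept + gradient * t, 0).
    """
    rng = random.Random(seed)
    total_reward = 0.0
    total_time = 0.0
    for _ in range(n_trials):
        evidence = 0
        for t in range(1, max_steps + 1):
            evidence += 1 if rng.random() < p_correct_step else -1
            bound = max(intercept + gradient * t, 0.0)
            if abs(evidence) >= bound:
                break
        # The sign of the evidence gives the choice; ties (a fully collapsed
        # bound or the step cap reached at zero evidence) are guessed.
        if evidence > 0:
            correct = True
        elif evidence < 0:
            correct = False
        else:
            correct = rng.random() < 0.5
        total_time += t + (iti_correct if correct else iti_error)
        total_reward += reward if correct else -penalty
    return total_reward / total_time


def reward_landscape(intercepts, gradients, **task):
    """Reward rate for every (intercept, gradient) pair on a grid."""
    return {(a, b): simulate_reward_rate(a, b, **task)
            for a in intercepts for b in gradients}


if __name__ == "__main__":
    intercepts = [1, 2, 3, 4, 5]
    gradients = [-0.2, -0.1, 0.0]
    baseline = reward_landscape(intercepts, gradients)
    # Hypothetical manipulation: add a penalty and a longer ITI after errors.
    harsher = reward_landscape(intercepts, gradients,
                               penalty=1.0, iti_error=15.0)
    for policy in sorted(baseline):
        print(policy, round(baseline[policy], 4), round(harsher[policy], 4))
```

In this sketch the penalty enters the numerator of the reward rate and the error inter-trial interval enters the denominator, so both lower the reward rate of any policy that produces errors. Whether the location of the optimum is preserved depends on the task and would have to be checked on the simulated grid.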