g-distance: A new framework for comparison of model and human heterogeneity
This work explores model adequacy as a function of heterogeneity, prediction and a priori likelihood. Models are often evaluated at the point where their behaviour is closest to a single group-averaged empirical result. This evaluation neglects the fact that both models and humans are heterogeneous. The behavioural repertoire of models and humans is not restricted to a single unit of behaviour but is composed of a range of distinguishable behaviours (ordinal patterns). Within this framework, we develop a measure, g-distance, that treats model adequacy as the extent to which a model exhibits a range of behaviours similar to that of the humans it models. We then apply this framework to models of an irrational learning effect, the inverse base-rate effect, including six models in our comparison. In the process of analysing the human data, we show that the canonical group-averaged empirical result hides theoretically important and robust relationships between pairs of stimuli. These relationships are amongst the most commonly observed ordinal results at the subject level. The models are unable to accommodate these relationships, and all of them perform poorly in our benchmark. In addition, the model that best accommodates human behaviour also predicts almost all possible behaviours that were not observed. All models predicted many more unobserved behaviours than they accommodated observed behaviours. We discuss what these results imply about how well the models approximate human behaviour in the inverse base-rate paradigm, given that most of the behaviours they produce are not exhibited by humans. Finally, we propose new avenues for formal computational modelling by clearly defining a handful of scientific problems.