Statistics and Methodology
Several authors have recommended adopting the ROC Area Under the Curve (AUC) as an effect size for group comparisons, arguing that it captures a type of effect that conventional effect-size measures do not. Likewise, the mean ridit technique has been rediscovered (and renamed) several times and recommended as an effect-size measure for comparing groups on an ordinal dependent variable. Both the AUC and the mean ridit measure the probability that a randomly chosen case from one group will score higher on the dependent variable than a randomly chosen case from the other group. Moreover, as the number of ordinal categories approaches infinity, the mean ridit converges to the AUC. Both are insensitive to base rates, robust to outliers, and invariant under order-preserving transformations. However, applications of both the AUC and the mean ridit have been limited to group comparisons, usually of just two groups. I will show that the AUC and mean ridit can be used as an effect size for both categorical and continuous predictors in a wide variety of general linear models whose dependent variables may be ordinal, interval, or ratio-level. Thus, the AUC/mean ridit is a very general effect-size measure, and it captures an important and interpretable effect not measured by conventional effect sizes.
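For concreteness, here is a minimal sketch of the common quantity both measures estimate, the probability of superiority with ties counted one half (equivalently, the Mann-Whitney U divided by the number of pairs). The function name is mine, not the author's:

```python
import numpy as np

def prob_superiority(x, y):
    """P(X > Y) + 0.5 * P(X = Y) for two samples: the AUC for
    continuous scores and the mean ridit for ordinal categories."""
    x = np.asarray(x, dtype=float)[:, None]  # shape (n_x, 1)
    y = np.asarray(y, dtype=float)[None, :]  # shape (1, n_y)
    # All pairwise comparisons; ties contribute one half each.
    return (x > y).mean() + 0.5 * (x == y).mean()

# Example: ordinal ratings (1-5) from two hypothetical groups.
group_a = [3, 4, 4, 5, 2, 5]
group_b = [1, 3, 2, 4, 2, 3]
print(prob_superiority(group_a, group_b))  # ~0.81
```

Because the statistic depends only on pairwise orderings, applying any order-preserving transformation to both samples leaves it unchanged, which is the invariance property noted above.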
Modeling performance on cognitive tasks with accuracy at or near ceiling presents modelers with a difficult choice. One option is to use the full diffusion model, but high accuracy makes it difficult to obtain enough error trials for accurate estimation of the diffusion parameters. Another option is to use a single-boundary accumulator, such as the shifted Wald distribution. However, despite their conceptual similarity, the parameters of the shifted Wald distribution do not correspond uniquely to those of the diffusion model, which compromises the ability to interpret shifted Wald parameters in the context of a cognitive model. One way to "split the difference" is to introduce a censoring mechanism into the shifted Wald distribution, which allows the small number of error trials to be modeled as correct trials that have undergone censoring. Miller et al. (2018) showed that this censored shifted Wald model successfully recovers diffusion model parameters in high-accuracy contexts. In this talk, I will describe a hierarchical Bayesian version of the censored shifted Wald model. In addition, I will share preliminary data from a parameter recovery study showing its superior ability to recover diffusion model parameters accurately compared with classical maximum-likelihood approaches. Finally, I will describe an application of the model to an open question in numerical cognition.
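As a rough illustration of the ingredients, the shifted Wald is the first-passage-time distribution of a single-boundary diffusion with drift gamma, boundary alpha, and nondecision shift theta (diffusion coefficient fixed at 1), which maps onto scipy's inverse Gaussian. The sketch below treats flagged trials as right-censored at their observed times; that is one possible censoring scheme and an assumption on my part, not necessarily the exact mechanism of Miller et al. (2018). All names are mine:

```python
import numpy as np
from scipy.stats import invgauss

def censored_shifted_wald_loglik(rt, censored, gamma, alpha, theta):
    """Log-likelihood for a shifted Wald (mean alpha/gamma + theta),
    expressed through scipy's invgauss via mu = 1/(alpha*gamma),
    scale = alpha**2, loc = theta. Censored trials contribute the
    log survival function instead of the log density (an assumption
    about the censoring scheme, not the published model)."""
    rt = np.asarray(rt, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    mu, scale = 1.0 / (alpha * gamma), alpha**2
    ll = invgauss.logpdf(rt[~censored], mu, loc=theta, scale=scale).sum()
    ll += invgauss.logsf(rt[censored], mu, loc=theta, scale=scale).sum()
    return ll

# Example: four correct trials and one censored (error) trial.
print(censored_shifted_wald_loglik(
    rt=[0.62, 0.55, 0.71, 0.58, 0.49],
    censored=[False, False, False, False, True],
    gamma=3.0, alpha=1.2, theta=0.25))
```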
Most research problems can be represented using systems of random variables in which each variable is identified by what it measures (what question it answers) and by its context, the conditions under which it is recorded. In a contextual system, the observed joint distributions of variables recorded under different conditions cannot be combined into a single overall joint distribution in which variables corresponding to the same property in different contexts are equal to each other as often as possible. For general systems of random variables, several principled measures of the degree of contextuality have been proposed in the contextuality literature, denoted CNT1, CNT2, CNT3, and CNTF (the Contextual Fraction). Each of these measures captures a distinct aspect of contextuality. CNT1 gives the degree of incompatibility, holding the observed joint distributions fixed, of the (maximal possible probability of) identity of variables that are responses to the same question. CNT2 reverses these roles, measuring the incompatibility while assuming the (maximal possible probability of) identity across contexts, as a function of varying hypothetical joint distributions of the observables. CNT3 is computed by replacing the probability distribution over all variables in a system with a signed measure and minimizing the total negative mass required. Lastly, CNTF characterizes a system by its relative distance between a noncontextual system and a maximally contextual one. Within the class of cyclic systems, those in which each question is asked in exactly two contexts and each context contains exactly two questions, it has been conjectured that all the above measures coincide up to proportionality. Previous work presented at MathPsych proved this for some of these measures (CNT1 and CNT2). In this talk I will show that the remaining measures (CNT3 and the Contextual Fraction) are also proportional to each other and to the others. These proofs complete the theory of cyclic systems, the different measures of contextuality, and their properties.
Literature:
Dzhafarov, E. N., Kujala, J. V., & Cervantes, V. H. (2020). Contextuality and noncontextuality measures and generalized Bell inequalities for cyclic systems. Physical Review A, 101, 042119. (Available as arXiv:1907.03328; errata: Physical Review A, 101, 069902, and Physical Review A, 103, 059901.)
Cervantes, V. H. (2023). A note on the relation between the Contextual Fraction and CNT2. Journal of Mathematical Psychology, 112, 102726.
Camillo, G., & Cervantes, V. H. (in preparation). Measures of contextuality in cyclic systems and the negative probabilities measure CNT3.
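For readers who want a concrete handle, the sketch below computes, for a cyclic system of binary (+1/-1) variables, the quantity that (per the abstract) the measures above are proportional to within cyclic systems: the violation of the criterion that, as I recall it from Kujala and Dzhafarov's work, declares a system noncontextual iff s_odd of the pairwise product expectations is at most n - 2 + Delta. Proportionality constants are omitted and all names are mine:

```python
import numpy as np

def s_odd(x):
    """Max of sum(l_i * x_i) over sign vectors l with an odd number of -1s."""
    x = np.asarray(x, dtype=float)
    total = np.abs(x).sum()
    # sign(x) attains the total; if it has an even number of -1s,
    # flip the sign on the smallest-magnitude component instead.
    return total if (x < 0).sum() % 2 == 1 else total - 2 * np.abs(x).min()

def cyclic_violation(products, deltas):
    """s_odd(<R_i R_{i+1}>) - (n - 2) - Delta, where Delta sums the
    absolute differences between the two expectations of each question
    across its two contexts. Positive iff the system is contextual."""
    n = len(products)
    return s_odd(products) - (n - 2) - np.abs(np.asarray(deltas)).sum()

# CHSH-type cyclic system of rank 4, consistently connected (Delta = 0),
# at the quantum (Tsirelson) bound:
q = 1 / np.sqrt(2)
print(cyclic_violation([q, q, q, -q], [0, 0, 0, 0]))  # ~0.83 > 0: contextual
```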
Prof. Robert Biegler
Prior experience can help resolve ambiguity. Quantitative models of this process represent both prior experience and sensory information as probability distributions over suitable parameters. Such prior distributions are core features of models of perception, learning, and reasoning, and so their properties are important. We define three (families of) priors and fit them to existing data. The iterative Kalman prior involves multiplying the prior and sensory probability distributions. If the distributions are Gaussian, the precision (inverse variance) of the resulting posterior is the sum of the precisions of the prior and sensory distributions. The posterior becomes the new prior, so precision accumulates across iterations and describes how precisely the mean of the underlying distribution is known. A second family of priors can be generated either by creating a distribution from the central tendencies of past sensory inputs, producing a prior whose variance is the sum of the process and sensory variances, or by averaging past sensory distributions, producing a prior whose variance is the sum of the process variance and twice the sensory variance. Such a prior is useful for risk sensitivity and change-point detection. A third family of priors can be generated by delaying storage in memory until after a posterior has been created through Bayesian cue integration of prior and sensory data to predict the distribution of future subjective experience. This family of priors is sensitive to the order of inputs: neither the shape of the distribution nor its variance can be known without knowing the order in which stimuli were presented. Fitting these priors to existing data indicates that the worst-performing prior is the Kalman prior, even though, in the papers we have found so far that explicitly state how the prior is updated, the iterative Kalman prior is favoured 11 to 1.
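A minimal sketch of the iterative Kalman prior in the Gaussian case described above (function and variable names are mine):

```python
def kalman_update(prior_mean, prior_prec, obs_mean, obs_prec):
    """One iteration: multiply a Gaussian prior by a Gaussian likelihood.
    Precisions (inverse variances) add; the posterior mean is the
    precision-weighted average. The posterior becomes the next prior."""
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * obs_mean) / post_prec
    return post_mean, post_prec

# Precision grows without bound across iterations, so this prior tracks
# how precisely the mean is known -- not the spread of the inputs, which
# is what distinguishes it from the second family described above.
mean, prec = 0.0, 1e-6          # near-flat initial prior
for x in [1.2, 0.8, 1.1, 0.9]:  # sensory means, each with precision 4
    mean, prec = kalman_update(mean, prec, x, 4.0)
print(mean, prec)               # mean ~1.0, precision ~16
```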
Full maximum likelihood (ML) estimation of polychoric correlation matrices is computationally prohibitive. As an alternative, other approaches estimate the correlations from pairwise bivariate normal densities of the categorical distributions. These methods, however, usually result in matrices that are not positive semidefinite. In this presentation, we introduce a new approach to estimating polychoric correlation matrices that uses pairwise bivariate normal densities but restricts the form of the correlation matrix to guarantee that it is positive semidefinite. Our approach uses a transformation of the below-diagonal values of the Cholesky (or LDL) decomposition matrix as the parameters of the model. This transformation is monotonic with respect to the estimated correlations; it is therefore easy to include in an optimization routine and should yield a well-behaved objective function. We also show that extending the original approach to a regularized Bayesian one (i.e., placing zero-centered symmetric distributions as priors on the below-diagonal values of the Cholesky or LDL decomposition matrix) helps guarantee better convergence of the estimates. Results of a pilot simulation study are presented and discussed with regard to computational efficiency (i.e., computation time) and the precision of the estimates. Suggestions for future studies are also presented, focusing mainly on which aspects of the simulation such studies should address.
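One standard way to realize the constraint the abstract describes is a row-normalized Cholesky factor; whether this matches the authors' exact transformation is an assumption on my part, and the names below are hypothetical:

```python
import numpy as np

def corr_from_cholesky_params(params, p):
    """Map an unconstrained vector of p*(p-1)/2 real values to a valid
    correlation matrix R = L @ L.T: a lower-triangular factor with
    unit-norm rows gives a PSD matrix with a unit diagonal by
    construction, so any optimizer (or a zero-centered prior on
    params) can search the parameter space freely."""
    L = np.eye(p)
    L[np.tril_indices(p, k=-1)] = params           # fill below-diagonal
    L /= np.linalg.norm(L, axis=1, keepdims=True)  # unit-length rows
    return L @ L.T

# Any real-valued parameters yield a proper correlation matrix:
R = corr_from_cholesky_params(np.array([0.5, -0.3, 0.8]), p=3)
print(np.diag(R))                         # all ones
print(np.linalg.eigvalsh(R) >= -1e-12)    # PSD up to rounding
```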