Ada Aka
Dr. Sudeep Bhatia
Dr. John McCoy
What kinds of words are more memorable? Can we use insights from data science and high-dimensional semantic representations, derived from large-scale natural language data, to predict memorability? In Study 1, we trained a model to map semantic representations directly onto the recognizability and recallability of 576 unique words from a multi-session mega-study. Specifically, we tested how well we could predict the average memorability of words from their vector representations. Leave-one-out cross-validation showed that our model reliably predicted, with high accuracy, which words are more likely to be recognized and recalled (r = 0.70, 95% Confidence Interval (CI) = [0.656, 0.739]). We next compared our model's predictions to those of an alternative psycholinguistic model trained only on conventional word properties such as concreteness and word frequency (r = 0.28, 95% CI = [0.203, 0.353]). Although previous work in the memory literature has consistently demonstrated the importance of psycholinguistic properties, our method of mapping rich semantic representations to recognition and recall data outperformed this alternative model. Combining semantic representations and psycholinguistic properties, however, further increased predictive power (r = 0.72, 95% CI = [0.679, 0.757]).

In Study 2, we sought to examine and interpret the information contained in semantic representations that gives rise to these successful predictions. We identified the words and concepts that are most (vs. least) strongly associated, in these multidimensional spaces, with each word in our study pool. These associations allowed us to characterize the variability in memorability across study words and to determine which attributes, traits, and concepts are most associated with the words that participants were more likely to remember. The constructs most strongly related to memory performance included those relating to humans (e.g., family-, female-, and male-related constructs), emotions, and arousing situations.

Altogether, we introduced a computational approach that can generalize its learned mappings to make quantitative predictions of memorability for millions of words or phrases for which semantic representations are available, without the need for any further participant data. In addition, we identified the psychological concepts and constructs that are most related to high (or low) memory performance. Thus, we provide evidence that high-dimensional semantic representations are a powerful predictive tool for shedding light on which words are more likely to be remembered and on the psychological constructs that may underlie successful memory.
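As an illustration of the kind of pipeline Study 1 describes, the minimal sketch below fits a regularized linear model from word vectors to memorability scores and evaluates it with leave-one-out cross-validation. The choice of ridge regression, the 300-dimensional placeholder vectors, and the variable names (`embeddings`, `memorability`) are assumptions made for illustration, not the exact model, representations, or data used in the study.

```python
# Minimal sketch of the Study 1 setup, under assumed inputs:
#   embeddings   -- (n_words, n_dims) array of semantic vectors for the study words
#   memorability -- (n_words,) array of average recognition/recall rates
# Random placeholder data is used here so the script runs end to end.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_words, n_dims = 576, 300                      # 576 study words; 300-d vectors assumed
embeddings = rng.normal(size=(n_words, n_dims))
memorability = rng.uniform(0.0, 1.0, size=n_words)

# Regularized linear map from semantic vectors to memorability,
# evaluated out of sample with leave-one-out cross-validation.
model = Ridge(alpha=1.0)
predictions = cross_val_predict(model, embeddings, memorability, cv=LeaveOneOut())

# Correlation between predicted and observed memorability,
# analogous to the r values quoted in the abstract.
r, _ = pearsonr(predictions, memorability)
print(f"Leave-one-out r = {r:.2f}")
```

A similarly hedged sketch of a Study 2 style analysis follows: score each study word by its cosine similarity to a candidate construct word and correlate those similarities with memorability. The construct list and the use of plain cosine similarity are illustrative assumptions, not the authors' exact procedure.

```python
# Hypothetical sketch of relating construct associations to memorability.
# Again uses random placeholder vectors; in practice the word and construct
# vectors would come from the same pretrained semantic space.
import numpy as np

rng = np.random.default_rng(1)
n_words, n_dims = 576, 300
embeddings = rng.normal(size=(n_words, n_dims))
memorability = rng.uniform(0.0, 1.0, size=n_words)
construct_vectors = {name: rng.normal(size=n_dims)
                     for name in ["family", "female", "male", "emotion", "danger"]}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# For each construct, correlate word-to-construct similarity with memorability;
# constructs with large positive correlations would be those most associated
# with well-remembered words.
for name, vec in construct_vectors.items():
    sims = np.array([cosine(w, vec) for w in embeddings])
    r = np.corrcoef(sims, memorability)[0, 1]
    print(f"{name:>8}: r = {r:+.2f}")
```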