Symposium: Computational Psycholinguistics
Garden-path sentences such as “While the doctor visited the patient collapsed” are one of the most-studied phenomena in psycholinguistics, as they typically engender large, reliable slowdowns in reading compared to control conditions (“While the doctor visited, …”). These reading slowdowns have classically been interpreted as indexing syntactic reanalysis, that is, switching from an initial incorrect structure (“the doctor visited the patient”) to the correct one. More recently, however, it has been argued that readers do not always carry out reanalysis, but may retain the incorrect structure instead (e.g., Christianson et al., 2001). Furthermore, both the online reading time data and offline sentence judgments (“grammatical”/“ungrammatical”) may be contaminated by trials in which participants are not paying attention to the stimulus at all. Given these potential contaminants, it is important to go beyond simply comparing condition means. Multinomial processing trees, which model observed responses as generated from a cascade of latent cognitive processes, can be a highly useful tool in this context. I present a model that partitions the processing of the critical sentence region (“collapsed”) into three components: attention, surprise, and reanalysis. Reading times are analyzed as coming from a mixture distribution whose components are identified by the costs of the latent processes that either occur or do not occur in a given experimental trial. The full tree model provides better predictive fit than simpler models, and the estimates suggest that syntactic reanalysis is much more costly than previously assumed once contaminants are taken into account.
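The mixture idea described in the abstract can be sketched in a few lines of Python. The parameterisation below (log-normal reading times, additive log-scale costs for surprise and reanalysis, an inattentive component, and the constraint that reanalysis presupposes surprise) is purely illustrative: all function names, parameters, and values are hypothetical and are not the model actually presented in the talk.

```python
import math

def lognorm_pdf(x, mu, sigma):
    # Log-normal density; a common distributional choice for reading times.
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) \
        / (x * sigma * math.sqrt(2 * math.pi))

def rt_density(rt, p_attend, p_surprise, p_reanalyse,
               mu_base, cost_surprise, cost_reanalysis, mu_inattentive, sigma):
    # Mixture density over latent-process configurations (hypothetical parameterisation).
    # Inattentive trials: the reading time is unrelated to the stimulus.
    density = (1 - p_attend) * lognorm_pdf(rt, mu_inattentive, sigma)
    # Attentive trials: each latent process that occurs adds its cost on the log scale.
    for surprised in (0, 1):
        for reanalysed in (0, 1):
            if reanalysed and not surprised:
                continue  # reanalysis presupposes having noticed the problem
            weight = p_attend * (p_surprise if surprised else 1 - p_surprise)
            if surprised:
                weight *= p_reanalyse if reanalysed else 1 - p_reanalyse
            mu = mu_base + surprised * cost_surprise + reanalysed * cost_reanalysis
            density += weight * lognorm_pdf(rt, mu, sigma)
    return density
```

Because the branch probabilities sum to one, the function is a proper probability density over reading times; fitting such a model to data would identify the latent costs that the talk estimates.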
This is an in-person presentation on July 21, 2024 (10:00 ~ 10:20 CEST).
Eunice Fernandes
Manabu Arai
Frank Keller
Human cognition is a highly integrated system which synchronizes processes and representations across modalities. Previous research on the synchronization between attention and sentence production demonstrated that similar scene descriptions correspond to similar sequences of attended objects (scan patterns). Here, we generalise this finding from English to languages with different word order. We test whether synchronicity holds not just within a language but across languages and examine the relative contribution of syntax and semantics. Seventy-four participants (24 English, 28 Portuguese, 20 Japanese) described objects (N = 24), either animate (e.g., man) or inanimate (e.g., suitcase), situated in a visual scene, while being eye-tracked. Across all participants, pair-wise sentence similarity was computed using the Universal Sentence Encoder, which generates multilingual vector-based meaning representations. Part-of-Speech (POS) sequences, a shallow representation of the syntax of sentences, were extracted using spaCy. Similarities between POS sequences and scan patterns were measured using Longest Common Subsequence. We found that similar sentences are associated with similar scan patterns in all three languages. Moreover, we demonstrated for the first time that this relationship holds across languages (e.g., if a Japanese and a Portuguese sentence are semantically similar, their associated scan patterns are also similar). In contrast, we found that syntactic (POS) similarity is predicted by scan patterns only within the same scene and only between languages with similar word order. This confirms that visual attention and language production are synchronized, but also points to a grammar of perception that is language-independent, goes beyond syntactic realizations, and manifests as oculomotor responses.
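The Longest Common Subsequence measure used to compare POS sequences and scan patterns is straightforward to implement. The sketch below is illustrative: the scan patterns are made up, and the normalisation by the longer sequence is one of several possible choices, not necessarily the one used in the study.

```python
def lcs_length(a, b):
    # Length of the longest common subsequence, via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def lcs_similarity(a, b):
    # Normalise LCS length to [0, 1] by dividing by the longer sequence.
    return lcs_length(a, b) / max(len(a), len(b)) if a or b else 1.0

# e.g. two hypothetical scan patterns (sequences of fixated objects)
scan1 = ["man", "suitcase", "man", "door"]
scan2 = ["man", "door", "suitcase", "door"]
```

The same function applies unchanged to POS sequences (lists of tags) and to scan patterns, which is what makes LCS convenient for relating the two.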
This is an in-person presentation on July 21, 2024 (10:20 ~ 10:40 CEST).
Mr. David Reich
Mr. Patrick Haller
Ms. Deborah Jakobi
Mr. Paul Prasse
Prof. Lena Jäger
Eye movements in reading reveal cognitive processes involved in human language understanding. As such, they have been pivotal in psycholinguistic research. More recently, they have also been utilized for both interpreting and enhancing the cognitive plausibility of language models as well as for inference tasks, such as deducing properties of the reader or the text being read. However, eye movement data is scarce and usually unavailable at inference time, which constitutes an obstacle for this branch of research. Traditional approaches to this problem relied on cognitive models to generate synthetic scanpaths, but recent work has shifted toward purely machine-learning-based methods, as they prove more suitable for the sole task of generating human-like synthetic eye movement data. Based on recent research that applies continuous diffusion processes to discrete data, we present ScanDL, a discrete sequence-to-sequence diffusion model that produces human-like scanpaths on texts. The model captures the multi-modal interactions of the input by employing pre-trained word embeddings and a joint representation of the text and the fixations in space. Our assessment of ScanDL across different settings demonstrates its superior performance against previous benchmarks in scanpath generation. Moreover, an extensive psycholinguistic analysis reveals that ScanDL captures key phenomena observed in human readers, such as surprisal, word length, and frequency. We underline the model’s ability to exhibit human-like reading behavior and further show that it can be used for power analyses, and could, prospectively, also be used for piloting psycholinguistic experiments.
This is an in-person presentation on July 21, 2024 (10:40 ~ 11:00 CEST).
Neurocognitive models of language comprehension are informed by the differential modulation patterns of the N400 and P600 components of the Event-Related Potential (ERP) signal during incremental language comprehension. Models differ, however, in the functional interpretation assigned to the N400 and P600, leading to fundamentally different comprehension architectures that yield different predictions regarding the modulation of these components. Here, we focus on the predictions of an explicit neurocomputational model that instantiates Retrieval-Integration (RI) theory. On RI theory, N400 amplitude reflects the contextualized retrieval of word meaning from long-term memory, and P600 amplitude indexes the integration of retrieved word meaning into the unfolding utterance representation. In particular, RI theory predicts that the well-known N400-effect for semantic incongruity can be wiped out completely when the retrieval of word meaning is facilitated through lexical and contextual priming, and moreover, that the integrative processes underlying the P600 are graded for plausibility, such that P600 amplitude increases as words get less plausible. ERP evidence will be discussed that directly confirms these predictions, which are unique to RI theory, thereby critically challenging alternative accounts and models of the N400 and P600.
This is an in-person presentation on July 21, 2024 (11:40 ~ 12:00 CEST).
Fitz & Chang (2019) argue that event-related brain potentials (ERPs) during sentence comprehension result from detection and incorporation of word-prediction error. The N400 component would correlate with prediction error while the P600 would be indicative of error backpropagation in the language system. Psycholinguistically speaking, the latter is an estimate of how much the comprehended sentence changes the reader’s language knowledge. Fitz & Chang evaluate their theory using (backpropagated) prediction error from a neural network trained on an artificial miniature language, and show that it indeed accounts for many ERP effects from the literature. I present an evaluation of Fitz & Chang’s (2019) account on a corpus of EEG data recorded from participants reading naturalistic English sentences (Frank et al., 2015). At each word of the sentences, surprisal and the total gradient of recurrent-layer connections were estimated by an LSTM language model trained on a large, natural language corpus. Consistent with the theory, higher surprisal resulted in stronger N400 while higher gradient resulted in stronger P600, although a detailed analysis of the ERP time course suggests the apparent P600 effect should be interpreted as a reversed N400 effect. The same model was then applied to over 4000 sentences from a large-scale sentence acceptability rating study (Lau et al., 2017). Higher gradient was associated with lower acceptability (over and above surprisal), suggesting that lower subjective acceptability is partly due to larger update in language knowledge, and not only to lower sentence probability.
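Surprisal, the quantity linked to the N400 in this account, is simply the negative log probability a language model assigns to a word given its preceding context. A minimal illustration with a toy next-word distribution (the words and probabilities are invented; in the study the probabilities come from an LSTM language model trained on a large natural language corpus):

```python
import math

def surprisal(probability):
    # Surprisal of a word: negative log of its probability given the context (in bits).
    return -math.log2(probability)

# Toy next-word distribution after some context (illustrative numbers only)
next_word_probs = {"patient": 0.5, "hospital": 0.25, "collapsed": 0.01}

low = surprisal(next_word_probs["patient"])      # expected word, small surprisal
high = surprisal(next_word_probs["collapsed"])   # unexpected word, large surprisal
```

The gradient measure for the P600, by contrast, requires backpropagating this prediction error through the network and summing the magnitude of the resulting weight updates, which depends on the specific model architecture.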
This is an in-person presentation on July 21, 2024 (12:00 ~ 12:20 CEST).
Milena Rabovsky
Daniel Schad
Prediction error, both at the level of sentence meaning and at the level of the next presented word, has been shown to successfully account for N400 amplitudes. Here we address the question of whether people differ in the representational level at which they implicitly predict upcoming language. To this end, we computed a measure of prediction error at the level of sentence meaning (magnitude of change in hidden layer activation, termed semantic update, in a neural network model of sentence comprehension, the Sentence Gestalt model) and a measure of prediction error at the level of the next word (surprisal from a next word prediction language model). When using both measures to predict N400 amplitudes during the reading of naturalistic texts, results showed that both measures significantly accounted for N400 amplitudes even when the other measure was controlled for. Most important for current purposes, both effects were significantly negatively correlated such that people with a reversed or weak surprisal effect showed the strongest influence of semantic update on N400 amplitudes. Moreover, random-effects model comparison showed that individuals differ in whether their N400 amplitudes are driven by semantic update only, by surprisal only, or by both, and that the most common model in the population was either semantic update or the combined model but not the pure surprisal model. The current approach of combining large-scale models implementing different theoretical accounts with advanced model comparison techniques enables fine-grained investigations into the computational processes underlying N400 amplitudes, including interindividual differences in the involved computations.
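The semantic update measure, the magnitude of change in hidden-layer activation, can be illustrated as a distance between successive hidden states. The Euclidean formulation and the toy vectors below are one simple operationalisation for illustration; the exact metric used in the Sentence Gestalt model may differ.

```python
import math

def semantic_update(hidden_before, hidden_after):
    # Magnitude of change in hidden-layer activation after processing a word
    # (Euclidean distance; one simple operationalisation of "semantic update").
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hidden_after, hidden_before)))

# Hypothetical hidden states before and after reading a word
state_before = [0.2, -0.5, 0.1]
state_after = [0.6, -0.1, 0.3]
update = semantic_update(state_before, state_after)
```

Computed word by word over a text, such a measure yields a per-word predictor that can be regressed against N400 amplitudes alongside surprisal.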
This is an in-person presentation on July 21, 2024 (12:20 ~ 12:40 CEST).
One of the challenges in cognitive research on language is to find meaning representations that would go beyond simple (lexical) representations and be able to capture the incremental interpretation of discourse. At the same time, the representations should be practical enough to be deployable on data that are of interest to cognitive scientists. In this presentation, I discuss a way to make use of meaning representations developed in formal discourse semantics, Discourse Representation Structures, to advance our understanding of how interpretation and behavioral measures (reading data) are related. I will present a text corpus with eye-tracking and self-paced reading data that was fully annotated with Discourse Representation Structures. Then I will show how such an annotation can be used to investigate a link between behavioral measures (fixations, reaction times) and discourse meanings. Finally, time permitting, I will show how we can use the annotated corpus to test claims regarding meaning and cognition present in theories of sentence processing, e.g., the assumption that processing cost and the introduction of discourse referents are interlinked (Dependency Locality Theory, Gibson, 1998, among others).
This is an in-person presentation on July 21, 2024 (15:20 ~ 15:40 CEST).
Fritz Günther
Marco Marelli
Giuseppe Attanasio
Federico Bianchi
Computational models of semantic representations have long assumed a single static representation for each word type, ignoring the influence of linguistic context. Recent Large Language Models (LLMs), however, learn token-level contextualized representations, allowing the study of how semantic representations change in context. We use BERT to probe type- and token-level representations for their ability to i) explain semantic effects for isolated words (semantic relatedness and similarity ratings, lexical decision, and semantic priming), ii) exhibit systematic interactions between lexical semantics and context, and iii) explain meaning modulations in context. Across several empirical studies, we show that BERT representations satisfy two desiderata for psychologically valid semantic representations. First, they have a stable semantic core which allows people to interpret words in isolation and prevents them from being used arbitrarily. Neighborhood density of prototype embeddings explains unique variance in lexical decisions, target-prime relatedness accounts for reaction times in primed lexical decision, and geometric proximity in BERT’s representational space accounts well for both semantic similarity and relatedness. Second, BERT representations interact with sentence context in systematic ways, with representations shifting as a function of their semantic core and the context: replacing word A with word B makes B’s representation shift away from its prototype representation and closer to that of A, and the closer A and B’s prototypes are, the easier it is for the model to effectively contextualize the replacement.
Finally, we show that BERT representations can capture meaning modulations across and within word senses: for example, the representation for tomato is closer to that of green in the sentence The tomato in my garden is ripening and to that of spaghetti in the sentence You can still see some tomato pieces in the pasta sauce, despite the referent possibly being the same. Therefore, a single, comprehensive model which simultaneously learns abstract, type-level prototype representations as well as mechanisms of how these interact with context can explain both isolated word effects and context-dependent variations. Notably, these variations are not limited to discrete word senses, eschewing a strict dichotomy between exemplar and prototype models and re-framing traditional notions of polysemy.
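The contextual shifts described in this abstract are typically quantified with cosine similarity between embedding vectors. A toy sketch of the tomato example, with hypothetical 3-dimensional vectors standing in for actual BERT embeddings (which are much higher-dimensional):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical token embeddings of "tomato" in two different sentences,
# plus prototype-like embeddings of "green" and "spaghetti"
tomato_in_garden = [0.9, 0.4, 0.1]
tomato_in_sauce = [0.2, 0.3, 0.9]
green = [1.0, 0.3, 0.0]
spaghetti = [0.1, 0.2, 1.0]
```

Under this toy setup, the garden token of tomato is closer to green, and the sauce token is closer to spaghetti, mirroring the within-sense modulation the abstract reports.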
This is an in-person presentation on July 21, 2024 (15:40 ~ 16:00 CEST).
Mx. Alexandra Mayn
Prof. Vera Demberg
Individuals are known to vary in their likelihood of drawing Gricean pragmatic inferences. We present the first algorithmic-level model of a pragmatic reasoning task, in ACT-R, to formalize the way in which this variability could come from individual differences in domain-general cognitive variables. In non-linguistic reference games (RefGames), participants select a referent for a possibly-ambiguous message. RefGame participants vary in their tendency to derive “simple” and “complex” Gricean inferences (Franke & Degen, 2016). Mayn & Demberg (2022) found that more complex RefGame inferencing was strongly associated with successful problem-solving in Raven’s Progressive Matrices (RPM). One shared mechanism which may underlie successful RPM and RefGame performance is the ability to use internal negative feedback to efficiently disengage from unsuccessful strategies: Stocco and colleagues’ (2021) ACT-R model captures an observed relationship between one’s negative feedback strength (Fneg) and RPM success. Based on the Stocco model, we present an ACT-R model of RefGame where success requires disengaging from strategies which fail to provide a high-quality (unique) answer. Our model stochastically applies interpretive strategies of varying complexity until arriving at a unique answer or else guesses after a timeout corresponding to individual persistence. Deriving complex inferences ultimately requires faster disengagement via stronger Fneg (or higher persistence). Our model generates several novel predictions: (a) RefGame performance should correlate with specific Fneg and persistence metrics, (b) more complex trials will take longer to solve, (c) incorrect responses will have shorter response times and (d) correct responses will more often demonstrate eye movements predicted for complex strategy execution.
This is an in-person presentation on July 21, 2024 (16:00 ~ 16:20 CEST).