Formal analysis

Dr. Yiyun Shou

For some time, modeling doubly-bounded random variables has been hindered by a scarcity of applicable distributions with finite densities at the bounds. Most of the useful distributions for these variables have finite densities only on (0,1) or, at best, inconsistently on [0,1]. This talk presents a flexible family of 2- and 3-parameter distributions whose support is the closed interval [0,1], in the sense that they always have finite nonzero densities at 0 and at 1. These distributions have explicit density, cumulative distribution, and quantile functions, so they are well suited to quantile regression. The densities at the boundaries are determined by dispersion and skew parameters, and a third parameter influences location. These distributions have a single mode in (0,1) but can also simultaneously have modes at 0 or at 1, or they can be U- or J-shaped. Some of them include the uniform distribution as a special case. Their location, dispersion, and skew parameters are easy to interpret, and each can have a submodel with its own predictors. They have been implemented in packages for R and Stata.
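The distributions presented in the talk are not written out here, so the sketch below uses a hypothetical stand-in purely to show what explicit density, CDF, and quantile functions with finite nonzero densities at both bounds look like: the linear density f(y) = (1 - c) + 2cy on [0,1] with -1 < c < 1. It is not one of the talk's distributions. Its densities at 0 and 1 are 1 - c and 1 + c, and c = 0 gives the uniform distribution.

```python
import math

# Hypothetical stand-in distribution (NOT the family presented in the talk):
# linear density f(y) = (1 - c) + 2*c*y on [0, 1], valid for -1 < c < 1.

def density(y, c):
    """Density; finite and nonzero at both bounds: f(0) = 1 - c, f(1) = 1 + c."""
    return (1 - c) + 2 * c * y

def cdf(y, c):
    """Closed-form CDF: F(y) = (1 - c) * y + c * y**2."""
    return (1 - c) * y + c * y ** 2

def quantile(p, c):
    """Closed-form inverse CDF, the root of c*y**2 + (1 - c)*y - p = 0 in [0, 1].

    Written in rationalized form so that c = 0 (the uniform case) needs no
    special branch; the denominator is always positive for -1 < c < 1.
    """
    s = math.sqrt((1 - c) ** 2 + 4 * c * p)
    return 2 * p / ((1 - c) + s)

# A closed-form quantile function gives any quantile directly,
# e.g. the quartiles and median for a given parameter value c.
c = 0.4
print([round(quantile(p, c), 4) for p in (0.25, 0.5, 0.75)])
```

Having the quantile function in closed form is what lets a quantile-regression model evaluate conditional quantiles without numerical root finding.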

Critical Path Networks are models of the Psychological Refractory Period and of some cognitive tasks, such as visual search. A Critical Path Network is a directed acyclic network in which each arc represents a process that must be completed to perform a task. The processes on a path must be executed in the order given by the path; processes that are not together on any path are unordered and can be executed simultaneously. Each process has a duration. The time to complete the task, the response time, is the sum of the durations of the processes on the longest path through the network. If a process X precedes a process Y, the slack from X to Y is the longest amount of time by which X can be prolonged without making Y start late. Suppose the processes in a task are executed in a Critical Path Network, but the network is unknown. By observing the effects on response time of selectively influencing processes, one can learn for each pair of processes whether the pair is ordered or unordered and, if it is ordered, the value of the slack from one to the other. From the order information, a directed acyclic network can be constructed with the Transitive Orientation Algorithm. From the slacks, a duration can be determined for each process. Several directed acyclic networks may be possible, and the durations are not unique. If the slack values are valid for one of the possible directed acyclic networks, they are valid for all.
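For a network that is already known, the response-time and slack definitions can be sketched in Python as follows. This is an illustration, not the authors' recovery procedure: arcs are hypothetical (name, tail, head, duration) tuples, the response time is the longest source-to-sink path, and the slack from X to Y is computed as the longest path from the source to the start of Y minus the longest path that reaches the start of Y by going through X.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

def longest_from(src, arcs):
    """Longest path length from vertex src to every vertex reachable from it."""
    succ = defaultdict(list)
    preds = {}
    for _, u, v, d in arcs:
        succ[u].append((v, d))
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    dist = {src: 0.0}
    # Visit vertices in topological order so every predecessor is final first.
    for u in TopologicalSorter(preds).static_order():
        if u in dist:
            for v, d in succ[u]:
                dist[v] = max(dist.get(v, float("-inf")), dist[u] + d)
    return dist

def response_time(arcs, source, sink):
    """Task completion time: length of the longest source-to-sink path."""
    return longest_from(source, arcs)[sink]

def slack(arcs, source, x, y):
    """Slack from process x to process y (x must precede y)."""
    info = {name: (u, v, d) for name, u, v, d in arcs}
    x_tail, x_head, x_dur = info[x]
    y_tail = info[y][0]
    from_x_head = longest_from(x_head, arcs)
    if y_tail not in from_x_head:
        raise ValueError(f"{x} does not precede {y}")
    from_source = longest_from(source, arcs)
    via_x = from_source[x_tail] + x_dur + from_x_head[y_tail]
    return from_source[y_tail] - via_x

# Hypothetical network: A and B start together; a zero-duration dummy arc E
# makes B also precede C; D is unordered with respect to A, B, and C.
arcs = [
    ("A", "o", "1", 3.0),
    ("B", "o", "2", 1.0),
    ("E", "2", "1", 0.0),
    ("C", "1", "r", 2.0),
    ("D", "o", "r", 4.0),
]
print(response_time(arcs, "o", "r"))  # prints 5.0 (path A then C)
print(slack(arcs, "o", "B", "C"), slack(arcs, "o", "A", "C"))  # prints 2.0 0.0
```

Here B can be prolonged by 2 time units before C starts late, while A lies on the critical path and has no slack to C.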

Prolog is a classical logic programming language with many applications in expert systems, computational linguistics, and traditional (that is, symbolic) artificial intelligence. The main strength of Prolog is its concise representation of facts and rules for knowledge and grammars, together with its very efficient built-in search engine for closed-world domains. R is a statistical programming language for data analysis and statistical modeling that is widely used in academia and industry. Besides the core library, many packages have been developed for all kinds of statistical problems, including new-style artificial intelligence tools such as neural networks for machine learning and deep learning. Whereas Prolog is weak in statistical computation but strong in symbolic manipulation, the converse may be said of R. The R package Rolog embeds the SWI-Prolog system in R, thus enabling deterministic and non-deterministic queries to the Prolog interpreter. Usage of the Rolog library is illustrated by a few examples, including grammars for mathematical typesetting, linguistics, knowledge structures, and interval arithmetic.

Cognitive architecture models can help simulate and predict human performance in complicated human-machine systems. In the current work, we demonstrate a pilot model that can complete takeoff tasks. The model was constructed in the Queueing Network-Adaptive Control of Thought Rational (QN-ACTR) cognitive architecture and can be connected to the X-Plane flight simulator to generate various statistics, including performance, mental effort, and situational awareness. The model's outcomes are jointly determined by declarative knowledge in the form of chunks, production rules, and a set of parameters. Currently, the model can simulate flight-operation behavior similar to that of human pilots in various conditions. In the future, with additional refinement, we anticipate that this model can assist interface evaluation and competency-based pilot training, providing a theory-based prediction method that complements human-in-the-loop studies for research and development in the aviation industry.

Pablo Leon Villagra

Nick Chater

Prof. Adam Sanborn

Many computational approaches to cognition argue that people's decisions are based on examples drawn from memory. But what mechanism do people use to come up with those examples? In this work, we study how the mind generates these examples by asking participants to produce long sequences of items at random. Although previous random-generation research has focused exclusively on uniform distributions, we find that people can generate items from more complex distributions (such as people's heights) while showing the same systematic deviations from true randomness. We propose that, to produce new items, people employ an internal sampling algorithm like those used in computer science, algorithms that have previously been used to explain other features of human behavior, such as how people reason with probabilities. We find that these algorithms approximate people's random sequences better than previous computational models. We then evaluate which qualitative components of the sampling algorithms best emulate human behavior: people's sequences are most similar to samplers that propose new states based on the gradient of the space (such as Hamiltonian Monte Carlo, HMC) and that run several replicas at different temperatures (such as Metropolis-coupled MCMC, MC3). By identifying the algorithms used in random generation, our results may help create more accurate sequential sampling models of decision making that better reflect how evidence is accumulated.
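To make "several replicas at different temperatures" concrete, here is a minimal Metropolis-coupled MCMC (MC3) sketch in Python. It is an illustration under assumed settings, not the authors' model: the target is a hypothetical normal distribution of heights (mean 170 cm, SD 10 cm), the temperature ladder, step size, and seed are arbitrary choices, and only the coldest chain is reported as the generated sequence. The gradient-based proposals of HMC are not included.

```python
import math
import random

def log_target(x, mu=170.0, sigma=10.0):
    """Unnormalized log density of a hypothetical height distribution (cm)."""
    return -0.5 * ((x - mu) / sigma) ** 2

def mc3_sequence(log_p, n, temps=(1.0, 2.0, 4.0, 8.0), step=5.0, x0=170.0, seed=0):
    """Return n states of the cold (T = 1) chain; needs at least two temperatures."""
    rng = random.Random(seed)
    states = [x0] * len(temps)
    seq = []
    for _ in range(n):
        # 1) Random-walk Metropolis update of each replica on p(x)**(1/T).
        for i, t in enumerate(temps):
            proposal = states[i] + rng.gauss(0.0, step)
            log_ratio = (log_p(proposal) - log_p(states[i])) / t
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                states[i] = proposal
        # 2) Propose swapping the states of one random pair of adjacent temperatures.
        i = rng.randrange(len(temps) - 1)
        a, b = states[i], states[i + 1]
        log_swap = (log_p(b) - log_p(a)) * (1.0 / temps[i] - 1.0 / temps[i + 1])
        if log_swap >= 0 or rng.random() < math.exp(log_swap):
            states[i], states[i + 1] = b, a
        # 3) Only the cold chain's current state is "reported" as the next item.
        seq.append(states[0])
    return seq

heights = mc3_sequence(log_target, 1000)
print([round(h, 1) for h in heights[:10]])
```

Consecutive outputs are autocorrelated, unlike independent draws, because each proposal starts from the current state; the hotter replicas move more freely and occasionally hand their states down to the cold chain through swaps.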

Modelers are often faced with the dilemma of deciding how to parameterize a probability model. Too few parameters may yield a misspecified model whose parameter estimates are uninterpretable because of the misspecification, while too many parameters may yield a correctly specified model that fits the observed data but whose parameter estimates are uninterpretable because of parameter redundancy. During model development, it is therefore likely that situations will arise where it is desirable to evaluate possibly misspecified or parameter-redundant models. In the context of maximum likelihood estimation, it has been shown (see Ran and Hu, 2017, and Cole, 2020, for relevant reviews) that the presence of parameter redundancy corresponds to situations where the Fisher Information Matrix (FIM) (i.e., the covariance matrix of the gradient of the log-likelihood per data record) does not have full rank. Checking local identifiability in maximum likelihood estimation often amounts to checking whether the Hessian of the log-likelihood (LL) has full rank (e.g., White, 1982, Theorem 3.1). Classical asymptotic theory (e.g., White, 1982) often assumes that both the FIM and the LL Hessian have full rank in order to obtain analytic formulas for estimating parameter confidence intervals. In this presentation, it is shown that analytic formulas for estimating confidence intervals for some, but not all, parameters can sometimes be obtained in the presence of parameter redundancy (i.e., without assuming that the FIM has full rank). Some preliminary simulation studies are reported to illustrate the practical applications of the theoretical results.
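The link between parameter redundancy and a rank-deficient FIM can be checked numerically. The sketch below uses hypothetical models and simulated data and is not the presentation's method: it estimates the FIM as the average outer product of per-record gradients of the log-likelihood and computes the rank of that matrix. In the redundant model y ~ N(a + b, 1), only the sum a + b enters the likelihood, so both gradient components are identical and the rank drops to 1; in the regression y ~ N(a + b*x, 1) it is 2.

```python
import random

def opg_fim(grads):
    """Average outer product of per-record gradient vectors (empirical FIM estimate)."""
    n, k = len(grads), len(grads[0])
    return [[sum(g[i] * g[j] for g in grads) / n for j in range(k)] for i in range(k)]

def matrix_rank(m, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting and a relative tolerance."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    scale = max(abs(v) for row in a for v in row) or 1.0
    rank = 0
    for c in range(cols):
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][c]), default=None)
        if pivot is None or abs(a[pivot][c]) <= tol * scale:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            f = a[r][c] / a[rank][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank

rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [0.5 + 1.5 * x + rng.gauss(0.0, 1.0) for x in xs]
a, b = 0.4, 1.2  # arbitrary evaluation point

# Redundant model y ~ N(a + b, 1): both partial derivatives equal (y - a - b).
redundant = [(y - a - b, y - a - b) for y in ys]
# Identified model y ~ N(a + b*x, 1): gradient is (r, r*x) with r = y - a - b*x.
identified = [(y - a - b * x, (y - a - b * x) * x) for x, y in zip(xs, ys)]

print(matrix_rank(opg_fim(redundant)), matrix_rank(opg_fim(identified)))  # prints 1 2
```

The average outer product matches the covariance definition of the FIM when evaluated where the expected gradient is zero; away from that point it is only an approximation, which does not affect the rank argument in this example.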

Melike Baykal-Gursoy

Lone-actor (LA) terrorism has been one of the rising security threats of the last decade. Research on LA behavior and characteristics has produced valuable information on demographics, classifications, and warning signs. Nonetheless, commonality among these characteristics does not imply similar outcomes across attacks, and incident-scene behavior varies. Because security-footage videos of LA attacks are not publicly available, associating incident-scene behavior with early and preparatory attacker behavior is a challenging research problem. Serious games have been used to evaluate mitigation strategies for natural disasters. At the GRIST Lab at Rutgers University, we design virtual games that simulate real-world conditions in order to observe an attacker's reactions to incident-scene dynamics. This study aims to identify the attacker's short-term target- and route-selection decisions from data obtained in a virtual game and, in turn, to develop better first-responder allocation strategies against LA attacks. We apply time-series clustering and classification methods to spatio-temporal data to identify behavioral differences between an attacker and other civilians. The findings indicate that these methods will be instrumental in developing LA detection and capture strategies.
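The abstract does not name the specific time-series methods used, so the sketch below shows only one common choice, for illustration: dynamic time warping (DTW) distance between 2-D position trajectories, combined with a 1-nearest-neighbour classifier. The trajectories and labels are hypothetical toy data, not data from the virtual game.

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between two sequences of (x, y) positions."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between positions
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(track, labeled_tracks):
    """Return the label of the DTW-nearest labeled track (1-nearest neighbour)."""
    return min(labeled_tracks, key=lambda item: dtw(track, item[0]))[1]

# Hypothetical toy trajectories sampled at regular time steps.
labeled = [
    ([(0, 0), (1, 0), (2, 0), (3, 0)], "civilian"),  # moves along x, e.g. toward an exit
    ([(0, 0), (0, 1), (0, 2), (0, 3)], "attacker"),  # moves along y, e.g. toward a target
]
query = [(0, 0), (0.9, 0.1), (2.1, 0.0), (3.0, 0.2)]
print(classify(query, labeled))  # prints civilian
```

DTW tolerates trajectories that follow the same route at different speeds; the same pairwise DTW distances could also feed a clustering algorithm such as hierarchical clustering.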
