Research

To think inside the box, or to think out of the box? Scientific discovery via the reciprocation of insights and concepts

Key Words: Scientific Discovery, Insight Problem Solving, Semantics Landscape.

If scientific discovery is one of the main driving forces of human progress, insight is the fuel for the engine, and it has long attracted behavior-level research that seeks to understand and model its underlying cognitive process. However, current tasks that abstract scientific discovery mostly focus on the emergence of insight, ignoring the special role played by domain knowledge. In this concept paper, we view scientific discovery as an interplay between thinking out of the box, which actively seeks insightful solutions, and thinking inside the box, which generalizes on conceptual domain knowledge to stay correct. Accordingly, we propose Mindle, a semantic searching game that spontaneously triggers scientific-discovery-like thinking, as infrastructure for exploring scientific discovery on a large scale. On this basis, the meta-strategies for insights and the usage of concepts can be investigated reciprocally. In the pilot studies, several interesting observations inspire elaborated hypotheses on meta-strategies, context, and individual diversity for further investigation.

Why perspective matters? On the quest of computational human-centered metascience

Key Words: Metascience, Logology (Study of Science), Introspective Studies.

Metascience helps improve scientific research not only by synthesizing research on a topic but also by elaborating the common sense that shapes an appropriate mindset for studying the topic. Though the end-users of metascience are scientists, user-centered considerations have been under-researched. Modeling how scientists work with science is possible, given the great progress in computational modeling of human cognitive processes. Hence, some major obstacles for scientists, i.e., proposing perspectives and hypotheses, or capturing global information and local details simultaneously, can be partly automated and made transparent, improving the reliability and reproducibility of research. This concept paper introduces computational human-centered metascience through the most elementary research skill: framing a perspective by reviewing the literature. This problem can be decomposed into computational, algorithmic, and implementational levels, which concern the macro view of how perspectives communicate with each other, how to frame perspectives automatically, and how scientists understand and present perspectives. This case study on perspective modeling covers the major components to consider in human-centered metascience. On this basis, such introspective understanding can be generalized to identify different mindsets of scientific research. We further review the divergence of perspectives and the convergence of common grounds in the development of science to enhance the value of introspective studies.

On the Complexity of Bayesian Generalization

Key Words: Bayesian Generalization, Word Learning, Rational Analysis, Natural Image Statistics, Algorithmic Information Theory.

We consider concept generalization at a large scale in the diverse and natural visual spectrum. Established computational modes (i.e., rule-based or similarity-based) are primarily studied in isolation and focus on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up and the complexity of concepts becomes diverse. Specifically, at the representational level, we seek to answer how the complexity varies when a visual concept is mapped to the representation space. Prior psychology literature has shown that two types of complexity (i.e., subjective complexity and visual complexity) (Griffiths and Tenenbaum, 2003) form an inverted-U relation (Donderi, 2006; Sun and Firestone, 2021). Leveraging Representativeness of Attribute (RoA), we computationally confirm the following observation: models use attributes with high RoA to describe visual concepts, and the description length follows an inverted-U relation as visual complexity increases. At the computational level, we aim to answer how the complexity of representation affects the shift between rule-based and similarity-based generalization. We hypothesize that category-conditioned visual modeling estimates the co-occurrence frequency between visual and categorical attributes, thus potentially serving as the prior for the natural visual world. Experimental results show that representations with relatively high subjective complexity outperform those with relatively low subjective complexity in rule-based generalization, while the trend is the opposite in similarity-based generalization.
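
For readers unfamiliar with the setting, the following is a minimal sketch of textbook Bayesian generalization over a small hypothesis space, using the size principle as the likelihood. It is not the model studied in this work: the hypothesis sets, the uniform prior, and the function name `generalization_probability` are all assumptions chosen for illustration.

```python
def generalization_probability(hypotheses, prior, examples, query):
    """P(query belongs to the concept | examples) under Bayesian generalization.

    hypotheses: dict mapping a hypothesis name to the set of items it covers.
    prior: dict mapping a hypothesis name to its prior probability.
    The likelihood uses the size principle: each example is assumed to be
    drawn uniformly from the true concept, so P(examples | h) = (1/|h|)^n
    when h covers all examples, and 0 otherwise.
    """
    posterior = {}
    for name, members in hypotheses.items():
        if all(x in members for x in examples):
            posterior[name] = prior[name] * (1.0 / len(members)) ** len(examples)
        else:
            posterior[name] = 0.0
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("no hypothesis is consistent with the examples")
    # Average the predictions of all hypotheses, weighted by their posterior.
    covering = sum(p for name, p in posterior.items() if query in hypotheses[name])
    return covering / total


# Hypothetical toy hypothesis space over the integers 1..10.
hypotheses = {
    "even": {2, 4, 6, 8, 10},
    "powers_of_two": {2, 4, 8},
    "all": set(range(1, 11)),
}
prior = {name: 1.0 / len(hypotheses) for name in hypotheses}

p_one = generalization_probability(hypotheses, prior, [2], 6)
p_three = generalization_probability(hypotheses, prior, [2, 4, 8], 6)
print(p_one, p_three)
```

In this toy setting, generalization to 6 is graded after a single example and becomes sharper, closer to an all-or-none rule, once three examples all fall inside the smallest consistent hypothesis.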

Semantics emerge from solving problems given abstract prior

Key Words: Prior for Problem Solving, Rationality, Bayesian Inference, Problem Representation, Constrained Problem Solving.

Evidence from cognitive and developmental psychology supports the view that people solve unseen problems with the help of high-level strategies derived from prior knowledge that is agnostic to problem context. Specifically, we argue that people construct such strategies by assigning semantics to the elements in the problem to make connections with prior experience. Such assignment can be viewed as Bayesian inference given goals and constraints as the prior, which implies both the diversity and the convergence of strategies used by people. To understand and model this capability, we propose the ProbSol Worlds (ProbSol) environment, in which the world is driven by the same dynamics but can be configured as different tasks, such as tool use, causal inference, and sketching. Equipped with magnetism-based dynamics, ProbSol alleviates the confounding variables of prior semantics brought by conventional physically-grounded problem solving tasks. Hence, the environment-agnostic prior of goals and constraints can be disentangled from problem solving. We show the potential of ProbSol for carrying out large-scale behavioral studies and benchmarking computational models to probe people's and machines' diverse understanding of goals and constraints in problem solving.

AutoBio Programming Language

Key Words: Semi-auto Programming Language Design, Program Library Evolving, Program Synthesis.

We are building a framework that (1) generates content-centered instruction set assembly (ISA) proposals via statistical features from the natural corpus and refines the ISA by integrating domain expert knowledge in a human-in-the-loop fashion; and (2) lets human users indicate their preferences on instructions to form a human-centered program library. The framework works in the domain of biological experiment protocols and has the potential to be applied to arbitrary domains, helping to design domain-specific programming languages both for human usage and for program synthesis.

AutoBio ISA Manual

Human-Level Abductive Learning and Planning

Key Words: Neural-Symbolic Learning, Abductive Reasoning, Qualitative Simulation, Inverse Planning.

Motivation: A novice Minecraft player can explore the world well from visual observation alone, given very little guidance, thanks to rich background knowledge (or commonsense). Can a machine learner do the same?

Task: I prepare several first-person video sequences recording a rational human player playing Minecraft and define some logic rules representing the human commonsense, e.g. Perspective Relationship.

Method: The learner models the agent operations following the idea of Qualitative Simulation, which describes motion not step by step but in a high-level state-to-state way, yielding greater representational power. It uses a Transformer model to translate the video sequence into a sequence of motion state transition signals. Then the learner tries to infer the subgoals of the player by abducing an interpretation from motion observations and background knowledge. Finally, the learner generates logic programs representing the strategies learned from human players.
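
The state-to-state description can be illustrated with a minimal sketch. It only shows the target representation, not the Transformer that produces it; the per-frame labels (`"walk"`, `"jump"`) and the function name `to_state_transitions` are hypothetical.

```python
from itertools import groupby


def to_state_transitions(frame_states):
    """Collapse per-frame motion labels into state-to-state transitions.

    Consecutive frames that share a label are merged into one qualitative
    state, and each change of state becomes a (before, after) pair.
    """
    states = [label for label, _ in groupby(frame_states)]
    return list(zip(states, states[1:]))


# Hypothetical per-frame labels for a short clip.
frames = ["walk", "walk", "walk", "jump", "jump", "walk"]
print(to_state_transitions(frames))  # [('walk', 'jump'), ('jump', 'walk')]
```

Six frames reduce to two transitions here, which is the kind of compact signal the abduction step would then interpret against background knowledge.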

Project

Interpretative Neural Feature Primitives for Image Classification

Key Words: Explainability of Neural Networks, Abductive Reasoning, Visual Association.

Motivation: Humans can recognize instances by associating different visual concepts through simple features such as shape, texture, color, or symbol. I call these simple, explainable features primitives. Can we train a neural network that detects such feature primitives and has compositional generalization ability over them?

Toy Data: We construct a toy image dataset with objects in diverse shapes, textures, colors, and with different symbols on them, using C4D.

Method: We train four CNN classifiers, one for each simple feature family. For a particular sample, we order the other samples by their similarity to it. Experiments show that our method substantially reduces the open-world risk for novel instances, and that we can link them with known classes by applying association over some features. We could not go further due to a lack of computing resources.
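
The ordering step can be sketched as follows. The similarity measure is not specified above, so this sketch assumes cosine similarity on per-family feature vectors, averaged over the four families; the vectors, the sample names, and the function names are hypothetical.

```python
import math

# The four simple feature families, one classifier per family.
FAMILIES = ("shape", "texture", "color", "symbol")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, others):
    """Return the names in `others`, most similar to `query` first.

    Each sample is a dict mapping a feature family to a feature vector.
    The score is the mean cosine similarity over all families (an assumption).
    """
    def score(sample):
        return sum(cosine(query[f], sample[f]) for f in FAMILIES) / len(FAMILIES)

    return sorted(others, key=lambda name: score(others[name]), reverse=True)


# Hypothetical two-dimensional features for a query and three other samples.
query = {f: [1.0, 0.0] for f in FAMILIES}
others = {
    "b": {f: [0.0, 1.0] for f in FAMILIES},
    "a": {f: [1.0, 0.0] for f in FAMILIES},
    "c": {"shape": [1.0, 0.0], "texture": [1.0, 0.0],
          "color": [0.0, 1.0], "symbol": [0.0, 1.0]},
}
print(rank_by_similarity(query, others))  # ['a', 'c', 'b']
```

Sample `c` matches the query on shape and texture only, so it ranks between the full match and the full mismatch; a per-family ranking would expose which features drive the association.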

Project

Abductive Novel Object Invention for Incremental Learning

Key Words: Neural-Symbolic Learning, Probabilistic CFG, Expectation Maximization.

Motivation: Can the learner detect, from raw data, instances belonging to classes that have never been seen or known before, and label them autonomously with the help of background knowledge?

Task: The learner starts with a CNN classifier that recognizes hand-written symbols \texttt{0, 1, 2, +, -, =} and background knowledge about successor relation.

Method: We feed the learner with image sequences like \texttt{0+++=3}, and it tries to estimate the distribution of novel objects using both the classification score and the logical consistency under the PCFG. Then the learner abduces an atomic predicate representing the relationship between the novel class and known classes, e.g. \texttt{new\_digit(X):-succ(2,X)}. It labels the novel instances with the predicates and updates the classifier.
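
The abduction step alone can be sketched in a few lines. This sketch leaves out the classifier scores, the PCFG, and the EM loop, and it assumes that each `+` applies the successor relation once and each `-` the predecessor; the placeholder `?` for the unrecognized symbol and the function names are hypothetical.

```python
# Digits the initial classifier already recognizes.
KNOWN_DIGITS = {"0": 0, "1": 1, "2": 2}


def evaluate_lhs(lhs):
    """Evaluate e.g. '0+++' by reading each '+' as succ and each '-' as pred (assumption)."""
    value = KNOWN_DIGITS[lhs[0]]
    for op in lhs[1:]:
        if op == "+":
            value += 1
        elif op == "-":
            value -= 1
        else:
            raise ValueError(f"unexpected symbol: {op!r}")
    return value


def abduce_rule(sequence, unknown="?"):
    """Abduce a successor rule for the unknown symbol on the right-hand side.

    Returns None when the value is already a known class, or when its
    predecessor is not a known digit to anchor the rule on.
    """
    lhs, rhs = sequence.split("=")
    if rhs != unknown:
        raise ValueError("expected the unknown symbol on the right-hand side")
    value = evaluate_lhs(lhs)
    by_value = {v: k for k, v in KNOWN_DIGITS.items()}
    if value in by_value or value - 1 not in by_value:
        return None
    return f"new_digit(X):-succ({by_value[value - 1]},X)"


print(abduce_rule("0+++=?"))  # new_digit(X):-succ(2,X)
```

Under these assumptions, a sequence whose value is already covered by a known class yields no new rule, so only the genuinely novel symbol gets labeled.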

Project