I am Guangyuan Jiang (姜广源), a final-year undergrad (’25) at Yuanpei College, Peking University. I will be joining MIT Brain and Cognitive Sciences as a PhD student starting this fall.
Over the past year, I was a visiting student at the MIT Computational Cognitive Science Group and the Computational Psycholinguistics Lab. I am fortunate to be advised by Prof. Josh Tenenbaum and Prof. Roger Levy. At Peking University, I am affiliated with the PKU Cognitive Reasoning Lab, led by Prof. Yixin Zhu.
In general, I am interested in languages in and about the mind—the dynamic interplay between language, culture, and thought. I seek to build models of human language to uncover the core cognitive and computational principles that shape language, and to develop human-like machines to reverse-engineer the mind through language. On the machine side, my primary goal is to develop machines that can learn and reason in human-like ways, revealing the cultural origin of intelligence.
Selected Publications
- Grapheme, sound, and meaning systematicity in logographic writing: A library learning approach
Preprint coming soon (email for draft), 2025
Writing systems are structured to depict the various facets of human language, from sounds to meanings. Chinese writing, as a logographic system, offers a distinctive opportunity to study the structural relationships between written forms and their sounds and meanings all at once. In this paper, we explore a computational model that captures the compositional structure of Chinese characters and their relationship to sound and meaning. We develop a library learning framework with joint written-sound compression and distributional semantic representations of meaning. Our new model discovers structural relations between a character’s logographic parts and its sounds, resembling the role of phonetic and semantic radicals in Chinese orthography. On the form-meaning side, the model reveals systematicity between the graphical structure and meaning of characters, and can predict the meanings of held-out characters from their constituent logographic parts. Further, our library learning model enables us to study historical changes in how written Chinese has represented spoken language. We anticipate that our library learning model will serve as a unified computational account of writing’s interaction with the multi-level structure of human language.
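Since the preprint is not yet public, here is only a flavor of the form-meaning side, as a minimal sketch: it assumes meanings are vectors and that a character’s meaning is approximated by summing its parts’ vectors, a deliberate simplification of the paper’s distributional-semantics setup. All names and data below are toy placeholders, not real character decompositions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each logographic part gets a distributional "meaning" vector
# (a stand-in for the paper's distributional semantic representations).
parts = ["water", "tree", "sun", "mouth"]
part_vec = {p: rng.normal(size=16) for p in parts}

# Characters decompose into parts; a character's "true" meaning vector is
# its parts' sum plus noise, so form-meaning systematicity holds by design.
lexicon = {
    "char_river":  ["water", "tree"],
    "char_forest": ["tree", "tree"],
    "char_bright": ["sun", "sun"],
    "char_shout":  ["mouth", "sun"],
}
char_vec = {c: sum(part_vec[p] for p in ps) + 0.1 * rng.normal(size=16)
            for c, ps in lexicon.items()}

def predict_meaning(parts_of_char):
    """Compositional prediction: sum the part vectors."""
    return sum(part_vec[p] for p in parts_of_char)

def nearest(query, candidates):
    """Cosine nearest neighbour among candidate character vectors."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidates, key=lambda c: cos(query, char_vec[c]))

# Hold out one character and check whether composing its parts' vectors
# lands closest to its true meaning vector among all characters.
held_out = "char_river"
pred = predict_meaning(lexicon[held_out])
print(nearest(pred, list(lexicon)))  # -> char_river
```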
- Rapid Word Learning Through Meta In-Context Learning
arXiv preprint arXiv:2502.14791, 2025
Humans can quickly learn a new word from a few illustrative examples, and then systematically and flexibly use it in novel contexts. Yet the abilities of current language models for few-shot word learning, and methods for improving these abilities, are underexplored. In this study, we introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow). This method trains language models to generate new examples of a word’s usage given a few in-context examples, using a special placeholder token to represent the new word. This training is repeated on many new words to develop a general word-learning ability. We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning, comparable to a large language model (LLM) pre-trained on orders of magnitude more data. Furthermore, through discriminative and generative evaluations, we demonstrate that finetuning pre-trained LLMs with Minnow improves their ability to discriminate between new words, identify syntactic categories of new words, and generate reasonable new usages and definitions for new words, based on one or a few in-context examples. These findings highlight the data efficiency of Minnow and its potential to improve language model performance in word learning tasks.
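As a rough illustration of the episode format (not the paper’s exact data pipeline; the placeholder and separator tokens below are illustrative), a Minnow-style training episode can be assembled like this:

```python
import re

PLACEHOLDER = "<new-word>"  # special token standing in for the word being learned

def make_episode(word: str, usages: list[str], n_support: int = 3):
    """Build one Minnow-style episode: a few support sentences with the
    target word masked by the placeholder, plus one held-out sentence the
    model should generate (also masked)."""
    masked = [re.sub(rf"\b{re.escape(word)}\b", PLACEHOLDER, s) for s in usages]
    support, query = masked[:n_support], masked[n_support]
    # The LM is trained to map the support examples to the query usage.
    # " <sep> " is an illustrative separator; the paper's formatting may differ.
    prompt = " <sep> ".join(support) + " <sep> "
    return prompt, query

prompt, target = make_episode(
    "dax",
    ["the dax rolled off the table",
     "she picked up the dax",
     "a small dax fits in your hand",
     "he gave the dax to his sister"],
)
print(prompt)  # three masked usages joined by the separator
print(target)  # "he gave the <new-word> to his sister"
```

Repeating this construction over many different words is what turns a single masked-word task into a general word-learning ability.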
- Finding structure in logographic writing with library learning
(CogSci Sayan Gul Award for Best Undergrad Student Paper)
In CogSci, 2024 (Talk)
One hallmark of human language is its combinatoriality—reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures.
In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure in a writing system. Built on top of state-of-the-art library learning and program synthesis techniques, our computational framework discovers known linguistic structures in the Chinese writing system and reveals how the system evolves towards simplification under pressures for representational efficiency. We demonstrate how a library learning approach, utilizing learned abstractions and compression, may help reveal the fundamental computational principles that underlie the creation of combinatorial structures in human cognition, and offer broader insights into the evolution of efficient communication systems.
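The compression intuition can be captured in a few lines. The sketch below is a toy minimum-description-length objective, not the paper’s actual program-synthesis machinery: a candidate library of reusable chunks is worth adopting when the cost of storing it is outweighed by the savings from rewriting the corpus with it.

```python
def description_length(corpus, library):
    """Total cost (in symbols) of the library plus the corpus rewritten
    with library entries; a crude stand-in for an MDL objective."""
    def rewrite(seq):
        # Greedily replace any library chunk occurring in the sequence.
        out, i = [], 0
        while i < len(seq):
            for name, chunk in library.items():
                if seq[i:i + len(chunk)] == chunk:
                    out.append(name)
                    i += len(chunk)
                    break
            else:
                out.append(seq[i])
                i += 1
        return out

    lib_cost = sum(len(chunk) for chunk in library.values())
    corpus_cost = sum(len(rewrite(seq)) for seq in corpus)
    return lib_cost + corpus_cost

# Characters as part sequences (toy data, not real decompositions).
corpus = [["water", "tree"],
          ["water", "tree", "sun"],
          ["water", "tree", "mouth"]]

# Abstracting the reused ("water", "tree") pair pays for itself:
print(description_length(corpus, {}))                         # 8
print(description_length(corpus, {"A1": ["water", "tree"]}))  # 2 (library) + 5 = 7
```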
- MEWL: Few-shot multimodal word learning with referential uncertainty
In ICML, 2023
Without explicit feedback, humans can rapidly learn the meaning of words. Children can acquire a new word after just a few passive exposures, a process known as fast mapping. This word learning capability is believed to be the most fundamental building block of multimodal understanding and reasoning. Despite recent advances in multimodal learning, a systematic and rigorous evaluation of human-like word learning in machines is still missing. To fill this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess how machines learn word meaning in grounded visual scenes. MEWL covers humans’ core cognitive toolkit for word learning: cross-situational reasoning, bootstrapping, and pragmatic learning. Specifically, MEWL is a few-shot benchmark suite of nine tasks probing various word learning capabilities. These tasks are carefully designed to align with children’s core word learning abilities and to echo theories in the developmental literature. By evaluating multimodal and unimodal agents alongside a comparative analysis of human performance, we find a sharp divergence between human and machine word learning. We further discuss these differences between humans and machines and call for human-like few-shot word learning in machines.
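To illustrate one of the three toolkits, here is a toy version of cross-situational reasoning over symbolic stand-ins for MEWL’s visual scenes (the benchmark itself uses rendered images and few-shot episodes):

```python
from collections import defaultdict

# Each trial pairs the set of visible objects with the words heard.
# Cross-situational learning: intersect a word's possible referents
# across trials until only one remains.
trials = [
    ({"cube", "ball"},  {"dax", "wug"}),
    ({"cube", "torus"}, {"dax", "fep"}),
    ({"ball", "torus"}, {"wug", "fep"}),
]

candidates = defaultdict(lambda: None)
for objects, words in trials:
    for w in words:
        candidates[w] = objects if candidates[w] is None else candidates[w] & objects

print(dict(candidates))
# {'dax': {'cube'}, 'wug': {'ball'}, 'fep': {'torus'}}
```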
- Evaluating and Inducing Personality in Pre-trained Language Models
In NeurIPS, 2023 (Spotlight)
Standardized and quantified evaluation of machine behaviors is crucial for understanding LLMs. In this study, we draw inspiration from psychometrics by leveraging human personality theory as a tool for studying machine behaviors. Originating as a philosophical inquiry into human behavior, the study of personality examines how individuals differ in thinking, feeling, and behaving. Toward building and understanding human-like social machines, we are motivated to ask: Can we assess machine behaviors with human psychometric tests in a principled and quantitative manner? If so, can we induce a specific personality in LLMs? To answer these questions, we introduce the Machine Personality Inventory (MPI), a tool for studying machine behaviors; MPI follows standardized personality tests, built on the Big Five Personality Factors (Big Five) theory and personality assessment inventories. By systematically evaluating LLMs with MPI, we provide the first evidence for the efficacy of MPI in studying LLM behaviors. We further devise a Personality Prompting (P²) method that induces specific personalities in LLMs in a controllable way, producing diverse and verifiable behaviors. We hope this work sheds light on future studies that adopt personality as an essential indicator for various downstream tasks, and motivates research into equally intriguing human-like machine behaviors.
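In spirit, MPI scoring works like a standard Likert inventory. The sketch below is a schematic re-implementation, not the released code: `ask_model` is a hypothetical stand-in for an LLM API call, and the items shown are illustrative rather than actual MPI items.

```python
# Present Likert items to a model and average item scores per Big Five trait.
ITEMS = [
    ("Extraversion", "I am the life of the party.", +1),
    ("Extraversion", "I don't talk a lot.",         -1),  # reverse-keyed item
    ("Openness",     "I have a vivid imagination.", +1),
]
LIKERT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # very accurate .. very inaccurate

def ask_model(statement: str) -> str:
    # Hypothetical LLM call; replace with your model API.
    # A canned response keeps the sketch runnable.
    return "B"

def score(items=ITEMS):
    totals, counts = {}, {}
    for trait, statement, key in items:
        raw = LIKERT[ask_model(statement)]
        val = raw if key > 0 else 6 - raw  # flip reverse-keyed items
        totals[trait] = totals.get(trait, 0) + val
        counts[trait] = counts.get(trait, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}

print(score())  # e.g. {'Extraversion': 3.0, 'Openness': 4.0}
```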
- Interactive Visual Reasoning under Uncertainty
In NeurIPS Datasets and Benchmarks, 2023
One of the fundamental cognitive abilities of humans is to quickly resolve uncertainty by generating hypotheses and testing them via active trials. Encountering a novel phenomenon with ambiguous cause-effect relationships, humans form hypotheses against the data, draw inferences from observations, test their theory via experimentation, and revise the proposition if inconsistencies arise. These iterative processes persist until the underlying mechanism becomes clear. In this work, we devise the IVRE (pronounced "ivory") environment for evaluating artificial agents’ reasoning ability under uncertainty. IVRE is an interactive environment featuring rich scenarios centered around Blicket detection. Agents in IVRE are placed in environments with various ambiguous action-effect pairs and asked to determine each object’s role. They are encouraged to propose effective and efficient experiments to validate their hypotheses based on observations and to actively gather new information. An episode ends when all uncertainties are resolved or the trial budget is exhausted. Evaluating modern artificial agents in IVRE reveals a clear gap between today’s learning methods and humans. This inefficacy in interactive reasoning under uncertainty calls for future research on building human-like intelligence.
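A toy version of the environment’s core loop conveys the setup. This is a schematic re-implementation under simplifying assumptions, not the actual IVRE benchmark: the detector lights up iff the tried subset contains a blicket, and an agent spends a limited trial budget to resolve which objects are blickets.

```python
import random

class BlicketEnv:
    """Minimal blicket-detector environment in the spirit of IVRE:
    some objects are blickets, and the detector lights up iff the
    tried subset contains at least one of them."""

    def __init__(self, n_objects=4, n_blickets=2, max_trials=10, seed=0):
        rng = random.Random(seed)
        self.objects = list(range(n_objects))
        self.blickets = set(rng.sample(self.objects, n_blickets))
        self.trials_left = max_trials

    def step(self, subset):
        """Place a subset of objects on the detector; observe the light."""
        assert self.trials_left > 0, "trial budget exhausted"
        self.trials_left -= 1
        return bool(self.blickets & set(subset))

# A deliberately naive agent: test each object alone to resolve uncertainty.
env = BlicketEnv()
guess = {o for o in env.objects if env.step({o})}
print(guess == env.blickets)  # True, at the cost of one trial per object
```

Testing objects one at a time always works under this disjunctive rule, but it is trial-hungry; the interesting question IVRE poses is whether agents can design more informative multi-object experiments.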