Search | arXiv e-print repository

PhyloLM : Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks

Authors: Nicolas Yax, Pierre-Yves Oudeyer, Stefano Palminteri

Abstract: This paper introduces PhyloLM, a method adapting phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method calculates a phylogenetic distance metrics based on the similarity of LLMs' output. The resulting metric is then used to construct dendrograms, which satisfactorily capture known re… ▽ More This paper introduces PhyloLM, a method adapting phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method calculates a phylogenetic distance metrics based on the similarity of LLMs' output. The resulting metric is then used to construct dendrograms, which satisfactorily capture known relationships across a set of 111 open-source and 45 closed models. Furthermore, our phylogenetic distance predicts performance in standard benchmarks, thus demonstrating its functional validity and paving the way for a time and cost-effective estimation of LLM capabilities. To sum up, by translating population genetic concepts to machine learning, we propose and validate a tool to evaluate LLM development, relationships and capabilities, even in the absence of transparent training information. △ Less

Submitted 16 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

arXiv:2403.08882 [pdf, other]

Cultural evolution in populations of Large Language Models

Authors: Jérémy Perez, Corentin Léger, Marcela Ovando-Tellez, Chris Foulon, Joan Dussauld, Pierre-Yves Oudeyer, Clément Moulin-Frier

Abstract: Research in cultural evolution aims at providing causal explanations for the change of culture over time. Over the past decades, this field has generated an important body of knowledge, using experimental, historical, and computational methods. While computational models have been very successful at generating testable hypotheses about the effects of several factors, such as population structure o… ▽ More Research in cultural evolution aims at providing causal explanations for the change of culture over time. Over the past decades, this field has generated an important body of knowledge, using experimental, historical, and computational methods. While computational models have been very successful at generating testable hypotheses about the effects of several factors, such as population structure or transmission biases, some phenomena have so far been more complex to capture using agent-based and formal models. This is in particular the case for the effect of the transformations of social information induced by evolved cognitive mechanisms. We here propose that leveraging the capacity of Large Language Models (LLMs) to mimic human behavior may be fruitful to address this gap. On top of being an useful approximation of human cultural dynamics, multi-agents models featuring generative agents are also important to study for their own sake. Indeed, as artificial agents are bound to participate more and more to the evolution of culture, it is crucial to better understand the dynamics of machine-generated cultural evolution. We here present a framework for simulating cultural evolution in populations of LLMs, allowing the manipulation of variables known to be important in cultural evolution, such as network structure, personality, and the way social information is aggregated and transformed. The software we developed for conducting these simulations is open-source and features an intuitive user-interface, which we hope will help to build bridges between the fields of cultural evolution and generative artificial intelligence. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: 17 pages, 20 figures. Open-source code available at https://github.com/jeremyperez2/LLM-Culture

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2403.08397 [pdf, other]

doi 10.1145/3585088.3593880

Interactive environments for training children's curiosity through the practice of metacognitive skills: a pilot study

Authors: Rania Abdelghani, Edith Law, Chloé Desvaux, Pierre-Yves Oudeyer, Hélène Sauzéon

Abstract: Curiosity-driven learning has shown significant positive effects on students' learning experiences and outcomes. But despite this importance, reports show that children lack this skill, especially in formal educational settings. To address this challenge, we propose an 8-session workshop that aims to enhance children's curiosity through training a set of specific metacognitive skills we hypothesiz… ▽ More Curiosity-driven learning has shown significant positive effects on students' learning experiences and outcomes. But despite this importance, reports show that children lack this skill, especially in formal educational settings. To address this challenge, we propose an 8-session workshop that aims to enhance children's curiosity through training a set of specific metacognitive skills we hypothesize are involved in its process. Our workshop contains animated videos presenting declarative knowledge about curiosity and the said metacognitive skills as well as practice sessions to apply these skills during a reading-comprehension task, using a web platform designed for this study (e.g. expressing uncertainty, formulating questions, etc). We conduct a pilot study with 15 primary school students, aged between 8 and 10. Our first results show a positive impact on children's metacognitive efficiency and their ability to express their curiosity through question-asking behaviors. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.14846 [pdf, other]

Stick to Your Role! Context-dependence and Stability of Personal Value Expression in Large Language Models

Authors: Grgur Kovač, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, Pierre-Yves Oudeyer

Abstract: The standard way to study Large Language Models (LLMs) with benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may be little informative about the model's behavior in deployment (where it will be exposed t… ▽ More The standard way to study Large Language Models (LLMs) with benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may be little informative about the model's behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence (specifically, value stability) should be studied a specific property of LLMs and used as another dimension of LLM comparison (alongside others such as cognitive abilities, knowledge, or model size). We present a case-study on the stability of value expression over different contexts (simulated conversations on different topics) as measured using a standard psychology questionnaire (PVQ) and on behavioral downstream tasks. Reusing methods from psychology, we study Rank-order stability on the population (interpersonal) level, and Ipsative stability on the individual (intrapersonal) level. We consider two settings (with and without instructing LLMs to simulate particular personas), two simulated populations, and three downstream tasks. We observe consistent trends in the stability of models and model families - Mixtral, Mistral, GPT-3.5 and Qwen families are more stable than LLaMa-2 and Phi. The consistency of these trends implies that some models exhibit higher value-stability than others, and that value stability can be estimated with the set of introduced methodological tools. When instructed to simulate particular personas, LLMs exhibit low Rank-Order stability, which further diminishes with conversation length. This highlights the need for future research on LLMs that coherently simulate different personas. This paper provides a foundational step in that direction, and, to our knowledge, it is the first study of value stability in LLMs. △ Less

Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: The project website and code are available at https://sites.google.com/view/llmvaluestability

MSC Class: 68T07 ACM Class: I.2.7

arXiv:2402.10236 [pdf, other]

Discovering Sensorimotor Agency in Cellular Automata using Diversity Search

Authors: Gautier Hamon, Mayalen Etcheverry, Bert Wang-Chak Chan, Clément Moulin-Frier, Pierre-Yves Oudeyer

Abstract: The research field of Artificial Life studies how life-like phenomena such as autopoiesis, agency, or self-regulation can self-organize in computer simulations. In cellular automata (CA), a key open-question has been whether it it is possible to find environment rules that self-organize robust "individuals" from an initial state with no prior existence of things like "bodies", "brain", "perception… ▽ More The research field of Artificial Life studies how life-like phenomena such as autopoiesis, agency, or self-regulation can self-organize in computer simulations. In cellular automata (CA), a key open-question has been whether it it is possible to find environment rules that self-organize robust "individuals" from an initial state with no prior existence of things like "bodies", "brain", "perception" or "action". In this paper, we leverage recent advances in machine learning, combining algorithms for diversity search, curriculum learning and gradient descent, to automate the search of such "individuals", i.e. localized structures that move around with the ability to react in a coherent manner to external obstacles and maintain their integrity, hence primitive forms of sensorimotor agency. We show that this approach enables to find systematically environmental conditions in CA leading to self-organization of such basic forms of agency. Through multiple experiments, we show that the discovered agents have surprisingly robust capabilities to move, maintain their body integrity and navigate among various obstacles. They also show strong generalization abilities, with robustness to changes of scale, random updates or perturbations from the environment not seen during training. We discuss how this approach opens new perspectives in AI and synthetic bioengineering. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.01669 [pdf, other]

Improved Performances and Motivation in Intelligent Tutoring Systems: Combining Machine Learning and Learner Choice

Authors: Benjamin Clément, Hélène Sauzéon, Didier Roy, Pierre-Yves Oudeyer

Abstract: Large class sizes pose challenges to personalized learning in schools, which educational technologies, especially intelligent tutoring systems (ITS), aim to address. In this context, the ZPDES algorithm, based on the Learning Progress Hypothesis (LPH) and multi-armed bandit machine learning techniques, sequences exercises that maximize learning progress (LP). This algorithm was previously shown in… ▽ More Large class sizes pose challenges to personalized learning in schools, which educational technologies, especially intelligent tutoring systems (ITS), aim to address. In this context, the ZPDES algorithm, based on the Learning Progress Hypothesis (LPH) and multi-armed bandit machine learning techniques, sequences exercises that maximize learning progress (LP). This algorithm was previously shown in field studies to boost learning performances for a wider diversity of students compared to a hand-designed curriculum. However, its motivational impact was not assessed. Also, ZPDES did not allow students to express choices. This limitation in agency is at odds with the LPH theory concerned with modeling curiosity-driven learning. We here study how the introduction of such choice possibilities impact both learning efficiency and motivation. The given choice concerns dimensions that are orthogonal to exercise difficulty, acting as a playful feature. In an extensive field study (265 7-8 years old children, RCT design), we compare systems based either on ZPDES or a hand-designed curriculum, both with and without self-choice. We first show that ZPDES improves learning performance and produces a positive and motivating learning experience. We then show that the addition of choice triggers intrinsic motivation and reinforces the learning effectiveness of the LP-based personalization. In doing so, it strengthens the links between intrinsic motivation and performance progress during the serious game. Conversely, deleterious effects of the playful feature are observed for hand-designed linear paths. Thus, the intrinsic motivation elicited by a playful feature is beneficial only if the curriculum personalization is effective for the learner. Such a result deserves great attention due to increased use of playful features in non adaptive educational technologies. △ Less

Submitted 16 January, 2024; originally announced February 2024.

Comments: 29 pages, 37 figures

ACM Class: I.2.1; I.2.6

arXiv:2312.00455 [pdf]

Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?

Authors: Mayalen Etcheverry, Bert Wang-Chak Chan, Clément Moulin-Frier, Pierre-Yves Oudeyer

Abstract: Can we build an artificial system that would be able to generate endless surprises if ran "forever" in Minecraft? While there is not a single path toward solving that grand challenge, this article presents what we believe to be some working ingredients for the endless generation of novel increasingly complex artifacts in Minecraft. Our framework for an open-ended system includes two components: a… ▽ More Can we build an artificial system that would be able to generate endless surprises if ran "forever" in Minecraft? While there is not a single path toward solving that grand challenge, this article presents what we believe to be some working ingredients for the endless generation of novel increasingly complex artifacts in Minecraft. Our framework for an open-ended system includes two components: a complex system used to recursively grow and complexify artifacts over time, and a discovery algorithm that leverages the concept of meta-diversity search. Since complex systems have shown to enable the emergence of considerable complexity from set of simple rules, we believe them to be great candidates to generate all sort of artifacts in Minecraft. Yet, the space of possible artifacts that can be generated by these systems is often unknown, challenging to characterize and explore. Therefore automating the long-term discovery of novel and increasingly complex artifacts in these systems is an exciting research field. To approach these challenges, we formulate the problem of meta-diversity search where an artificial "discovery assistant" incrementally learns a diverse set of representations to characterize behaviors and searches to discover diverse patterns within each of them. A successful discovery assistant should continuously seek for novel sources of diversities while being able to quickly specialize the search toward a new unknown type of diversity. To implement those ideas in the Minecraft environment, we simulate an artificial "chemistry" system based on Lenia continuous cellular automaton for generating artifacts, as well as an artificial "discovery assistant" (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses a goal-based intrinsically-motivated exploration as the diversity search strategy. △ Less

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.11388 [pdf]

doi 10.1038/s41562-023-01742-2

Machine Culture

Authors: Levin Brinkmann, Fabian Baumann, Jean-François Bonnefon, Maxime Derex, Thomas F. Müller, Anne-Marie Nussberger, Agnieszka Czaplicka, Alberto Acerbi, Thomas L. Griffiths, Joseph Henrich, Joel Z. Leibo, Richard McElreath, Pierre-Yves Oudeyer, Jonathan Stray, Iyad Rahwan

Abstract: The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of machine culture, culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission, and selection. Recommender algo… ▽ More The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of machine culture, culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission, and selection. Recommender algorithms are altering social learning dynamics. Chatbots are forming a new mode of cultural transmission, serving as cultural models. Furthermore, intelligent machines are evolving as contributors in generating cultural traits--from game strategies and visual art to scientific results. We provide a conceptual framework for studying the present and anticipated future impact of machines on cultural evolution, and present a research agenda for the study of machine culture. △ Less

Submitted 22 November, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

Journal ref: Nat Hum Behav 7, 1855-1868 (2023)

arXiv:2311.00344 [pdf, other]

A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents

Authors: Olivier Sigaud, Gianluca Baldassarre, Cedric Colas, Stephane Doncieux, Richard Duro, Pierre-Yves Oudeyer, Nicolas Perrin-Gilbert, Vieri Giuliano Santucci

Abstract: A lot of recent machine learning research papers have ``open-ended learning'' in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute… ▽ More A lot of recent machine learning research papers have ``open-ended learning'' in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute to fixing this situation. After illustrating the genealogy of the concept and more recent perspectives about what it truly means, we outline that open-ended learning is generally conceived as a composite notion encompassing a set of diverse properties. In contrast with previous approaches, we propose to isolate a key elementary property of open-ended processes, which is to produce elements from time to time (e.g., observations, options, reward functions, and goals), over an infinite horizon, that are considered novel from an observer's perspective. From there, we build the notion of open-ended learning problems and focus in particular on the subset of open-ended goal-conditioned reinforcement learning problems in which agents can learn a growing repertoire of goal-driven skills. Finally, we highlight the work that remains to be performed to fill the gap between our elementary definition and the more involved notions of open-ended learning that developmental AI researchers may have in mind. △ Less

Submitted 7 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.10692 [pdf, other]

ACES: Generating Diverse Programming Puzzles with with Autotelic Generative Models

Authors: Julien Pourcel, Cédric Colas, Gaia Molinaro, Pierre-Yves Oudeyer, Laetitia Teodorescu

Abstract: The ability to invent novel and interesting problems is a remarkable feature of human intelligence that drives innovation, art, and science. We propose a method that aims to automate this process by harnessing the power of state-of-the-art generative models to produce a diversity of challenging yet solvable problems, here in the context of Python programming puzzles. Inspired by the intrinsically… ▽ More The ability to invent novel and interesting problems is a remarkable feature of human intelligence that drives innovation, art, and science. We propose a method that aims to automate this process by harnessing the power of state-of-the-art generative models to produce a diversity of challenging yet solvable problems, here in the context of Python programming puzzles. Inspired by the intrinsically motivated literature, Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems. We represent problems in a space of LLM-generated semantic descriptors describing the programming skills required to solve them (e.g. string manipulation, dynamic programming, etc.) and measure their difficulty empirically as a linearly decreasing function of the success rate of Llama-3-70B, a state-of-the-art LLM problem solver. ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors (goal-directed exploration) using previously generated problems as in-context examples. ACES generates problems that are more diverse and more challenging than problems produced by baseline methods and three times more challenging than problems found in existing Python programming benchmarks on average across 11 state-of-the-art code LLMs. △ Less

Submitted 29 May, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.03192 [pdf, other]

Generative AI in the Classroom: Can Students Remain Active Learners?

Authors: Rania Abdelghani, Hélène Sauzéon, Pierre-Yves Oudeyer

Abstract: Generative Artificial Intelligence (GAI) can be seen as a double-edged weapon in education. Indeed, it may provide personalized, interactive and empowering pedagogical sequences that could favor students' intrinsic motivation, active engagement and help them have more control over their learning. But at the same time, other GAI properties such as the lack of uncertainty signalling even in cases of… ▽ More Generative Artificial Intelligence (GAI) can be seen as a double-edged weapon in education. Indeed, it may provide personalized, interactive and empowering pedagogical sequences that could favor students' intrinsic motivation, active engagement and help them have more control over their learning. But at the same time, other GAI properties such as the lack of uncertainty signalling even in cases of failure (particularly with Large Language Models (LLMs)) could lead to opposite effects, e.g. over-estimation of one's own competencies, passiveness, loss of curious and critical-thinking sense, etc. These negative effects are due in particular to the lack of a pedagogical stance in these models' behaviors. Indeed, as opposed to standard pedagogical activities, GAI systems are often designed to answers users' inquiries easily and conveniently, without asking them to make an effort, and without focusing on their learning process and/or outcomes. This article starts by outlining some of these opportunities and challenges surrounding the use of GAI in education, with a focus on the effects on students' active learning strategies and related metacognitive skills. Then, we present a framework for introducing pedagogical transparency in GAI-based educational applications. This framework presents 1) training methods to include pedagogical principles in the models, 2) methods to ensure controlled and pedagogically-relevant interactions when designing activities with GAI and 3) educational methods enabling students to acquire the relevant skills to properly benefit from the use of GAI in their learning activities (meta-cognitive skills, GAI litteracy). △ Less

Submitted 10 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

arXiv:2307.08452 [pdf, other]

SBMLtoODEjax: Efficient Simulation and Optimization of Biological Network Models in JAX

Authors: Mayalen Etcheverry, Michael Levin, Clément Moulin-Frier, Pierre-Yves Oudeyer

Abstract: Advances in bioengineering and biomedicine demand a deep understanding of the dynamic behavior of biological systems, ranging from protein pathways to complex cellular processes. Biological networks like gene regulatory networks and protein pathways are key drivers of embryogenesis and physiological processes. Comprehending their diverse behaviors is essential for tackling diseases, including canc… ▽ More Advances in bioengineering and biomedicine demand a deep understanding of the dynamic behavior of biological systems, ranging from protein pathways to complex cellular processes. Biological networks like gene regulatory networks and protein pathways are key drivers of embryogenesis and physiological processes. Comprehending their diverse behaviors is essential for tackling diseases, including cancer, as well as for engineering novel biological constructs. Despite the availability of extensive mathematical models represented in Systems Biology Markup Language (SBML), researchers face significant challenges in exploring the full spectrum of behaviors and optimizing interventions to efficiently shape those behaviors. Existing tools designed for simulation of biological network models are not tailored to facilitate interventions on network dynamics nor to facilitate automated discovery. Leveraging recent developments in machine learning (ML), this paper introduces SBMLtoODEjax, a lightweight library designed to seamlessly integrate SBML models with ML-supported pipelines, powered by JAX. SBMLtoODEjax facilitates the reuse and customization of SBML-based models, harnessing JAX's capabilities for efficient parallel simulations and optimization, with the aim to accelerate research in biological network analysis. △ Less

Submitted 29 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.07871 [pdf, other]

The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents

Authors: Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer

Abstract: Developmental psychologists have long-established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should… ▽ More Developmental psychologists have long-established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and study socio-cognitive abilities enabling to enter a culture too. We discuss the theories of Michael Tomasello and Jerome Bruner to introduce some of their concepts to AI and outline key concepts and socio-cognitive abilities. We present The SocialAI school - a tool including a customizable parameterized uite of procedurally generated environments, which simplifies conducting experiments regarding those concepts. We show examples of such experiments with RL agents and Large Language Models. The main motivation of this work is to engage the AI community around the problem of social intelligence informed by developmental psychology, and to provide a tool to simplify first steps in this direction. Refer to the project website for code and additional information: https://sites.google.com/view/socialai-school. △ Less

Submitted 23 November, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: Preprint, see v1 for a shorter version (accepted at the "Workshop on Theory-of-Mind" at ICML 2023) See project website for demo and code: https://sites.google.com/view/socialai-school

MSC Class: 68T07 ACM Class: I.2.0

arXiv:2307.07870 [pdf, other]

Large Language Models as Superpositions of Cultural Perspectives

Authors: Grgur Kovač, Masataka Sawayama, Rémy Portelas, Cédric Colas, Peter Ford Dominey, Pierre-Yves Oudeyer

Abstract: Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits. LLMs exhibit context-dependent values and personality traits that change based on the induced perspective (as opposed to humans, who tend to have more coherent values and personal… ▽ More Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits. LLMs exhibit context-dependent values and personality traits that change based on the induced perspective (as opposed to humans, who tend to have more coherent values and personality traits across contexts). We introduce the concept of perspective controllability, which refers to a model's affordance to adopt various perspectives with differing values and personality traits. In our experiments, we use questionnaires from psychology (PVQ, VSM, IPIP) to study how exhibited values and personality traits change based on different perspectives. Through qualitative experiments, we show that LLMs express different values when those are (implicitly or explicitly) implied in the prompt, and that LLMs express different values even when those are not obviously implied (demonstrating their context-dependent nature). We then conduct quantitative experiments to study the controllability of different models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the effectiveness of various methods for inducing perspectives, and the smoothness of the models' drivability. We conclude by examining the broader implications of our work and outline a variety of associated scientific questions. The project website is available at https://sites.google.com/view/llm-superpositions . △ Less

Submitted 7 November, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: Preprint

MSC Class: 68T07 ACM Class: I.2.7

arXiv:2305.12487 [pdf, other]

Augmenting Autotelic Agents with Large Language Models

Authors: Cédric Colas, Laetitia Teodorescu, Pierre-Yves Oudeyer, Xingdi Yuan, Marc-Alexandre Côté

Abstract: Humans learn to master open-ended repertoires of skills by imagining and practicing their own goals. This autotelic learning process, literally the pursuit of self-generated (auto) goals (telos), becomes more and more open-ended as the goals become more diverse, abstract and creative. The resulting exploration of the space of possible skills is supported by an inter-individual exploration: goal re… ▽ More Humans learn to master open-ended repertoires of skills by imagining and practicing their own goals. This autotelic learning process, literally the pursuit of self-generated (auto) goals (telos), becomes more and more open-ended as the goals become more diverse, abstract and creative. The resulting exploration of the space of possible skills is supported by an inter-individual exploration: goal representations are culturally evolved and transmitted across individuals, in particular using language. Current artificial agents mostly rely on predefined goal representations corresponding to goal spaces that are either bounded (e.g. list of instructions), or unbounded (e.g. the space of possible visual inputs) but are rarely endowed with the ability to reshape their goal representations, to form new abstractions or to imagine creative goals. In this paper, we introduce a language model augmented autotelic agent (LMA3) that leverages a pretrained language model (LM) to support the representation, generation and learning of diverse, abstract, human-relevant goals. The LM is used as an imperfect model of human cultural transmission; an attempt to capture aspects of humans' common-sense, intuitive physics and overall interests. Specifically, it supports three key components of the autotelic architecture: 1)~a relabeler that describes the goals achieved in the agent's trajectories, 2)~a goal generator that suggests new high-level goals along with their decomposition into subgoals the agent already masters, and 3)~reward functions for each of these goals. Without relying on any hand-coded goal representations, reward functions or curriculum, we show that LMA3 agents learn to master a large diversity of skills in a task-agnostic text-based environment. △ Less

Submitted 21 May, 2023; originally announced May 2023.

arXiv:2304.10548 [pdf, other]

doi 10.1145/3581754.3584136

Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding

Authors: Ziang Xiao, Xingdi Yuan, Q. Vera Liao, Rania Abdelghani, Pierre-Yves Oudeyer

Abstract: Qualitative analysis of textual contents unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, let alone be challenged by the limited generalizability of those task-spe… ▽ More Qualitative analysis of textual contents unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, let alone be challenged by the limited generalizability of those task-specific models. In this study, we explored the use of large language models (LLMs) in supporting deductive coding, a major category of qualitative analysis where researchers use pre-determined codebooks to label the data into a fixed set of codes. Instead of training task-specific models, a pre-trained LLM could be used directly for various tasks without fine-tuning through prompt learning. Using a curiosity-driven questions coding task as a case study, we found, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreements with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond. △ Less

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28th International Conference on Intelligent User Interfaces (IUI '23 Companion), March 27--31, 2023, Sydney, NSW, Australia

arXiv:2302.05244 [pdf, other]

A Song of Ice and Fire: Analyzing Textual Autotelic Agents in ScienceWorld

Authors: Laetitia Teodorescu, Xingdi Yuan, Marc-Alexandre Côté, Pierre-Yves Oudeyer

Abstract: Building open-ended agents that can autonomously discover a diversity of behaviours is one of the long-standing goals of artificial intelligence. This challenge can be studied in the framework of autotelic RL agents, i.e. agents that learn by selecting and pursuing their own goals, self-organizing a learning curriculum. Recent work identified language as a key dimension of autotelic learning, in p… ▽ More Building open-ended agents that can autonomously discover a diversity of behaviours is one of the long-standing goals of artificial intelligence. This challenge can be studied in the framework of autotelic RL agents, i.e. agents that learn by selecting and pursuing their own goals, self-organizing a learning curriculum. Recent work identified language as a key dimension of autotelic learning, in particular because it enables abstract goal sampling and guidance from social peers for hindsight relabelling. Within this perspective, we study the following open scientific questions: What is the impact of hindsight feedback from a social peer (e.g. selective vs. exhaustive)? How can the agent learn from very rare language goal examples in its experience replay? How can multiple forms of exploration be combined, and take advantage of easier goals as stepping stones to reach harder ones? To address these questions, we use ScienceWorld, a textual environment with rich abstract and combinatorial physics. We show the importance of selectivity from the social peer's feedback; that experience replay needs to over-sample examples of rare goals; and that following self-generated goal sequences where the agent's competence is intermediate leads to significant improvements in final performance. △ Less

Submitted 24 February, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: In review at ICML 2023

arXiv:2302.02662 [pdf, other]

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

Authors: Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer

Abstract: Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding:… ▽ More Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5. △ Less

Submitted 6 September, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Journal ref: PMLR 202 (2023):3676-3713

arXiv:2212.07906 [pdf, other]

Flow-Lenia: Towards open-ended evolution in cellular automata through mass conservation and parameter localization

Authors: Erwan Plantec, Gautier Hamon, Mayalen Etcheverry, Pierre-Yves Oudeyer, Clément Moulin-Frier, Bert Wang-Chak Chan

Abstract: The design of complex self-organising systems producing life-like phenomena, such as the open-ended evolution of virtual creatures, is one of the main goals of artificial life. Lenia, a family of cellular automata (CA) generalizing Conway's Game of Life to continuous space, time and states, has attracted a lot of attention because of the wide diversity of self-organizing patterns it can generate.… ▽ More The design of complex self-organising systems producing life-like phenomena, such as the open-ended evolution of virtual creatures, is one of the main goals of artificial life. Lenia, a family of cellular automata (CA) generalizing Conway's Game of Life to continuous space, time and states, has attracted a lot of attention because of the wide diversity of self-organizing patterns it can generate. Among those, some spatially localized patterns (SLPs) resemble life-like artificial creatures and display complex behaviors. However, those creatures are found in only a small subspace of the Lenia parameter space and are not trivial to discover, necessitating advanced search algorithms. Furthermore, each of these creatures exist only in worlds governed by specific update rules and thus cannot interact in the same one. This paper proposes as mass-conservative extension of Lenia, called Flow Lenia, that solve both of these issues. We present experiments demonstrating its effectiveness in generating SLPs with complex behaviors and show that the update rule parameters can be optimized to generate SLPs showing behaviors of interest. Finally, we show that Flow Lenia enables the integration of the parameters of the CA update rules within the CA dynamics, making them dynamic and localized, allowing for multi-species simulations, with locally coherent update rules that define properties of the emerging creatures, and that can be mixed with neighbouring rules. We argue that this paves the way for the intrinsic evolution of self-organized artificial life forms within continuous CAs. △ Less

Submitted 24 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2211.14228 [pdf, other]

doi 10.1007/s40593-023-00340-7

GPT-3-driven pedagogical agents for training children's curious question-asking skills

Authors: Rania Abdelghani, Yen-Hsiang Wang, Xingdi Yuan, Tong Wang, Pauline Lucas, Hélène Sauzéon, Pierre-Yves Oudeyer

Abstract: In order to train children's ability to ask curiosity-driven questions, previous research has explored designing specific exercises relying on providing semantic and linguistic cues to help formulate such questions. But despite showing pedagogical efficiency, this method is still limited as it relies on generating the said cues by hand, which can be a very costly process. In this context, we propo… ▽ More In order to train children's ability to ask curiosity-driven questions, previous research has explored designing specific exercises relying on providing semantic and linguistic cues to help formulate such questions. But despite showing pedagogical efficiency, this method is still limited as it relies on generating the said cues by hand, which can be a very costly process. In this context, we propose to leverage advances in the natural language processing field (NLP) and investigate the efficiency of using a large language model (LLM) for automating the production of the pedagogical content of a curious question-asking (QA) training. We study generating the said content using the "prompt-based" method that consists of explaining the task to the LLM in natural text. We evaluate the output using human experts annotations and comparisons with hand-generated content. Results suggested indeed the relevance and usefulness of this content. We also conduct a field study in primary school (75 children aged 9-10), where we evaluate children's QA performance when having this training. We compare 3 types of content : 1) hand-generated content that proposes "closed" cues leading to predefined questions; 2) GPT-3-generated content that proposes the same type of cues; 3) GPT-3-generated content that proposes "open" cues leading to several possible questions. We see a similar QA performance between the two "closed" trainings (showing the scalability of the approach using GPT-3), and a better one for participants with the "open" training. These results suggest the efficiency of using LLMs to support children in generating more curious questions, using a natural language prompting approach that affords usability by teachers and other users not specialists of AI techniques. Furthermore, results also show that open-ended content may be more suitable for training curious question-asking skills. △ Less

Submitted 30 May, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

arXiv:2210.06468 [pdf, other]

Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication

Authors: Tristan Karch, Yoann Lemesle, Romain Laroche, Clément Moulin-Frier, Pierre-Yves Oudeyer

Abstract: In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, gi… ▽ More In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG) where a speaker must produce a graphical utterance to name a visual referent object while a listener has to select the corresponding object among distractor referents, given the delivered message. The utterances are drawing images produced using dynamical motor primitives combined with a sketching library. To tackle GREG we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We demonstrate that CURVES not only succeeds at solving the GREG but also enables agents to self-organize a language that generalizes to feature compositions never seen during training. In addition to evaluating the communication performance of our approach, we also explore the structure of the emerging language. Specifically, we show that the resulting language forms a coherent lexicon shared between agents and that basic compositional rules on the graphical productions could not explain the compositional generalization. △ Less

Submitted 14 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.11000 [pdf, other]

Selecting Better Samples from Pre-trained LLMs: A Case Study on Question Generation

Authors: Xingdi Yuan, Tong Wang, Yen-Hsiang Wang, Emery Fine, Rania Abdelghani, Pauline Lucas, Hélène Sauzéon, Pierre-Yves Oudeyer

Abstract: Large Language Models (LLMs) have in recent years demonstrated impressive prowess in natural language generation. A common practice to improve generation diversity is to sample multiple outputs from the model. However, there lacks a simple and robust way of selecting the best output from these stochastic samples. As a case study framed in the context of question generation, we propose two prompt-b… ▽ More Large Language Models (LLMs) have in recent years demonstrated impressive prowess in natural language generation. A common practice to improve generation diversity is to sample multiple outputs from the model. However, there lacks a simple and robust way of selecting the best output from these stochastic samples. As a case study framed in the context of question generation, we propose two prompt-based approaches to selecting high-quality questions from a set of LLM-generated candidates. Our method works under the constraints of 1) a black-box (non-modifiable) question generation model and 2) lack of access to human-annotated references -- both of which are realistic limitations for real-world deployment of LLMs. With automatic as well as human evaluations, we empirically demonstrate that our approach can effectively select questions of higher qualities than greedy generation. △ Less

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2207.04118 [pdf, other]

Automatic Exploration of Textual Environments with Language-Conditioned Autotelic Agents

Authors: Laetitia Teodorescu, Eric Yuan, Marc-Alexandre Côté, Pierre-Yves Oudeyer

Abstract: In this extended abstract we discuss the opportunities and challenges of studying intrinsically-motivated agents for exploration in textual environments. We argue that there is important synergy between text environments and autonomous agents. We identify key properties of text worlds that make them suitable for exploration by autonmous agents, namely, depth, breadth, progress niches and the ease… ▽ More In this extended abstract we discuss the opportunities and challenges of studying intrinsically-motivated agents for exploration in textual environments. We argue that there is important synergy between text environments and autonomous agents. We identify key properties of text worlds that make them suitable for exploration by autonmous agents, namely, depth, breadth, progress niches and the ease of use of language goals; we identify drivers of exploration for such agents that are implementable in text worlds. We discuss the opportunities of using autonomous agents to make progress on text environment benchmarks. Finally we list some specific challenges that need to be overcome in this area. △ Less

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2206.09674 [pdf, other]

EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL

Authors: Thomas Carta, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier

Abstract: Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires a lot of training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping it to better guide the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities f… ▽ More Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires a lot of training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping it to better guide the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward. In this paper, we leverage this idea and propose an automated reward shaping method where the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives use a question generation (QG) and question answering (QA) system: they consist of questions leading the agent to try to reconstruct partial information about the global goal using its own trajectory. When it succeeds, it receives an intrinsic reward proportional to its confidence in its answer. This incentivizes the agent to generate trajectories which unambiguously explain various aspects of the general language goal. Our experimental study shows that this approach, which does not require engineer intervention to design the auxiliary objectives, improves sample efficiency by effectively directing exploration. △ Less

Submitted 13 October, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: 24 pages, 16 figures, 5 tables

arXiv:2206.05060 [pdf, other]

Social Network Structure Shapes Innovation: Experience-sharing in RL with SAPIENS

Authors: Eleni Nisioti, Mateo Mahaut, Pierre-Yves Oudeyer, Ida Momennejad, Clément Moulin-Frier

Abstract: Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary, it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innov… ▽ More Human culture relies on innovation: our ability to continuously explore how existing elements can be combined to create new ones. Innovation is not solitary, it relies on collective search and accumulation. Reinforcement learning (RL) approaches commonly assume that fully-connected groups are best suited for innovation. However, human laboratory and field studies have shown that hierarchical innovation is more robustly achieved by dynamic social network structures. In dynamic settings, humans oscillate between innovating individually or in small clusters, and then sharing outcomes with others. To our knowledge, the role of social network structure on innovation has not been systematically studied in RL. Here, we use a multi-level problem setting (WordCraft), with three different innovation tasks to test the hypothesis that the social network structure affects the performance of distributed RL algorithms. We systematically design networks of DQNs sharing experiences from their replay buffers in varying structures (fully-connected, small world, dynamic, ring) and introduce a set of behavioral and mnemonic metrics that extend the classical reward-focused evaluation framework of RL. Comparing the level of innovation achieved by different social network structures across different tasks shows that, first, consistent with human findings, experience sharing within a dynamic structure achieves the highest level of innovation in tasks with a deceptive nature and large search spaces. Second, experience sharing is not as helpful when there is a single clear path to innovation. Third, the metrics we propose, can help understand the success of different social network structures on different tasks, with the diversity of experiences on an individual and group level lending crucial insights. △ Less

Submitted 18 November, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

arXiv:2206.01134 [pdf, other]

doi 10.1038/s42256-022-00591-4

Language and Culture Internalisation for Human-Like Autotelic AI

Authors: Cédric Colas, Tristan Karch, Clément Moulin-Frier, Pierre-Yves Oudeyer

Abstract: Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals - autotelic agents. But despite recent progress, existing algorithms still show serious limitatio… ▽ More Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals - autotelic agents. But despite recent progress, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalisation or skill composition. This perspective calls for the immersion of autotelic agents into rich socio-cultural worlds, an immensely important attribute of our environment that shapes human cognition but is mostly omitted in modern AI. Inspired by the seminal work of Vygotsky, we propose Vygotskian autotelic agents - agents able to internalise their interactions with others and turn them into cognitive tools. We focus on language and show how its structure and informational content may support the development of new cognitive functions in artificial agents as it does in humans. We justify the approach by uncovering several examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, we highlight future opportunities and challenges for Vygotskian Autotelic AI research, including the use of language models as cultural models supporting artificial cognitive development. △ Less

Submitted 16 November, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Journal ref: Nature Machine Intelligence 4, 1068-1076 (2022)

arXiv:2205.06111 [pdf, other]

Asking for Knowledge: Training RL Agents to Query External Knowledge Using Language

Authors: Iou-Jen Liu, Xingdi Yuan, Marc-Alexandre Côté, Pierre-Yves Oudeyer, Alexander G. Schwing

Abstract: To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This is exacerbated as few present-day environments support querying for knowledge. In order to study how agents can be taught to query external knowledge via language, we first introduce two n… ▽ More To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This is exacerbated as few present-day environments support querying for knowledge. In order to study how agents can be taught to query external knowledge via language, we first introduce two new environments: the grid-world-based Q-BabyAI and the text-based Q-TextWorld. In addition to physical interactions, an agent can query an external knowledge source specialized for these environments to gather information. Second, we propose the "Asking for Knowledge" (AFK) agent, which learns to generate language commands to query for meaningful knowledge that helps solve the tasks. AFK leverages a non-parametric memory, a pointer mechanism and an episodic exploration bonus to tackle (1) irrelevant information, (2) a large query language space, (3) delayed reward for making meaningful queries. Extensive experiments demonstrate that the AFK agent outperforms recent baselines on the challenging Q-BabyAI and Q-TextWorld environments. △ Less

Submitted 3 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: ICML 2022; Project page: https://ioujenliu.github.io/AFK/

arXiv:2204.03546 [pdf, other]

doi 10.1016/j.ijhcs.2022.102887

Conversational agents for fostering curiosity-driven learning in children

Authors: Rania Abdelghani, Pierre-Yves Oudeyer, Edith Law, Catherine de Vulpillières, Hélène Sauzéon

Abstract: Curiosity is an important factor that favors independent and individualized learning in children. Research suggests that it is also a competence that can be fostered by training specific metacognitive skills and information-searching behaviors. In this light, we develop a conversational agent that helps children generate curiosity-driven questions, and encourages their use to lead autonomous explo… ▽ More Curiosity is an important factor that favors independent and individualized learning in children. Research suggests that it is also a competence that can be fostered by training specific metacognitive skills and information-searching behaviors. In this light, we develop a conversational agent that helps children generate curiosity-driven questions, and encourages their use to lead autonomous explorations and gain new knowledge. The study was conducted with 51 primary school students who interacted with either a neutral agent or an incentive agent that helped curiosity-driven questioning by offering specific semantic cues. Results showed a significant increase in the number and the quality of the questions generated with the incentive agent. This interaction also resulted in longer explorations and stronger learning progress. Together, our results suggest that the more our agent is able to train children's curiosity-related metacognitive skills, the better they can maintain their information-searching behaviors and the more new knowledge they are likely to acquire. △ Less

Submitted 12 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

arXiv:2201.11014 [pdf, other]

Language-biased image classification: evaluation based on semantic representations

Authors: Yoann Lemesle, Masataka Sawayama, Guillermo Valle-Perez, Maxime Adolphe, Hélène Sauzéon, Pierre-Yves Oudeyer

Abstract: Humans show language-biased image recognition for a word-embedded image, known as picture-word interference. Such interference depends on hierarchical semantic categories and reflects that human language processing highly interacts with visual processing. Similar to humans, recent artificial models jointly trained on texts and images, e.g., OpenAI CLIP, show language-biased image classification. E… ▽ More Humans show language-biased image recognition for a word-embedded image, known as picture-word interference. Such interference depends on hierarchical semantic categories and reflects that human language processing highly interacts with visual processing. Similar to humans, recent artificial models jointly trained on texts and images, e.g., OpenAI CLIP, show language-biased image classification. Exploring whether the bias leads to interference similar to those observed in humans can contribute to understanding how much the model acquires hierarchical semantic representations from joint learning of language and vision. The present study introduces methodological tools from the cognitive science literature to assess the biases of artificial models. Specifically, we introduce a benchmark task to test whether words superimposed on images can distort the image classification across different category levels and, if it can, whether the perturbation is due to the shared semantic representation between language and vision. Our dataset is a set of word-embedded images and consists of a mixture of natural image datasets and hierarchical word labels with superordinate/basic category levels. Using this benchmark test, we evaluate the CLIP model. We show that presenting words distorts the image classification by the model across different category levels, but the effect does not depend on the semantic relationship between images and embedded words. This suggests that the semantic word representation in the CLIP visual processing is not shared with the image representation, although the word representation strongly dominates for word-embedded images. △ Less

Submitted 12 March, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Accepted at ICLR 2022

arXiv:2112.07342 [pdf, other]

Learning to Guide and to Be Guided in the Architect-Builder Problem

Authors: Paul Barde, Tristan Karch, Derek Nowrouzezahrai, Clément Moulin-Frier, Christopher Pal, Pierre-Yves Oudeyer

Abstract: We are interested in interactive agents that learn to coordinate, namely, a $builder$ -- which performs actions but ignores the goal of the task, i.e. has no access to rewards -- and an $architect$ which guides the builder towards the goal of the task. We define and explore a formal setting where artificial agents are equipped with mechanisms that allow them to simultaneously learn a task while at… ▽ More We are interested in interactive agents that learn to coordinate, namely, a $builder$ -- which performs actions but ignores the goal of the task, i.e. has no access to rewards -- and an $architect$ which guides the builder towards the goal of the task. We define and explore a formal setting where artificial agents are equipped with mechanisms that allow them to simultaneously learn a task while at the same time evolving a shared communication protocol. Ideally, such learning should only rely on high-level communication priors and be able to handle a large variety of tasks and meanings while deriving communication protocols that can be reused across tasks. We present the Architect-Builder Problem (ABP): an asymmetrical setting in which an architect must learn to guide a builder towards constructing a specific structure. The architect knows the target structure but cannot act in the environment and can only send arbitrary messages to the builder. The builder on the other hand can act in the environment, but receives no rewards nor has any knowledge about the task, and must learn to solve it relying only on the messages sent by the architect. Crucially, the meaning of messages is initially not defined nor shared between the agents but must be negotiated throughout learning. Under these constraints, we propose Architect-Builder Iterated Guiding (ABIG), a solution to ABP where the architect leverages a learned model of the builder to guide it while the builder uses self-imitation learning to reinforce its guided behavior. We analyze the key learning mechanisms of ABIG and test it in 2D tasks involving grasping cubes, placing them at a given location, or building various shapes. ABIG results in a low-level, high-frequency, guiding communication protocol that not only enables an architect-builder pair to solve the task at hand, but that can also generalize to unseen tasks. △ Less

Submitted 11 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: International Conference on Learning Representations (2022)

arXiv:2111.00360 [pdf, other]

doi 10.1007/s12369-021-00820-7

Identifying Functions and Behaviours of Social Robots during Learning Activities: Teachers' Perspective

Authors: Jessy Ceha, Edith Law, Dana Kulić, Pierre-Yves Oudeyer, Didier Roy

Abstract: With advances in artificial intelligence, research is increasingly exploring the potential functions that social robots can play in education. As teachers are a critical stakeholder in the use and application of educational technologies, we conducted a study to understand teachers' perspectives on how a social robot could support a variety of learning activities in the classroom. Through interview… ▽ More With advances in artificial intelligence, research is increasingly exploring the potential functions that social robots can play in education. As teachers are a critical stakeholder in the use and application of educational technologies, we conducted a study to understand teachers' perspectives on how a social robot could support a variety of learning activities in the classroom. Through interviews, robot puppeteering, and group brainstorming sessions with five elementary and middle school teachers from a local school in Canada, we take a socio-technical perspective to conceptualize possible robot functions and behaviours, and the effects they may have on the current way learning activities are designed, planned, and executed. Overall, the teachers responded positively to the idea of introducing a social robot as a technological tool for learning activities, envisioning differences in usage for teacher-robot and student-robot interactions. Further, Engeström's Activity System Model -- a framework for analyzing human needs, tasks, and outcomes -- illustrated a number of tensions associated with learning activities in the classroom. We discuss the fine-grained robot functions and behaviours conceived by teachers, and how they address the current tensions -- providing suggestions for improving the design of social robots for learning activities. △ Less

Submitted 30 October, 2021; originally announced November 2021.

Comments: This is a preprint of an article published in The International Journal of Social Robotics. The final authenticated version is available online at: https://doi.org/10.1007/s12369-021-00820-7

arXiv:2107.00956 [pdf, other]

SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of lang… ▽ More Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai. △ Less

Submitted 1 September, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: under review. This paper extends and generalizes work in arXiv:2104.13207

arXiv:2106.14421 [pdf, other]

Causal Reinforcement Learning using Observational and Interventional Data

Authors: Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer

Abstract: Learning efficiently a causal model of the environment is a key challenge of model-based RL agents operating in POMDPs. We consider here a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment (interventional data), but has also access to a large collection of offline experiences, obtained by observing another agent interac… ▽ More Learning efficiently a causal model of the environment is a key challenge of model-based RL agents operating in POMDPs. We consider here a scenario where the learning agent has the ability to collect online experiences through direct interactions with the environment (interventional data), but has also access to a large collection of offline experiences, obtained by observing another agent interacting with the environment (observational data). A key ingredient, that makes this situation non-trivial, is that we allow the observed agent to interact with the environment based on hidden information, which is not observed by the learning agent. We then ask the following questions: can the online and offline experiences be safely combined for learning a causal model ? And can we expect the offline experiences to improve the agent's performances ? To answer these questions, we import ideas from the well-established causal framework of do-calculus, and we express model-based reinforcement learning as a causal inference problem. Then, we propose a general yet simple methodology for leveraging offline data during learning. In a nutshell, the method relies on learning a latent-based causal transition model that explains both the interventional and observational regimes, and then using the recovered latent variable to infer the standard POMDP transition model via deconfounding. We prove our method is correct and efficient in the sense that it attains better generalization guarantees due to the offline data (in the asymptotic case), and we illustrate its effectiveness empirically on synthetic toy problems. Our contribution aims at bridging the gap between the fields of reinforcement learning and causality. △ Less

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2106.13871 [pdf, other]

doi 10.1145/3478513.3480570

Transflower: probabilistic autoregressive dance generation with multimodal attention

Authors: Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson

Abstract: Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic au… ▽ More Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies, and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution, as well as being able to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music. △ Less

Submitted 11 June, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Article presented at SIGGRAPH Asia 2021, and published in ACM Transactions on Graphics

arXiv:2106.08858 [pdf, other]

Grounding Spatio-Temporal Language with Transformers

Authors: Tristan Karch, Laetitia Teodorescu, Katja Hofmann, Clément Moulin-Frier, Pierre-Yves Oudeyer

Abstract: Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temp… ▽ More Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to randomly held-out sentences; 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents. We also release our code under open-source license as well as pretrained models and datasets to encourage the wider community to build upon and extend our work in the future. △ Less

Submitted 11 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Contains main article and supplementaries

Journal ref: Neurips 2021

arXiv:2105.11977 [pdf, other]

doi 10.1109/TCDS.2022.3231731.

Towards Teachable Autotelic Agents

Authors: Olivier Sigaud, Ahmed Akakzia, Hugo Caselles-Dupré, Cédric Colas, Pierre-Yves Oudeyer, Mohamed Chetouani

Abstract: Autonomous discovery and direct instruction are two distinct sources of learning in children but education sciences demonstrate that mixed approaches such as assisted discovery or guided play result in improved skill acquisition. In the field of Artificial Intelligence, these extremes respectively map to autonomous agents learning from their own signals and interactive learning agents fully taught… ▽ More Autonomous discovery and direct instruction are two distinct sources of learning in children but education sciences demonstrate that mixed approaches such as assisted discovery or guided play result in improved skill acquisition. In the field of Artificial Intelligence, these extremes respectively map to autonomous agents learning from their own signals and interactive learning agents fully taught by their teachers. In between should stand teachable autotelic agents (TAA): agents that learn from both internal and teaching signals to benefit from the higher efficiency of assisted discovery. Designing such agents will enable real-world non-expert users to orient the learning trajectories of agents towards their expectations. More fundamentally, this may also be a key step to build agents with human-level intelligence. This paper presents a roadmap towards the design of teachable autonomous agents. Building on developmental psychology and education sciences, we start by identifying key features enabling assisted discovery processes in child-tutor interactions. This leads to the production of a checklist of features that future TAA will need to demonstrate. The checklist allows us to precisely pinpoint the various limitations of current reinforcement learning agents and to identify the promising first steps towards TAA. It also shows the way forward by highlighting key research directions towards the design or autonomous agents that can be taught by ordinary people via natural pedagogy. △ Less

Submitted 20 March, 2023; v1 submitted 25 May, 2021; originally announced May 2021.

Journal ref: Sigaud, O., Akakzia, A., Caselles-Dupré, H., Colas, C., Oudeyer, P. Y., & Chetouani, M. (2022). Towards Teachable Autotelic Agents. IEEE Transactions on Cognitive and Developmental Systems

arXiv:2104.13207 [pdf, other]

SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. This problem motivated many research directions on embodied language use. Current approaches focus on language as a communication tool in very simplified and non diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary siz… ▽ More Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. This problem motivated many research directions on embodied language use. Current approaches focus on language as a communication tool in very simplified and non diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. In this work we explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. We then study the limits of a recent SOTA Deep RL approach when tested on a first grid-world environment from the upcoming SocialAI, a benchmark to assess the social skills of Deep RL agents. Videos and code are available at https://sites.google.com/view/socialai01 . △ Less

Submitted 27 April, 2021; originally announced April 2021.

Comments: Accepted at NAACL ViGIL Workshop 2021

arXiv:2103.09815 [pdf, other]

TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL

Authors: Clément Romac, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

Abstract: Training autonomous agents able to generalize to multiple tasks is a key target of Deep Reinforcement Learning (DRL) research. In parallel to improving DRL algorithms themselves, Automatic Curriculum Learning (ACL) study how teacher algorithms can train DRL agents more efficiently by adapting task selection to their evolving abilities. While multiple standard benchmarks exist to compare DRL agents… ▽ More Training autonomous agents able to generalize to multiple tasks is a key target of Deep Reinforcement Learning (DRL) research. In parallel to improving DRL algorithms themselves, Automatic Curriculum Learning (ACL) study how teacher algorithms can train DRL agents more efficiently by adapting task selection to their evolving abilities. While multiple standard benchmarks exist to compare DRL agents, there is currently no such thing for ACL algorithms. Thus, comparing existing approaches is difficult, as too many experimental parameters differ from paper to paper. In this work, we identify several key challenges faced by ACL algorithms. Based on these, we present TeachMyAgent (TA), a benchmark of current ACL algorithms leveraging procedural task generation. It includes 1) challenge-specific unit-tests using variants of a procedural Box2D bipedal walker environment, and 2) a new procedural Parkour environment combining most ACL challenges, making it ideal for global performance assessment. We then use TeachMyAgent to conduct a comparative study of representative existing approaches, showcasing the competitiveness of some ACL algorithms that do not use expert knowledge. We also show that the Parkour environment remains an open problem. We open-source our environments, all studied ACL algorithms (collected from open-source code or re-implemented), and DRL students in a Python package available at https://github.com/flowersteam/TeachMyAgent. △ Less

Submitted 9 June, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

arXiv:2103.06769 [pdf, other]

doi 10.1007/s13218-020-00696-1

Intelligent behavior depends on the ecological niche: Scaling up AI to human-like intelligence in socio-cultural environments

Authors: Manfred Eppe, Pierre-Yves Oudeyer

Abstract: This paper outlines a perspective on the future of AI, discussing directions for machines models of human-like intelligence. We explain how developmental and evolutionary theories of human cognition should further inform artificial intelligence. We emphasize the role of ecological niches in sculpting intelligent behavior, and in particular that human intelligence was fundamentally shaped to adapt… ▽ More This paper outlines a perspective on the future of AI, discussing directions for machines models of human-like intelligence. We explain how developmental and evolutionary theories of human cognition should further inform artificial intelligence. We emphasize the role of ecological niches in sculpting intelligent behavior, and in particular that human intelligence was fundamentally shaped to adapt to a constantly changing socio-cultural environment. We argue that a major limit of current work in AI is that it is missing this perspective, both theoretically and experimentally. Finally, we discuss the promising approach of developmental artificial intelligence, modeling infant development through multi-scale interaction between intrinsically motivated learning, embodiment and a fastly changing socio-cultural environment. This paper takes the form of an interview of Pierre-Yves Oudeyer by Mandred Eppe, organized within the context of a KI - K{ü}nstliche Intelligenz special issue in developmental robotics. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: Keywords: developmental AI, general artificial intelligence, human-like AI, embodiment, cultural evolution, language, socio-cultural skills

Journal ref: KI - Künstliche Intelligenz KI - Künstliche Intelligenz (German Journal of Artificial Intelligence), 2021

arXiv:2012.09830 [pdf, other]

Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

Authors: Cédric Colas, Tristan Karch, Olivier Sigaud, Pierre-Yves Oudeyer

Abstract: Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by $autotelic$ $agents$: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent ye… ▽ More Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by $autotelic$ $agents$: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: $developmental$ $reinforcement$ $learning$. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem -- the $intrinsically$ $motivated$ $acquisition$ $of$ $open$-$ended$ $repertoires$ $of$ $skills$. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest of intrinsically motivated skills acquisition. △ Less

Submitted 12 July, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

arXiv:2011.08463 [pdf, other]

Meta Automatic Curriculum Learning

Authors: Rémy Portelas, Clément Romac, Katja Hofmann, Pierre-Yves Oudeyer

Abstract: A major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task space… ▽ More A major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task spaces, it is essential to rely on some form of Automatic Curriculum Learning (ACL) to adapt the task sampling distribution to a given learning agent, instead of randomly sampling tasks, as many could end up being either trivial or unfeasible. Since it is hard to get prior knowledge on such task spaces, many ACL algorithms explore the task space to detect progress niches over time, a costly tabula-rasa process that needs to be performed for each new learning agents, although they might have similarities in their capabilities profiles. To address this limitation, we introduce the concept of Meta-ACL, and formalize it in the context of black-box RL learners, i.e. algorithms seeking to generalize curriculum generation to an (unknown) distribution of learners. In this work, we present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL in multiple simulated environments including procedurally generated parkour environments with learners of varying morphologies. Videos and code are available at https://sites.google.com/view/meta-acl . △ Less

Submitted 1 September, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: This paper extends and generalizes work in arXiv:2004.03168

arXiv:2010.07208 [pdf, other]

doi 10.1109/TCDS.2017.2704912

Emergent Jaw Predominance in Vocal Development through Stochastic Optimization

Authors: Clément Moulin-Frier, Jules Brochard, Freek Stulp, Pierre-Yves Oudeyer

Abstract: Infant vocal babbling strongly relies on jaw oscillations, especially at the stage of canonical babbling, which underlies the syllabic structure of world languages. In this paper, we propose, model and analyze an hypothesis to explain this predominance of the jaw in early babbling. This hypothesis states that general stochastic optimization principles, when applied to learning sensorimotor control… ▽ More Infant vocal babbling strongly relies on jaw oscillations, especially at the stage of canonical babbling, which underlies the syllabic structure of world languages. In this paper, we propose, model and analyze an hypothesis to explain this predominance of the jaw in early babbling. This hypothesis states that general stochastic optimization principles, when applied to learning sensorimotor control, automatically generate ordered babbling stages with a predominant exploration of jaw movements in early stages. The reason is that those movements impact the auditory effects more than other articulators. In previous computational models, such general principles were shown to selectively freeze and free degrees of freedom in a model reproducing the proximo-distal development observed in infant arm reaching. The contribution of this paper is to show how, using the same methods, we are able to explain such patterns in vocal development. We present three experiments. The two first ones show that the recruitment order of articulators emerging from stochastic optimization depends on the target sound to be achieved but that on average the jaw is largely chosen as the first recruited articulator. The third experiment analyses in more detail how the emerging recruitment order is shaped by the dynamics of the optimization process. △ Less

Submitted 8 October, 2020; originally announced October 2020.

Journal ref: IEEE Transactions on Cognitive and Developmental Systems (Volume: 12 , Issue: 3 , Sept. 2020)

arXiv:2010.04452 [pdf, other]

doi 10.1613/jair.1.12588

EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models

Authors: Cédric Colas, Boris Hejblum, Sébastien Rouillon, Rodolphe Thiébaut, Pierre-Yves Oudeyer, Clément Moulin-Frier, Mélanie Prague

Abstract: Epidemiologists model the dynamics of epidemics in order to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lock down, vaccination, etc). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty to predict long-term effects. This task can be cast as an optimization problem where sta… ▽ More Epidemiologists model the dynamics of epidemics in order to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lock down, vaccination, etc). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty to predict long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning algorithms such as deep reinforcement learning, might bring significant value. However, the specificity of each domain -- epidemic modelling or solving optimization problem -- requires strong collaborations between researchers from different fields of expertise. This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lock-down control under the optimization of death toll and economic recess using a Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19. Using EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, costs functions and optimization algorithms to address important choices to be made by health decision-makers. △ Less

Submitted 9 October, 2020; originally announced October 2020.

Journal ref: Journal of Artificial Intelligence Research-2021

arXiv:2008.04388 [pdf]

doi 10.1109/TCDS.2022.3216911

GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning

Authors: Grgur Kovač, Adrien Laversanne-Finot, Pierre-Yves Oudeyer

Abstract: Designing agents, capable of learning autonomously a wide range of skills is critical in order to increase the scope of reinforcement learning. It will both increase the diversity of learned skills and reduce the burden of manually designing reward functions for each skill. Self-supervised agents, setting their own goals, and trying to maximize the diversity of those goals have shown great promise… ▽ More Designing agents, capable of learning autonomously a wide range of skills is critical in order to increase the scope of reinforcement learning. It will both increase the diversity of learned skills and reduce the burden of manually designing reward functions for each skill. Self-supervised agents, setting their own goals, and trying to maximize the diversity of those goals have shown great promise towards this end. However, a currently known limitation of agents trying to maximize the diversity of sampled goals is that they tend to get attracted to noise or more generally to parts of the environments that cannot be controlled (distractors). When agents have access to predefined goal features or expert knowledge, absolute Learning Progress (ALP) provides a way to distinguish between regions that can be controlled and those that cannot. However, those methods often fall short when the agents are only provided with raw sensory inputs such as images. In this work we extend those concepts to unsupervised image-based goal exploration. We propose a framework that allows agents to autonomously identify and ignore noisy distracting regions while searching for novelty in the learnable regions to both improve overall performance and avoid catastrophic forgetting. Our framework can be combined with any state-of-the-art novelty seeking goal exploration approaches. We construct a rich 3D image based environment with distractors. Experiments on this environment show that agents using our framework successfully identify interesting regions of the environment, resulting in drastically improved performances. The source code is available at https://sites.google.com/view/grimgep. △ Less

Submitted 7 November, 2022; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: Published in IEEE Transactions on Cognitive and Developmental Systems

ACM Class: I.2.6

arXiv:2007.01195 [pdf, other]

Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems

Authors: Mayalen Etcheverry, Clement Moulin-Frier, Pierre-Yves Oudeyer

Abstract: Self-organization of complex morphological patterns from local interactions is a fascinating phenomenon in many natural and artificial systems. In the artificial world, typical examples of such morphogenetic systems are cellular automata. Yet, their mechanisms are often very hard to grasp and so far scientific discoveries of novel patterns have primarily been relying on manual tuning and ad hoc ex… ▽ More Self-organization of complex morphological patterns from local interactions is a fascinating phenomenon in many natural and artificial systems. In the artificial world, typical examples of such morphogenetic systems are cellular automata. Yet, their mechanisms are often very hard to grasp and so far scientific discoveries of novel patterns have primarily been relying on manual tuning and ad hoc exploratory search. The problem of automated diversity-driven discovery in these systems was recently introduced [26, 62], highlighting that two key ingredients are autonomous exploration and unsupervised representation learning to describe "relevant" degrees of variations in the patterns. In this paper, we motivate the need for what we call Meta-diversity search, arguing that there is not a unique ground truth interesting diversity as it strongly depends on the final observer and its motives. Using a continuous game-of-life system for experiments, we provide empirical evidences that relying on monolithic architectures for the behavioral embedding design tends to bias the final discoveries (both for hand-defined and unsupervisedly-learned features) which are unlikely to be aligned with the interest of a final end-user. To address these issues, we introduce a novel dynamic and modular architecture that enables unsupervised learning of a hierarchy of diverse representations. Combined with intrinsically motivated goal exploration algorithms, we show that this system forms a discovery assistant that can efficiently adapt its diversity search towards preferences of a user using only a very small amount of user feedback. △ Less

Submitted 2 September, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020 - oral)

arXiv:2006.07185 [pdf, other]

Grounding Language to Autonomously-Acquired Skills via Goal Generation

Authors: Ahmed Akakzia, Cédric Colas, Pierre-Yves Oudeyer, Mohamed Chetouani, Olivier Sigaud

Abstract: We are interested in the autonomous acquisition of repertoires of skills. Language-conditioned reinforcement learning (LC-RL) approaches are great tools in this quest, as they allow to express abstract goals as sets of constraints on the states. However, most LC-RL agents are not autonomous and cannot learn without external instructions and feedback. Besides, their direct language condition cannot… ▽ More We are interested in the autonomous acquisition of repertoires of skills. Language-conditioned reinforcement learning (LC-RL) approaches are great tools in this quest, as they allow to express abstract goals as sets of constraints on the states. However, most LC-RL agents are not autonomous and cannot learn without external instructions and feedback. Besides, their direct language condition cannot account for the goal-directed behavior of pre-verbal infants and strongly limits the expression of behavioral diversity for a given language input. To resolve these issues, we propose a new conceptual approach to language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB decouples skill learning and language grounding via an intermediate semantic representation of the world. To showcase the properties of LGB, we present a specific implementation called DECSTR. DECSTR is an intrinsically motivated learning agent endowed with an innate semantic representation describing spatial relations between physical objects. In a first stage (G -> B), it freely explores its environment and targets self-generated semantic configurations. In a second stage (L -> G), it trains a language-conditioned goal generator to generate semantic goals that match the constraints expressed in language-based inputs. We showcase the additional properties of LGB w.r.t. both an end-to-end LC-RL approach and a similar approach leveraging non-semantic, continuous intermediate representations. Intermediate semantic representations help satisfy language commands in a diversity of ways, enable strategy switching after a failure and facilitate language grounding. △ Less

Submitted 25 January, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Published at ICLR 2021

arXiv:2006.07043 [pdf, other]

Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL

Authors: Cédric Colas, Ahmed Akakzia, Pierre-Yves Oudeyer, Mohamed Chetouani, Olivier Sigaud

Abstract: In the real world, linguistic agents are also embodied agents: they perceive and act in the physical world. The notion of Language Grounding questions the interactions between language and embodiment: how do learning agents connect or ground linguistic representations to the physical world ? This question has recently been approached by the Reinforcement Learning community under the framework of i… ▽ More In the real world, linguistic agents are also embodied agents: they perceive and act in the physical world. The notion of Language Grounding questions the interactions between language and embodiment: how do learning agents connect or ground linguistic representations to the physical world ? This question has recently been approached by the Reinforcement Learning community under the framework of instruction-following agents. In these agents, behavioral policies or reward functions are conditioned on the embedding of an instruction expressed in natural language. This paper proposes another approach: using language to condition goal generators. Given any goal-conditioned policy, one could train a language-conditioned goal generator to generate language-agnostic goals for the agent. This method allows to decouple sensorimotor learning from language acquisition and enable agents to demonstrate a diversity of behaviors for any given instruction. We propose a particular instantiation of this approach and demonstrate its benefits. △ Less

Submitted 12 June, 2020; originally announced June 2020.

arXiv:2005.06369 [pdf, other]

Progressive growing of self-organized hierarchical representations for exploration

Authors: Mayalen Etcheverry, Pierre-Yves Oudeyer, Chris Reinke

Abstract: Designing agent that can autonomously discover and learn a diversity of structures and skills in unknown changing environments is key for lifelong machine learning. A central challenge is how to learn incrementally representations in order to progressively build a map of the discovered structures and re-use it to further explore. To address this challenge, we identify and target several key functi… ▽ More Designing agent that can autonomously discover and learn a diversity of structures and skills in unknown changing environments is key for lifelong machine learning. A central challenge is how to learn incrementally representations in order to progressively build a map of the discovered structures and re-use it to further explore. To address this challenge, we identify and target several key functionalities. First, we aim to build lasting representations and avoid catastrophic forgetting throughout the exploration process. Secondly we aim to learn a diversity of representations allowing to discover a "diversity of diversity" of structures (and associated skills) in complex high-dimensional environments. Thirdly, we target representations that can structure the agent discoveries in a coarse-to-fine manner. Finally, we target the reuse of such representations to drive exploration toward an "interesting" type of diversity, for instance leveraging human guidance. Current approaches in state representation learning rely generally on monolithic architectures which do not enable all these functionalities. Therefore, we present a novel technique to progressively construct a Hierarchy of Observation Latent Models for Exploration Stratification, called HOLMES. This technique couples the use of a dynamic modular model architecture for representation learning with intrinsically-motivated goal exploration processes (IMGEPs). The paper shows results in the domain of automated discovery of diverse self-organized patterns, considering as testbed the experimental framework from Reinke et al. (2019). △ Less

Submitted 13 May, 2020; originally announced May 2020.

arXiv:2004.04546 [pdf, other]

SpatialSim: Recognizing Spatial Configurations of Objects with Graph Neural Networks

Authors: Laetitia Teodorescu, Katja Hofmann, Pierre-Yves Oudeyer

Abstract: Recognizing precise geometrical configurations of groups of objects is a key capability of human spatial cognition, yet little studied in the deep learning literature so far. In particular, a fundamental problem is how a machine can learn and compare classes of geometric spatial configurations that are invariant to the point of view of an external observer. In this paper we make two key contributi… ▽ More Recognizing precise geometrical configurations of groups of objects is a key capability of human spatial cognition, yet little studied in the deep learning literature so far. In particular, a fundamental problem is how a machine can learn and compare classes of geometric spatial configurations that are invariant to the point of view of an external observer. In this paper we make two key contributions. First, we propose SpatialSim (Spatial Similarity), a novel geometrical reasoning benchmark, and argue that progress on this benchmark would pave the way towards a general solution to address this challenge in the real world. This benchmark is composed of two tasks: Identification and Comparison, each one instantiated in increasing levels of difficulty. Secondly, we study how relational inductive biases exhibited by fully-connected message-passing Graph Neural Networks (MPGNNs) are useful to solve those tasks, and show their advantages over less relational baselines such as Deep Sets and unstructured models such as Multi-Layer Perceptrons. Finally, we highlight the current limits of GNNs in these tasks. △ Less

Submitted 16 July, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

arXiv:2004.03472 [pdf, other]

doi 10.1145/3313831.3376776

Pedagogical Agents for Fostering Question-Asking Skills in Children

Authors: Mehdi Alaimi, Edith Law, Kevin Daniel Pantasdo, Pierre-Yves Oudeyer, Helene Sauzeon

Abstract: Question asking is an important tool for constructing academic knowledge, and a self-reinforcing driver of curiosity. However, research has found that question asking is infrequent in the classroom and children's questions are often superficial, lacking deep reasoning. In this work, we developed a pedagogical agent that encourages children to ask divergent-thinking questions, a more complex form o… ▽ More Question asking is an important tool for constructing academic knowledge, and a self-reinforcing driver of curiosity. However, research has found that question asking is infrequent in the classroom and children's questions are often superficial, lacking deep reasoning. In this work, we developed a pedagogical agent that encourages children to ask divergent-thinking questions, a more complex form of questions that is associated with curiosity. We conducted a study with 95 fifth grade students, who interacted with an agent that encourages either convergent-thinking or divergent-thinking questions. Results showed that both interventions increased the number of divergent-thinking questions and the fluency of question asking, while they did not significantly alter children's perception of curiosity despite their high intrinsic motivation scores. In addition, children's curiosity trait has a mediating effect on question asking under the divergent-thinking agent, suggesting that question-asking interventions must be personalized to each student based on their tendency to be curious. △ Less

Submitted 7 April, 2020; originally announced April 2020.

Comments: Accepted at CHI 2020

ACM Class: K.3.0; K.3.1; K.4.2; J.0; J.4

Showing 1–50 of 84 results for author: Oudeyer, P