Inspired by the success of the Transformer architecture in natural language processing and computer vision, we investigate the use of Transformers in Reinforcement Learning (RL), specifically in modeling the environment's dynamics using Transformer Dynamics Models (TDMs). We evaluate the capabilities of TDMs for continuous control in real-time planning scenarios with Model Predictive Control (MPC). While Transformers excel in long-horizon prediction, their tokenization mechanism and autoregressive nature lead to costly planning over long horizons, especially as the environment's dimensionality increases. To alleviate this issue, we use a TDM for short-term planning, and learn an autoregressive discrete Q-function using a separate Q-Transformer (QT) model to estimate a long-term return beyond the short-horizon planning. Our proposed method, QT-TDM, integrates the robust predictive capabilities of Transformers as dynamics models with the efficacy of a model-free Q-Transformer to mitigate the computational burden associated with real-time planning. Experiments in diverse state-based continuous control tasks show that QT-TDM is superior in performance and sample efficiency compared to existing Transformer-based RL models while achieving fast and computationally efficient inference.
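The planning loop described above can be illustrated compactly. The sketch below is a minimal, generic version of short-horizon MPC with a terminal Q-value bonus, assuming placeholder callables `tdm` (learned dynamics), `reward_fn` (predicted reward), and `q_fn` (long-term value); it uses simple random shooting rather than the paper's exact planner and is not the authors' implementation.

```python
import numpy as np

def plan_action(state, tdm, reward_fn, q_fn, action_dim,
                horizon=5, num_candidates=256, rng=None):
    """Short-horizon MPC: score sampled action sequences by the sum of
    predicted rewards plus a terminal Q-value estimate, execute first action."""
    rng = rng or np.random.default_rng()
    actions = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], num_candidates, axis=0)
    returns = np.zeros(num_candidates)
    for t in range(horizon):
        returns += reward_fn(states, actions[:, t])   # short-term predicted reward
        states = tdm(states, actions[:, t])           # predicted next states
    returns += q_fn(states, actions[:, -1])           # value beyond the planning horizon
    best = int(np.argmax(returns))
    return actions[best, 0]                           # MPC: apply only the first action
```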
Architectures for vision-based robot manipulation often utilize separate domain adaptation models to allow sim-to-real transfer and an inverse kinematics solver to allow the actual policy to operate in Cartesian space. We present a novel end-to-end visuomotor architecture that combines domain adaptation and inherent inverse kinematics in one model. Using the same latent encoding, it jointly learns to reconstruct canonical simulation images from randomized inputs and to predict the corresponding joint angles that minimize the Cartesian error towards a depicted target object via differentiable forward kinematics. We evaluate our model in a sim-to-real grasping experiment with the NICO humanoid robot by comparing different randomization and adaptation conditions both directly and with additional real-world finetuning. Our combined method significantly increases the resulting accuracy and allows a finetuned model to reach a success rate of 80.30%, outperforming a real-world model trained with six times as much real data.
We address the Continual Learning (CL) problem, where a model has to learn a sequence of tasks from non-stationary distributions while preserving prior knowledge as it encounters new experiences. With the advancement of foundation models, CL research has shifted focus from the initial learning-from-scratch paradigm to the use of generic features from large-scale pre-training. However, existing approaches to CL with pre-trained models only focus on separating the class-specific features from the final representation layer and neglect the power of intermediate representations that capture low- and mid-level features naturally more invariant to domain shifts. In this work, we propose LayUP, a new class-prototype-based approach to continual learning that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our method is conceptually simple, does not require any replay buffer, and works out of the box with any foundation model. LayUP improves over the state-of-the-art on four of the seven class-incremental learning settings at a considerably reduced memory and computational footprint compared with the next best baseline. Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes far beyond their final embeddings.
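As a rough illustration of classification with second-order feature statistics over concatenated intermediate-layer features, the sketch below maintains a feature Gram matrix and per-class prototypes and classifies with a ridge-regularized linear readout; the class and variable names are illustrative and this is not the LayUP implementation.

```python
import numpy as np

class SecondOrderPrototypeClassifier:
    """Class prototypes plus an inverse feature Gram matrix (ridge-regularized),
    computed over features concatenated from multiple network layers."""
    def __init__(self, feat_dim, num_classes, ridge=1e-2):
        self.G = np.zeros((feat_dim, feat_dim))      # second-order statistics
        self.C = np.zeros((num_classes, feat_dim))   # per-class prototype sums
        self.ridge = ridge

    def update(self, feats, labels):
        # feats: (N, feat_dim) concatenated multi-layer features, labels: (N,) ints
        self.G += feats.T @ feats
        for f, y in zip(feats, labels):
            self.C[y] += f

    def predict(self, feats):
        W = np.linalg.solve(self.G + self.ridge * np.eye(self.G.shape[0]), self.C.T)
        return np.argmax(feats @ W, axis=1)
```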
Imitation can allow us to quickly gain an understanding of a new task. Through a demonstration, we can gain direct knowledge about which actions need to be performed and which goals they serve. In this paper, we introduce a new approach to imitation learning that tackles the challenges of a robot imitating a human, such as the change in perspective and body schema. Our approach can use a single human demonstration to abstract information about the demonstrated task, and use that information to generalise and replicate it. We enable this ability through a new integration of two state-of-the-art methods: a diffusion action segmentation model to abstract temporal information from the demonstration and an open vocabulary object detector for spatial information. Furthermore, we refine the abstracted information and use symbolic reasoning to create an action plan utilising inverse kinematics, to allow the robot to imitate the demonstrated action.
Recent advancements in large language models have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their reasoning often fails to effectively utilize this knowledge to establish a coherent thinking paradigm. These models sometimes show hallucinations as their reasoning procedures are unconstrained by logical principles. Aiming at improving the zero-shot chain-of-thought reasoning ability of large language models, we propose LoT (Logical Thoughts), a self-improvement prompting framework that leverages principles rooted in symbolic logic, particularly Reductio ad Absurdum, to systematically verify and rectify the reasoning processes step by step. Experimental evaluations conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the efficacy of enhanced reasoning by logic. The implementation code for LoT can be accessed at: https://github.com/xf-zhao/LoT
This paper introduces a novel zero-shot motion planning method that allows users to quickly design smooth robot motions in Cartesian space. A Bézier curve-based Cartesian plan is transformed into a joint space trajectory by our neuro-inspired inverse kinematics (IK) method CycleIK, for which we enable platform independence by scaling it to arbitrary robot designs. The motion planner is evaluated on the physical hardware of the two humanoid robots NICO and NICOL in a human-in-the-loop grasping scenario. Our method is deployed with an embodied agent that is a large language model (LLM) at its core. We generalize the embodied agent, which was introduced for NICOL, to also be embodied by NICO. The agent can execute a discrete set of physical actions and allows the user to verbally instruct various robots. We contribute a grasping primitive to its action space that allows for precise manipulation of household objects. The new CycleIK method is compared to popular numerical IK solvers and state-of-the-art neural IK methods in simulation and is shown to be competitive with or to outperform all evaluated methods when the algorithm runtime is very short. The grasping primitive is evaluated on both the NICOL and NICO robots with reported grasp success rates of 72% and 82%, respectively.
Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collaborative object categorization task with a physical robot. We introduce a hierarchical approach for interpreting user non-verbal cues, such as hand gestures, body poses, and facial expressions, and combining them with environment states and user verbal cues captured using an existing Automatic Speech Recognition (ASR) system. Our evaluation demonstrates the potential of LLMs to interpret non-verbal cues and to combine them with their context-understanding capabilities and real-world knowledge to support intention prediction during human-robot interaction.
Humanoid robots can benefit from their similarity to the human shape by learning from humans. When humans teach other humans how to perform actions, they often demonstrate the actions and the learning human can try to imitate the demonstration. Being able to mentally transfer from a demonstration seen from a third-person perspective to how it should look from a first-person perspective is fundamental for this ability in humans. As this is a challenging task, it is often simplified for robots by creating a demonstration in the first-person perspective. Creating these demonstrations requires more effort but allows for an easier imitation. We introduce a novel diffusion model aimed at enabling the robot to directly learn from the third-person demonstrations. Our model is capable of learning and generating the first-person perspective from the third-person perspective by translating the size and rotations of objects and the environment between two perspectives. This allows us to utilise the benefits of easy-to-produce third-person demonstrations and easy-to-imitate first-person demonstrations. The model can either represent the first-person perspective in an RGB image or calculate the joint values. Our approach significantly outperforms other image-to-image models in this task.
Although there has been rapid progress in endowing robots with the ability to solve complex manipulation tasks, generating control policies for bimanual robots to solve tasks involving two hands is still challenging because of the difficulties in effective temporal and spatial coordination. With emergent abilities in terms of step-by-step reasoning and in-context learning, Large Language Models (LLMs) have been applied to control a variety of robotic tasks. However, the nature of language communication via a single sequence of discrete symbols makes LLM-based coordination in continuous space a particular challenge for bimanual tasks. To tackle this challenge with an LLM for the first time, we present LAnguage-model-based Bimanual ORchestration (LABOR), an agent utilizing an LLM to analyze task configurations and devise coordination control policies for addressing long-horizon bimanual tasks. In the simulated environment, the LABOR agent is evaluated through several everyday tasks on the NICOL humanoid robot. Reported success rates indicate that overall coordination efficiency is close to optimal performance, while the analysis of failure causes, classified into spatial and temporal coordination and skill selection, shows that these vary over tasks. The project website can be found at https://labor-agent.github.io/
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.
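For context, the reward-decomposition building block that the paper extends can be sketched as a Q-network with one head per reward channel (PyTorch; layer sizes and names are illustrative, and the causal-learning extension itself is not reproduced here).

```python
import torch
import torch.nn as nn

class DecomposedQNetwork(nn.Module):
    """One Q-head per reward channel; the agent acts on the summed Q-values,
    while the per-channel values expose which sub-rewards drive each action."""
    def __init__(self, state_dim, num_actions, reward_channels):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(128, num_actions) for _ in range(reward_channels))

    def forward(self, state):
        h = self.trunk(state)
        q_per_channel = torch.stack([head(h) for head in self.heads], dim=1)
        return q_per_channel, q_per_channel.sum(dim=1)   # (B, C, A), (B, A)
```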
Humans can effortlessly modify various prosodic attributes, such as the placement of stress and the intensity of sentiment, to convey a specific emotion while maintaining consistent linguistic content. Motivated by this capability, we propose EmoAug, a novel style transfer model designed to enhance emotional expression and tackle the data scarcity issue in speech emotion recognition (SER) tasks. EmoAug consists of a semantic encoder and a paralinguistic encoder that represent verbal and non-verbal information, respectively. Additionally, a decoder reconstructs speech signals by conditioning on the aforementioned two information flows in an unsupervised fashion. Once training is completed, EmoAug enriches expressions of emotional speech with different prosodic attributes, such as stress, rhythm and intensity, by feeding different styles into the paralinguistic encoder. EmoAug also enables us to generate similar numbers of samples for each class to tackle the data imbalance issue. Experimental results on the IEMOCAP dataset demonstrate that EmoAug can successfully transfer different speaking styles while retaining the speaker identity and semantic content. Furthermore, we train an SER model with data augmented by EmoAug and show that the augmented model not only surpasses the state-of-the-art supervised and self-supervised methods but also overcomes overfitting problems caused by data imbalance. Some audio samples can be found on our demo website.
Mirroring non-verbal social cues such as affect or movement can enhance human-human and human-robot interactions in the real world. The robotic platforms and control methods also impact people's perception of human-robot interaction. However, few studies have compared robot imitation across different platforms and control methods. Our research addresses this gap by conducting two experiments comparing people's perception of affective mirroring between the iCub and Pepper robots and movement mirroring between vision-based iCub control and Inertial Measurement Unit (IMU)-based iCub control. We discovered that the iCub robot was perceived as more humanlike than the Pepper robot when mirroring affect. A vision-based controlled iCub outperformed the IMU-based controlled one in the movement mirroring task. Our findings suggest that different robotic platforms impact people's perception of robots' mirroring during HRI. The control method also contributes to the robot's mirroring performance. Our work sheds light on the design and application of different humanoid robots in the real world.
Message-oriented and robotics middleware play an important role in facilitating robot control, abstracting complex functionality, and unifying communication patterns between sensors and devices. However, using multiple middleware frameworks presents a challenge in integrating different robots within a single system. To address this challenge, we present Wrapyfi, a Python wrapper supporting multiple message-oriented and robotics middleware, including ZeroMQ, YARP, ROS, and ROS 2. Wrapyfi also provides plugins for exchanging deep learning framework data, without additional encoding or preprocessing steps. Using Wrapyfi eases the development of scripts that run on multiple machines, thereby enabling cross-platform communication and workload distribution. We finally present the three communication schemes that form the cornerstone of Wrapyfi's communication model, along with examples that demonstrate their applicability.
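To make the underlying message-oriented pattern concrete, here is a plain ZeroMQ publish/subscribe pair; note this is raw pyzmq, not Wrapyfi's own API, and the topic string is made up for the example.

```python
import zmq

# Publisher side: broadcast string messages under a topic prefix.
ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5555")
pub.send_string("sensor/camera ready")

# Subscriber side (typically a separate process): filter by topic prefix.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5555")
sub.setsockopt_string(zmq.SUBSCRIBE, "sensor/")
# message = sub.recv_string()   # blocks until a matching message arrives
```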
Weakly supervised referring expression comprehension (REC) aims to ground target objects in images according to given referring expressions, while the mappings between image regions and referring expressions are unavailable during the model training phase. Existing models typically reconstruct the multimodal relationships to ground targets by utilizing off-the-shelf information, and neglect to further exploit helpful knowledge to enhance the model performance. To address this issue, we propose an adaptive knowledge distillation architecture to enrich the predominant pattern of weakly supervised REC and transfer the target-aware and interaction-aware knowledge from a pre-trained teacher grounder to enhance the grounding performance of the student model. Specifically, in order to encourage the teacher to impart more reliable knowledge, we present a Knowledge Confidence-Based Adaptive Temperature (KCAT) learning approach to learn optimal temperatures to transfer the target-aware and interaction-aware knowledge with higher prediction confidence. Moreover, to urge the student to absorb more helpful knowledge, we introduce a Student Competency-Based Adaptive Weight (SCAW) learning strategy to dynamically integrate the distilled target-aware and interaction-aware knowledge to enhance the student's grounding certainty. We conduct extensive experiments on three benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg, to validate the proposed approach. Experimental results demonstrate that our approach achieves superior performance over state-of-the-art methods with the aid of adaptive knowledge distillation and integration. The code and trained models are available at: https://github.com/dami23/WREC_AdaptiveKD
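The mechanism that KCAT adapts is standard temperature-scaled knowledge distillation, sketched below in PyTorch; how the paper derives per-sample temperatures from teacher confidence is not reproduced, so a plain temperature argument stands in for it.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature):
    """Temperature-scaled KD loss; an adaptive scheme would supply a
    temperature derived from teacher confidence instead of a constant."""
    t = temperature
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence scaled by t^2 to keep gradient magnitudes comparable.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (t ** 2)
```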
We investigate the use of Large Language Models (LLMs) to equip neural robotic agents with human-like social and cognitive competencies, for the purpose of open-ended human-robot conversation and collaboration. We introduce a modular and extensible methodology for grounding an LLM with the sensory perceptions and capabilities of a physical robot, and integrate multiple deep learning models throughout the architecture in a form of system integration. The integrated models encompass various functions such as speech recognition, speech generation, open-vocabulary object detection, human pose estimation, and gesture detection, with the LLM serving as the central text-based coordinating unit. The qualitative and quantitative results demonstrate the huge potential of LLMs in providing emergent cognition and interactive language-oriented control of robots in a natural and social manner. Video: https://youtu.be/A2WLEuiM3-s
Endowing robots with the human ability to learn a growing set of skills over the course of a lifetime as opposed to mastering single tasks is an open problem in robot learning. While multitask learning approaches have been proposed to address this problem, they pay little attention to task inference. In order to continually learn new tasks, the robot first needs to infer the task at hand without requiring predefined task representations. In this article, we propose a self-supervised task inference approach. Our approach learns action and intention embeddings from self-organization of the observed movement and effect parts of unlabeled demonstrations and a higher level behavior embedding from self-organization of the joint action-intention embeddings. We construct a behavior-matching self-supervised learning objective to train a novel task inference network (TINet) to map an unlabeled demonstration to its nearest behavior embedding, which we use as the task representation. A multitask policy is built on top of the TINet and trained with reinforcement learning to optimize performance over tasks. We evaluate our approach in the fixed-set and continual multitask learning settings with a humanoid robot and compare it to different multitask learning baselines. The results show that our approach outperforms the other baselines, with the difference being more pronounced in the challenging continual learning setting, and can infer tasks from incomplete demonstrations. Our approach is also shown to generalize to unseen tasks based on a single demonstration in one-shot task generalization experiments.
Natural language processing and vision tasks have recently seen large improvements through the rise of Transformer architectures. The high-performing large language models (LLMs) benefit from large textual datasets that are abundantly available online. However, action and bidirectional action-language tasks are less developed, as these require more specific and labeled data. Therefore, we aim at enabling these robotic action capabilities for a pretrained LLM, while maintaining high efficiency with regard to the required training time and data size. To achieve this, we split up a Transformer-based LLM and insert a multimodal architecture into it. Specifically, we split a pretrained T5 LLM between its encoder and decoder parts, to insert a crossmodal Transformer component of a Paired Transformed Autoencoders (PTAE) bidirectional action-language model. The experiments are conducted on a new dataset, consisting of unimodal language translation and crossmodal bidirectional action-language translation. The natural language capabilities of the original T5 are re-established efficiently by training the crossmodal Transformer, which requires only one 5.7-millionth of the T5 model's original training data. Furthermore, the new model, called CrossT5, achieves high accuracy for the vision- and language-guided robotic action tasks. By design, the CrossT5 agent acts robustly when tested with language commands not included in the dataset. The results demonstrate that this novel approach is successful in combining the advanced linguistic capabilities of LLMs with the low-level robotic control skills of vision-action models. The code is available at this URL: https://github.com/samsoneko/CrossT5.
Interdisciplinary research, drawing from robotics, artificial intelligence, neuroscience, psychology, and cognitive science, is a cornerstone to advance the state-of-the-art in multimodal human-robot interaction and neuro-cognitive modeling. Research on neuro-cognitive models benefits from the embodiment of these models into physical, humanoid agents that possess complex, human-like sensorimotor capabilities for multimodal interaction with the real world. For this purpose, we develop and introduce NICO (Neuro-Inspired COmpanion), a humanoid developmental robot that fills a gap between necessary sensing and interaction capabilities and flexible design. This combination makes it a novel neuro-cognitive research platform for embodied sensorimotor computational and cognitive models in the context of multimodal interaction as shown in our results.
Robot facial expressions and gaze are important factors for enhancing human-robot interaction (HRI), but their effects on human collaboration and perception are not well understood, for instance, in collaborative game scenarios. In this study, we designed a collaborative triadic HRI game scenario where two participants worked together to insert objects into a shape sorter. One participant assumed the role of a guide. The guide instructed the other participant, who played the role of an actor, to place occluded objects into the sorter. A humanoid robot issued instructions, observed the interaction, and displayed social cues to elicit changes in the two participants' behavior. We measured human collaboration as a function of task completion time and the participants' perceptions of the robot by rating its behavior as intelligent or random. Participants also evaluated the robot by filling out the Godspeed questionnaire. We found that human collaboration was higher when the robot displayed a happy facial expression at the beginning of the game compared to a neutral facial expression. We also found that participants perceived the robot as more intelligent when it displayed a positive facial expression at the end of the game. The robot's behavior was also perceived as intelligent when directing its gaze toward the guide at the beginning of the interaction, not the actor. These findings provide insights into how robot facial expressions and gaze influence human behavior and perception in collaboration.
A desirable trait of an artificial agent acting in the visual world is to continually learn a sequence of language-informed tasks while striking a balance between sufficiently specializing in each task and building a generalized knowledge for transfer. Selective specialization, i.e., a careful selection of model components to specialize in each task, is a strategy to provide control over this trade-off. However, the design of selection strategies requires insights on the role of each model component in learning rather specialized or generalizable representations, which poses a gap in current research. Thus, our aim with this work is to provide an extensive analysis of selection strategies for visually grounded continual language learning. Due to the lack of suitable benchmarks for this purpose, we introduce two novel diagnostic datasets that provide enough control and flexibility for a thorough model analysis. We assess various heuristics for module specialization strategies as well as quantifiable measures for two different types of model architectures. Finally, we design conceptually simple approaches based on our analysis that outperform common continual learning baselines. Our results demonstrate the need for further efforts towards better aligning continual learning algorithms with the learning behaviors of individual model parts.
In this paper, we review recent approaches for explaining concepts in neural networks. Concepts can act as a natural link between learning and reasoning: once the concepts that a neural learning system uses are identified, one can integrate those concepts with a reasoning system for inference or use a reasoning system to act upon them to improve or enhance the learning system. On the other hand, knowledge can not only be extracted from neural networks but concept knowledge can also be inserted into neural network architectures. Since integrating learning and reasoning is at the core of neuro-symbolic AI, the insights gained from this survey can serve as an important step towards realizing neuro-symbolic AI based on explainable concepts.
Model-based reinforcement learning (MBRL) with real-time planning has shown great potential in locomotion and manipulation control tasks. However, the existing planning methods, such as the Cross-Entropy Method (CEM), do not scale well to complex high-dimensional environments. One of the key reasons for underperformance is the lack of exploration, as these planning methods only aim to maximize the cumulative extrinsic reward over the planning horizon. Furthermore, planning inside the compact latent space in the absence of observations makes it challenging to use curiosity-based intrinsic motivation. We propose Curiosity CEM (CCEM), an improved version of the CEM algorithm for encouraging exploration via curiosity. Our proposed method maximizes the sum of state-action Q values over the planning horizon, in which these Q values estimate the future extrinsic and intrinsic reward, hence encouraging the agent to reach novel observations. In addition, our model uses contrastive representation learning to efficiently learn latent representations. Experiments on image-based continuous control tasks from the DeepMind Control suite show that CCEM is by a large margin more sample-efficient than previous MBRL algorithms and compares favorably with the best model-free RL methods.
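A minimal sketch of the CEM planning loop with a Q-value objective is given below (NumPy; `dynamics` and `q_fn` are placeholder callables, and the curiosity bonus is assumed to already be folded into the learned Q-values, as the abstract describes).

```python
import numpy as np

def cem_plan(latent, dynamics, q_fn, action_dim, horizon=8,
             population=500, elites=50, iterations=5, rng=None):
    """Cross-Entropy Method over action sequences; candidates are scored by the
    sum of Q-values along the imagined latent trajectory."""
    rng = rng or np.random.default_rng()
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):
        actions = rng.normal(mean, std, size=(population, horizon, action_dim))
        actions = np.clip(actions, -1.0, 1.0)
        z = np.repeat(latent[None, :], population, axis=0)
        scores = np.zeros(population)
        for t in range(horizon):
            scores += q_fn(z, actions[:, t])          # estimated future return
            z = dynamics(z, actions[:, t])            # latent rollout
        elite_idx = np.argsort(scores)[-elites:]
        mean = actions[elite_idx].mean(axis=0)        # refit sampling distribution
        std = actions[elite_idx].std(axis=0) + 1e-6
    return mean[0]                                    # first action of the refined plan
```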
Programming robot behavior in a complex world faces challenges on multiple levels, from dexterous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning. However, it remains challenging to ground LLMs in multimodal sensory input and continuous action output, while enabling a robot to interact with its environment and acquire novel information as its policies unfold. We develop a robot interaction scenario with a partially observable state, which requires a robot to decide on a range of epistemic actions in order to sample sensory information among multiple modalities, before being able to execute the task correctly. Matcha (Multimodal environment chatting) agent, an interactive perception framework, is therefore proposed with an LLM as its backbone, whose ability is exploited to instruct epistemic actions and to reason over the resulting multimodal sensations (vision, sound, haptics, proprioception), as well as to plan an entire task execution based on the interactively acquired information. Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behavior in a multimodal environment, while multimodal modules with the context of the environmental state help ground the LLMs and extend their processing ability. The project website can be found at https://matcha-agent.github.io.
Neural fields are neural networks which map coordinates to a desired signal. When a neural field is to jointly model multiple signals, rather than memorize only one, it needs to be conditioned on a latent code which describes the signal at hand. Despite being an important aspect, there has been little research on conditioning strategies for neural fields. In this work, we explore the use of neural fields as decoders for 2D semantic segmentation. For this task, we compare three conditioning methods, namely simple concatenation of the latent code, Feature-wise Linear Modulation (FiLM), and Cross-Attention, in conjunction with latent codes which either describe the full image or only a local region of the image. Our results show a considerable difference in performance between the examined conditioning strategies. Furthermore, we show that conditioning via Cross-Attention achieves the best results and is competitive with a CNN-based decoder for semantic segmentation.
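Of the three conditioning strategies compared, FiLM is the easiest to show compactly. The PyTorch sketch below applies a latent-code-predicted scale and shift to decoder features; layer sizes are illustrative and this is not the paper's exact decoder.

```python
import torch
import torch.nn as nn

class FiLMConditionedBlock(nn.Module):
    """Feature-wise Linear Modulation: the latent code predicts a per-channel
    scale (gamma) and shift (beta) applied to the decoder features."""
    def __init__(self, feat_dim, latent_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(latent_dim, 2 * feat_dim)
        self.layer = nn.Linear(feat_dim, feat_dim)

    def forward(self, features, latent):
        gamma, beta = self.to_gamma_beta(latent).chunk(2, dim=-1)
        return torch.relu(gamma * self.layer(features) + beta)
```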
The paper introduces CycleIK, a neuro-robotic approach that wraps two novel neuro-inspired methods for the inverse kinematics (IK) task: a Generative Adversarial Network (GAN) and a Multi-Layer Perceptron architecture. These methods can be used in a standalone fashion, but we also show how embedding these into a hybrid neuro-genetic IK pipeline allows for further optimization via sequential least-squares programming (SLSQP) or a genetic algorithm (GA). The models are trained and tested on dense datasets that were collected from random robot configurations of the new Neuro-Inspired COLlaborator (NICOL), a semi-humanoid robot with two redundant 8-DoF manipulators. We utilize the weighted multi-objective function from the state-of-the-art BioIK method to support the training process and our hybrid neuro-genetic architecture. We show that the neural models can compete with state-of-the-art IK approaches, which allows for deployment directly to robotic hardware. Additionally, it is shown that the incorporation of the genetic algorithm improves the precision while simultaneously reducing the overall runtime.
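The hybrid neuro-genetic idea of seeding a classical optimizer with a neural IK guess can be sketched as follows (SciPy SLSQP; `neural_ik`, `fk`, and the joint limits are placeholders, and the real pipeline optimizes BioIK's weighted multi-objective function rather than this single position-error term).

```python
import numpy as np
from scipy.optimize import minimize

def hybrid_ik(target_pose, neural_ik, fk, joint_limits):
    """Seed SLSQP with a neural IK prediction and minimize the Cartesian error.
    target_pose: desired end-effector position; fk: forward kinematics."""
    q0 = neural_ik(target_pose)                       # neural initial guess

    def cartesian_error(q):
        return np.linalg.norm(fk(q) - target_pose)

    result = minimize(cartesian_error, q0, method="SLSQP",
                      bounds=joint_limits, options={"maxiter": 50})
    return result.x
```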
Multimodal integration is a key component of allowing robots to perceive the world. Multimodality comes with multiple challenges that have to be considered, such as how to integrate and fuse the data. In this paper, we compare different possibilities of fusing visual, tactile and proprioceptive data. The data is directly recorded on the NICOL robot in an experimental setup in which the robot has to classify containers and their content. Due to the different nature of the containers, the use of the modalities can differ widely between the classes. We demonstrate the superiority of multimodal solutions in this use case and evaluate three fusion strategies that integrate the data at different time steps. We find that the accuracy of the best fusion strategy is 15% higher than that of the best strategy using only a single sense.
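Two of the standard ways such modalities can be fused are sketched below, early (feature-level) concatenation versus late (decision-level) averaging of logits; dimensions and module names are illustrative and do not correspond to the exact strategies evaluated in the paper.

```python
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    """Concatenate per-modality features before a shared classifier head."""
    def __init__(self, vis_dim, tac_dim, prop_dim, num_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(vis_dim + tac_dim + prop_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, vis, tac, prop):
        return self.head(torch.cat([vis, tac, prop], dim=-1))

class LateFusionClassifier(nn.Module):
    """Classify each modality separately and average the class logits."""
    def __init__(self, vis_dim, tac_dim, prop_dim, num_classes):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d, num_classes) for d in (vis_dim, tac_dim, prop_dim))

    def forward(self, vis, tac, prop):
        logits = [h(x) for h, x in zip(self.heads, (vis, tac, prop))]
        return torch.stack(logits, dim=0).mean(dim=0)
```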
While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still limited to a subset of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay [20] for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word-Error-Rates (WERs) below 5% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.
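The two ingredients, selective freezing and experience replay, can be sketched generically in PyTorch as below; the parameter-name prefix and the replay fraction are illustrative assumptions, not the paper's configuration.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def freeze_by_prefix(model, frozen_prefixes=("feature_extractor",)):
    """Freeze all parameters whose names start with the given prefixes,
    leaving the remaining layers trainable for domain adaptation."""
    for name, param in model.named_parameters():
        if name.startswith(frozen_prefixes):
            param.requires_grad = False

def build_replay_loader(new_domain_ds, original_ds, replay_fraction=0.1,
                        batch_size=16):
    """Experience replay: mix a small random subset of original-domain data
    into the fine-tuning set to reduce forgetting."""
    n_replay = int(len(original_ds) * replay_fraction)
    replay_idx = torch.randperm(len(original_ds))[:n_replay].tolist()
    mixed = ConcatDataset([new_domain_ds, Subset(original_ds, replay_idx)])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)
```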
In recent speech processing research, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities, which can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture that extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
Recent developments of sensors that allow tracking of human movements and gestures enable rapid progress of applications in domains like medical rehabilitation or robotic control. Especially the inertial measurement unit (IMU) is an excellent device for real-time scenarios as it rapidly delivers data input. Therefore, a computational model must be able to learn gesture sequences in a fast yet robust way. We recently introduced an echo state network (ESN) framework for continuous gesture recognition (Tietz et al., 2019) including novel approaches for gesture spotting, i.e., the automatic detection of the start and end phase of a gesture. Although our results showed good classification performance, we identified significant factors that negatively impact the performance, such as subgestures and gesture variability. To address these issues, we include experiments with Long Short-Term Memory (LSTM) networks, which are a state-of-the-art model for sequence processing, to compare the obtained results with our framework and to evaluate their robustness regarding pitfalls in the recognition process. In this study, we analyze the two conceptually different approaches to processing continuous, variable-length gesture sequences, which yields interesting results when comparing the distinct gesture executions. In addition, our results demonstrate that our ESN framework achieves performance comparable to the LSTM network but has significantly lower training times. We conclude from the present work that ESNs are viable models for continuous gesture recognition, delivering reasonable performance for applications requiring real-time processing, as in robotic or rehabilitation tasks. From our discussion of this comparative study, we suggest prospective improvements on both the experimental and network architecture level.
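For readers unfamiliar with echo state networks, a minimal NumPy sketch is given below: a fixed random reservoir with a ridge-regression readout. Reservoir size, spectral radius, and regularization are generic choices, not the framework's configuration.

```python
import numpy as np

class EchoStateNetwork:
    """Fixed random reservoir with a trainable linear (ridge) readout."""
    def __init__(self, n_in, n_res=500, n_out=10, spectral_radius=0.9,
                 ridge=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))   # echo state property
        self.W, self.ridge, self.W_out = W, ridge, np.zeros((n_out, n_res))

    def _states(self, sequence):
        x = np.zeros(self.W.shape[0])
        states = []
        for u in sequence:                                       # reservoir update
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x.copy())
        return np.array(states)

    def fit(self, sequence, targets):
        X = self._states(sequence)                               # (T, n_res)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ targets).T         # ridge regression

    def predict(self, sequence):
        return self._states(sequence) @ self.W_out.T
```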
In recent years some researchers have explored the use of reinforcement learning (RL) algorithms as key components in the solution of various natural language processing (NLP) tasks. For instance, some of these algorithms leveraging deep neural learning have found their way into conversational systems. This paper reviews the state of the art of RL methods for their possible use for different problems of NLP, focusing primarily on conversational systems, mainly due to their growing relevance. We provide detailed descriptions of the problems as well as discussions of why RL is well-suited to solve them. Also, we analyze the advantages and limitations of these methods. Finally, we elaborate on promising research directions in NLP that might benefit from RL.
One-step reinforcement learning explanation methods account for individual actions but fail to consider the agent’s future behavior, which can make their interpretation ambiguous. We propose to address this limitation by providing hierarchical goals as context for one-step explanations. By considering the current hierarchical goal as a context, one-step explanations can be interpreted with higher certainty, as the agent’s future behavior is more predictable. We combine reward decomposition with hierarchical reinforcement learning into a novel explainable reinforcement learning framework, which yields more interpretable, goal-contextualized one-step explanations. With a qualitative analysis of one-step reward decomposition explanations, we first show that their interpretability is indeed limited in scenarios with multiple, different optimal policies—a characteristic shared by other one-step explanation methods. Then, we show that our framework retains high interpretability in such ca...
Whenever we address a specific object or refer to a certain spatial location, we use referential or deictic gestures, usually accompanied by some verbal description. Particularly, pointing gestures are necessary to resolve ambiguities in a scene and they are of crucial importance when verbal communication may fail due to environmental conditions or when two persons simply do not speak the same language. With the currently increasing advances of humanoid robots and their future integration in domestic domains, the development of gesture interfaces complementing human–robot interaction scenarios is of substantial interest. The implementation of an intuitive gesture scenario is still challenging because both the pointing intention and the corresponding object have to be correctly recognized in real time. The demand increases when considering pointing gestures in a cluttered environment, as is the case in households. Also, humans perform pointing in many different ways and ...
Most learning algorithms require the practitioner to manually set the values of many hyperparameters before the learning process can begin. However, with modern algorithms, the evaluation of a given hyperparameter setting can take a considerable amount of time and the search space is often very high-dimensional. We suggest using a lower-dimensional representation of the original data to quickly identify promising areas in the hyperparameter space. This information can then be used to initialize the optimization algorithm for the original, higher-dimensional data. We compare this approach with the standard procedure of optimizing the hyperparameters only on the original input. We perform experiments with various state-of-the-art hyperparameter optimization algorithms such as random search, the Tree of Parzen Estimators (TPE), sequential model-based algorithm configuration (SMAC), and a genetic algorithm (GA). Our experiments indicate that it is possible to speed up the optimization ...
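The warm-starting idea can be sketched with random search and a PCA-reduced copy of the data (scikit-learn); the callables `train_eval` and `sample_space`, the budget split, and the number of PCA components are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def warm_started_search(train_eval, X, y, sample_space, budget=60,
                        cheap_fraction=0.5, n_components=10, top_k=5, rng=None):
    """Phase 1: random search on a low-dimensional PCA projection of the data.
    Phase 2: re-evaluate only the most promising settings on the original data."""
    rng = rng or np.random.default_rng()
    X_low = PCA(n_components=n_components).fit_transform(X)

    cheap_budget = int(budget * cheap_fraction)
    candidates = [sample_space(rng) for _ in range(cheap_budget)]
    cheap_scores = [train_eval(params, X_low, y) for params in candidates]

    promising = [candidates[i] for i in np.argsort(cheap_scores)[-top_k:]]
    full_scores = [train_eval(params, X, y) for params in promising]
    return promising[int(np.argmax(full_scores))]
```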
In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies lear...
Spoken language is one of the most efficient ways to instruct robots about performing domestic tasks. However, the state of the environment has to be considered to plan and execute actions successfully. We propose a system that learns to recognise the user's intention and map it to a goal. A reinforcement learning (RL) system then generates a sequence of actions toward this goal considering the state of the environment. A novel contribution in this paper is the use of symbolic representations for both input and output of a neural Deep Q-network (DQN), which enables it to be used in a hybrid system. To show the effectiveness of our approach, the Tell-Me-Dave corpus is used to train an intention detection model and in a second step an RL agent generates the sequences of actions towards the detected objective, represented by a set of state predicates. We show that the system can successfully recognise command sequences from this corpus as well as train the deep-RL network with symbolic i...
Classical fear conditioning has experienced a growing interest over the last decade. Fear learning mechanisms are a simple and robust learning paradigm that involves sensory and motor areas. We believe that a deeper study of these mechanisms will contribute not only to a better understanding of fear conditioning but also to the development of future robot generations. 2. System Overview: Here, we present a biologically motivated model of auditory cue conditioning. The model includes the known thalamic and auditory cortex routes plus reward learning based on phasic dynamics of dopamine. We aim to develop a biologically plausible architecture able to run on a real humanoid robot. Our architecture is based on a reward prediction error model presented by Lowe et al., 2009 [3]. We included three bio-plausible pathways to the amygdala, including the auditory thalamus, auditory cortex and prefrontal cortex (PFC). Applications of this learning mechanism may be used in artificia...
The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks may generally involve the processing of a huge amount of visual information and learning-based mechanisms for generalizing a set of training actions and classifying new samples. To operate in natural environments, a crucial property is the efficient and robust recognition of actions, also under noisy conditions caused by, for instance, systematic sensor errors and temporarily occluded persons. Studies of the mammalian visual system and its outperforming ability to process biological motion information suggest separate neural pathways for the distinct processing of pose and motion features at multiple levels and the subsequent integration of these visual cues for action perception. We present a neurobiologically-motivated approach to achieve noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR) networks that obtain progressively generalized representations of sensory inputs and learn inherent spatio-temporal dependencies. During the training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Reported experiments show that our approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best results for a public benchmark of domestic daily actions.
Saliency prediction refers to the computational task of modeling overt attention. Social cues greatly influence our attention, consequently altering our eye movements and behavior. To emphasize the efficacy of such features, we present a neural model for integrating social cues and weighting their influences. Our model consists of two stages. During the first stage, we detect two social cues by following gaze, estimating gaze direction, and recognizing affect. These features are then transformed into spatiotemporal maps through image processing operations. The transformed representations are propagated to the second stage (GASP), where we explore various techniques of late fusion for integrating social cues and introduce two sub-networks for directing attention to relevant stimuli. Our experiments indicate that fusion approaches achieve better results for static integration methods, whereas non-fusion approaches for which the influence of each modality is unknown result in better outcomes when coupled with recurrent models for dynamic saliency prediction. We show that gaze direction and affective representations contribute an improvement in prediction-to-ground-truth correspondence of at least 5% compared to dynamic saliency models without social cues. Furthermore, affective representations improve GASP, supporting the necessity of considering affect-biased attention in predicting saliency.
In this work, we propose a novel training scheme to modularize end-to-end systems. Our training scheme alters the flow of information in an end-to-end system so that the kernels of this system can be reused in another system that fulfills a different task. We apply this scheme to extract the noise reduction capabilities from a noise-robust automatic speech recognition (ASR) system and build a speech enhancer from it. This enhancer receives spectral representations of unfiltered audio and outputs cleaned spectral representations. Our enhancer can be integrated into an ASR system as a front-end, is trainable, and reduces background noise. Our front-end uses a decoder to clean speech based on the hidden activations of the ASR system Jasper. During training, we exclusively adapt the weights of our decoder and the batch normalization layers in Jasper. The resulting spectral representations show less background noise, and areas of the spectral features that do not contribute to speech recognition are not reconstructed. We demonstrate that our front-end can be combined with a pre-trained ASR system as a back-end and supports speech recognition in noisy conditions. Further, we show that training another ASR system with our front-end improves that system's performance in noisy as well as noiseless conditions, especially on challenging speech datasets.
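The parameter-selection scheme described above (updating only the enhancement decoder and the batch-normalization layers of the otherwise frozen ASR model) can be sketched as follows. The small convolutional stack merely stands in for the Jasper encoder, and the shapes, layer sizes, and MSE objective are illustrative assumptions.

```python
import torch
import torch.nn as nn

asr = nn.Sequential(                         # placeholder for a pre-trained Jasper encoder
    nn.Conv1d(64, 256, kernel_size=11, padding=5),
    nn.BatchNorm1d(256),
    nn.ReLU(),
)
decoder = nn.Conv1d(256, 64, kernel_size=11, padding=5)   # maps activations to clean spectra

# Freeze everything in the ASR model except its batch-normalization parameters.
for module in asr.modules():
    trainable = isinstance(module, nn.BatchNorm1d)
    for p in module.parameters(recurse=False):
        p.requires_grad = trainable

params = [p for p in asr.parameters() if p.requires_grad] + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

noisy = torch.rand(8, 64, 200)               # noisy spectral features (batch, bins, frames)
clean = torch.rand(8, 64, 200)               # paired clean target
loss = nn.functional.mse_loss(decoder(asr(noisy)), clean)
loss.backward()
optimizer.step()
```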
Neural networks can be powerful function approximators, able to model high-dimensional feature distributions from a subset of examples drawn from the target distribution. Naturally, they perform well at generalizing within the limits of their target function, but they often fail to generalize outside of the explicitly learned feature space. It is therefore an open research question whether, and how, neural network-based architectures can be deployed for systematic reasoning. Many studies have shown evidence of poor generalization, but they often work with abstract data or are limited to single-channel input. Humans, however, learn and interact through a combination of multiple sensory modalities and rarely rely on just one. To investigate compositional generalization in a multimodal setting, we generate an extensible dataset of multimodal input sequences from simulation. We investigate the influence of the underlying training data distribution on compositional generalization in a minimal LSTM-based network trained in a supervised, time-continuous setting. We find that compositional generalization fails in simple setups but improves with the number of objects and actions, and particularly with greater color overlap between objects. Furthermore, multimodality strongly improves compositional generalization in settings where a pure vision model struggles to generalize.
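A minimal version of the multimodal setup can be sketched as an LSTM that receives concatenated per-timestep visual and proprioceptive features and produces a per-timestep prediction. The feature dimensions, the choice of second modality, and the output target are illustrative placeholders, not the dataset's actual specification.

```python
import torch
import torch.nn as nn

class MultimodalLSTM(nn.Module):
    """Minimal LSTM over concatenated multimodal input sequences (illustrative)."""
    def __init__(self, vision_dim=32, proprio_dim=8, hidden=64, out_dim=16):
        super().__init__()
        self.lstm = nn.LSTM(vision_dim + proprio_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, vision, proprio):
        # vision: (B, T, vision_dim), proprio: (B, T, proprio_dim)
        fused, _ = self.lstm(torch.cat([vision, proprio], dim=-1))
        return self.head(fused)               # per-timestep prediction (B, T, out_dim)

model = MultimodalLSTM()
out = model(torch.rand(4, 50, 32), torch.rand(4, 50, 8))
print(out.shape)                              # torch.Size([4, 50, 16])
```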
Model-free reinforcement learning algorithms can learn an optimal policy from experience without requiring prior knowledge. However, model-free agents require vast amounts of samples, particularly in sparse reward environments where most states yield zero reward. We developed a model-based approach to tackle the high sample complexity of sparse reward settings with continuous actions. A particle swarm optimization (PSO) planner queries a trained world model and serves as the action selection mechanism, hence taking the role of the actor in an actor-critic architecture. The parameters of the PSO regulate the agent's exploration rate. We show that the planner helps the agent discover rewards even in regions with zero value gradient. Our simple planning-integrated policy architecture learns more efficiently, from fewer samples, than continuous model-free algorithms.
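The role of the PSO planner as the actor can be sketched as follows: particles are candidate action sequences, each scored by rolling it out through the learned world model and adding a terminal value estimate. Here `world_model(state, action) -> (next_state, reward)` and `value_fn(state)` are placeholder callables, and the PSO hyperparameters are illustrative rather than the paper's settings.

```python
import numpy as np

def pso_plan(state, world_model, value_fn, horizon=5, act_dim=2,
             n_particles=64, iters=20, w=0.7, c1=1.5, c2=1.5):
    """Select an action by optimizing imagined returns with PSO (illustrative)."""
    pos = np.random.uniform(-1, 1, (n_particles, horizon, act_dim))
    vel = np.zeros_like(pos)

    def score(actions):
        s, ret = state, 0.0
        for a in actions:                     # roll the candidate sequence out in imagination
            s, r = world_model(s, a)
            ret += r
        return ret + value_fn(s)              # bootstrap the return beyond the horizon

    fitness = np.array([score(p) for p in pos])
    best_pos, best_fit = pos.copy(), fitness.copy()
    g = best_pos[np.argmax(best_fit)]         # global best particle

    for _ in range(iters):
        r1, r2 = np.random.rand(2, n_particles, 1, 1)
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, -1, 1)
        fitness = np.array([score(p) for p in pos])
        improved = fitness > best_fit
        best_pos[improved], best_fit[improved] = pos[improved], fitness[improved]
        g = best_pos[np.argmax(best_fit)]

    return g[0]                               # execute only the first planned action
```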
Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach uses only clean speech in the training phase, making the estimation particularly sensitive to the presence of noise, especially at low signal-to-noise ratios (SNRs). To increase the robustness of the VAE, we propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs. We evaluate our approach on real recordings of different noisy environments and acoustic conditions using two different noise datasets. We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters. At the same time, we demonstrate that our model generalizes to unseen noise conditions better than a supervised feedforward deep neural network (DNN). Furthermore, we demonstrate that the model's performance is robust to a reduction in the size of the noisy-clean speech training data.
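A conceptual sketch of the noise-aware training idea: the encoder consumes noisy speech features while the decoder is trained to reconstruct the paired clean features, regularized by the usual VAE KL term. Layer sizes are arbitrary and the plain MSE reconstruction loss is an illustrative stand-in; the actual model may use a different likelihood on the spectral features.

```python
import torch
import torch.nn as nn

class NoiseAwareVAE(nn.Module):
    """VAE whose encoder sees noisy speech; decoder targets clean speech (sketch)."""
    def __init__(self, feat_dim=513, latent_dim=32, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, feat_dim))

    def forward(self, noisy):
        h = self.enc(noisy)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(z), mu, logvar

def loss_fn(recon, clean, mu, logvar):
    rec = nn.functional.mse_loss(recon, clean)                    # reconstruct the clean pair
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```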
Recently there has been an interest in the potential of learning generative models from a single image, as opposed to from a large dataset. This task is significant because it means generative models can be used in domains where collecting a large dataset is not feasible. However, training a model capable of generating realistic images from only a single sample is a difficult problem. In this work, we conduct a number of experiments to understand the challenges of training these methods and propose some best practices that we found allow us to generate improved results over previous work. One key insight is that, unlike prior single-image generation methods, we train several stages of the sequential multi-stage pipeline concurrently, allowing us to learn models with fewer stages of increasing image resolution. Compared to a recent state-of-the-art baseline, our model is up to six times faster to train, has fewer parameters, and can better capture the global structure of images.
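The concurrent-stage training idea can be sketched as keeping the k most recently added generator stages trainable while earlier stages stay frozen, with one optimizer covering all active stages. The stage modules and hyperparameters below are placeholders for illustration, not the paper's actual generator.

```python
import torch
import torch.nn as nn

stages = nn.ModuleList(nn.Conv2d(3, 3, kernel_size=3, padding=1) for _ in range(5))
k = 3                                          # number of stages trained concurrently

def configure_trainable(active_from: int):
    """Freeze stages below `active_from`; optimize the remaining stages jointly."""
    for i, stage in enumerate(stages):
        for p in stage.parameters():
            p.requires_grad = i >= active_from
    trainable = [p for p in stages.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=5e-4)

# When the fifth stage is added, stages 2-4 are optimized together; 0-1 stay fixed.
optimizer = configure_trainable(active_from=len(stages) - k)
```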
This video shows a friendly human-robot interaction using humanoid Nao robots. The speaker teaches the robot the names of objects using speech. This work shows the successful integration of three different projects, mainly using Artificial Neural Networks: (1) object recognition with an RGB-D (color and depth) sensor, (2) speech-to-text using an approach that post-processes Google's speech recognition hypotheses, and (3) syntactic interpretation of sentences.
In the Knowledge Technology group (https://www.informatik.uni-hamburg.de/wtm/) we develop neural network models with a particular focus on robot interaction and development. One of our goals is to better understand grounded communication in humans and machines. Another goal is to use the knowledge we gain to improve multisensory integration and social interaction in humanoid robots. In the context of understanding grounded communication, we will present a deep neural network model for emotion expression recognition for human-robot interaction. Furthermore, we will present a model using self-organizing neural networks for human-action recognition in the context of human-robot assistance. In conclusion, we show how emotion perception and multisensory observation of human actions can contribute to future human-robot cooperation.
Our iCub is lonely and wants to talk to someone. But it will only talk to someone who shows a positive expression towards it. To recognize visual emotion expressions, we use a deep neural network that observes a person's facial expression and body movement. Our iCub is able to classify expressions into three categories: positive, negative, and neutral. After starting a conversation, the iCub will react differently depending on which expression the person shows. The iCub will tease the person, or act sad or angry, and will express itself using its facial features and colours.
The purpose of this Research Topic is to reflect on and discuss links between neuroscience, psychology, computer science and robotics with regard to cross-modal learning, which has, in recent years, emerged as a new area of interdisciplinary research. The term cross-modal learning refers to the synergistic synthesis of information from multiple sensory modalities such that the learning that occurs within any individual sensory modality can be enhanced with information from one or more other modalities. Cross-modal learning is a crucial component of adaptive behavior in a continuously changing world, and examples are ubiquitous: learning to grasp and manipulate objects, learning to walk, learning to read and write, and learning to understand language and its referents. In all these examples, visual, auditory, somatosensory or other modalities have to be integrated, and learning must be cross-modal. In fact, the broad range of acquired human skills is cross-modal, and many of the most advanced human capabilities, such as those involved in social cognition, require learning from the richest combinations of cross-modal information. In contrast, even the very best systems in Artificial Intelligence (AI) and robotics have taken only tiny steps in this direction. Building a system that composes a global perspective from multiple distinct sources, types of data, and sensory modalities is a grand challenge of AI, yet it is specific enough that it can be studied rigorously and in such detail that the prospect of deep insights into these mechanisms is plausible in the near term. Cross-modal learning is a broad, interdisciplinary topic that has not yet coalesced into a single, unified field. Instead, there are many separate fields, each tackling the concerns of cross-modal learning from its own perspective, with currently little overlap. We anticipate an accelerating trend towards integration of these areas and intend to contribute to that integration. By focusing on cross-modal learning, this Research Topic brings together recent progress in artificial intelligence, robotics, psychology and neuroscience.
The proceedings set LNCS 12891, LNCS 12892, LNCS 12893, LNCS 12894 and LNCS 12895 constitutes the proceedings of the 30th International Conference on Artificial Neural Networks, ICANN 2021, held in Bratislava, Slovakia, in September 2021.* The 265 full papers presented were carefully reviewed and selected from 496 submissions and are organized in five volumes. In this volume, the papers focus on topics such as adversarial machine learning, anomaly detection, attention and transformers, audio and multimodal applications, bioinformatics and biosignal analysis, capsule networks and cognitive models. *The conference was held online in 2021 due to the COVID-19 pandemic.