Showing 1–50 of 153 results for author: Wermter, S

Searching in archive cs.
  1. arXiv:2407.18841  [pdf, other]

    cs.LG

    QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning

    Authors: Mostafa Kotb, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter

    Abstract: Inspired by the success of the Transformer architecture in natural language processing and computer vision, we investigate the use of Transformers in Reinforcement Learning (RL), specifically in modeling the environment's dynamics using Transformer Dynamics Models (TDMs). We evaluate the capabilities of TDMs for continuous control in real-time planning scenarios with Model Predictive Control (MPC)…

    Submitted 26 July, 2024; originally announced July 2024.

  2. arXiv:2407.13505  [pdf, other]

    cs.RO cs.AI

    Robots Can Multitask Too: Integrating a Memory Architecture and LLMs for Enhanced Cross-Task Robot Action Generation

    Authors: Hassan Ali, Philipp Allgeuer, Carlo Mazzola, Giulia Belgiovine, Burak Can Kaplan, Stefan Wermter

    Abstract: Large Language Models (LLMs) have been recently used in robot applications for grounding LLM common-sense reasoning with the robot's perception and physical abilities. In humanoid robots, memory also plays a critical role in fostering real-world embodiment and facilitating long-term interactive capabilities, especially in multi-task setups where the robot must remember previous task states, enviro…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.11211  [pdf, other]

    cs.CV cs.AI cs.CL

    Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion

    Authors: Philipp Allgeuer, Kyra Ahrens, Stefan Wermter

    Abstract: We introduce NOVIC, an innovative uNconstrained Open Vocabulary Image Classifier that uses an autoregressive transformer to generatively output classification labels as language. Leveraging the extensive knowledge of CLIP models, NOVIC harnesses the embedding space to enable zero-shot transfer from pure text to images. Traditional CLIP models, despite their ability for open vocabulary classificati…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.00518  [pdf, other]

    cs.RO

    When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration

    Authors: Philipp Allgeuer, Hassan Ali, Stefan Wermter

    Abstract: We investigate the use of Large Language Models (LLMs) to equip neural robotic agents with human-like social and cognitive competencies, for the purpose of open-ended human-robot conversation and collaboration. We introduce a modular and extensible methodology for grounding an LLM with the sensory perceptions and capabilities of a physical robot, and integrate multiple deep learning models through…

    Submitted 29 June, 2024; originally announced July 2024.

    Journal ref: International Conference on Artificial Neural Networks 2024

  5. arXiv:2406.18505  [pdf, other]

    cs.LG cs.AI cs.CL cs.RO

    Mental Modeling of Reinforcement Learning Agents by Language Models

    Authors: Wenhao Lu, Xufeng Zhao, Josua Spisak, Jae Hee Lee, Stefan Wermter

    Abstract: Can emergent language models faithfully model the intelligence of decision-making agents? Though modern language models exhibit already some reasoning ability, and theoretically can potentially express any probable distribution over tokens, it remains underexplored how the world knowledge these pretrained models have memorized can be utilized to comprehend an agent's behaviour in the physical worl…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: https://lukaswill.github.io/

  6. arXiv:2406.09988  [pdf, other]

    cs.AI cs.CL cs.RO

    Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

    Authors: Xiaowen Sun, Xufeng Zhao, Jae Hee Lee, Wenhao Lu, Matthias Kerzel, Stefan Wermter

    Abstract: The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation. However, detecting an object's state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our kn…

    Submitted 14 June, 2024; originally announced June 2024.

  7. arXiv:2405.15019  [pdf, other]

    cs.RO cs.AI cs.LG

    Agentic Skill Discovery

    Authors: Xufeng Zhao, Cornelius Weber, Stefan Wermter

    Abstract: Language-conditioned robotic skills make it possible to apply the high-level reasoning of Large Language Models (LLMs) to low-level robotic control. A remaining challenge is to acquire a diverse set of fundamental skills. Existing approaches either manually decompose a complex task into atomic robotic actions in a top-down fashion, or bootstrap as many combinations as possible in a bottom-up fashi…

    Submitted 16 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Webpage see https://agentic-skill-discovery.github.io/

  8. arXiv:2405.02929  [pdf, other]

    cs.CV cs.AI

    Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models

    Authors: Fares Abawi, Di Fu, Stefan Wermter

    Abstract: Previous research on scanpath prediction has mainly focused on group models, disregarding the fact that the scanpaths and attentional behaviors of individuals are diverse. The disregard of these differences is especially detrimental to social human-robot interaction, whereby robots commonly emulate human gaze based on heuristics or predefined patterns. However, human gaze patterns are heterogeneou…

    Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  9. arXiv:2404.08825  [pdf, other]

    cs.RO cs.AI

    Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents

    Authors: Jan-Gerrit Habekost, Connor Gäde, Philipp Allgeuer, Stefan Wermter

    Abstract: This paper introduces a novel zero-shot motion planning method that allows users to quickly design smooth robot motions in Cartesian space. A Bézier curve-based Cartesian plan is transformed into a joint space trajectory by our neuro-inspired inverse kinematics (IK) method CycleIK, for which we enable platform independence by scaling it to arbitrary robot designs. The motion planner is evaluated o…

    Submitted 12 April, 2024; originally announced April 2024.

  10. arXiv:2404.08424  [pdf, other]

    cs.RO cs.AI cs.HC

    Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

    Authors: Hassan Ali, Philipp Allgeuer, Stefan Wermter

    Abstract: Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collab…

    Submitted 12 April, 2024; originally announced April 2024.

  11. arXiv:2404.07735  [pdf, other]

    cs.RO cs.AI

    Diffusing in Someone Else's Shoes: Robotic Perspective Taking with Diffusion

    Authors: Josua Spisak, Matthias Kerzel, Stefan Wermter

    Abstract: Humanoid robots can benefit from their similarity to the human shape by learning from humans. When humans teach other humans how to perform actions, they often demonstrate the actions and the learning human can try to imitate the demonstration. Being able to mentally transfer from a demonstration seen from a third-person perspective to how it should look from a first-person perspective is fundamen…

    Submitted 11 April, 2024; originally announced April 2024.

  12. arXiv:2404.02018  [pdf, other]

    cs.RO cs.AI

    Large Language Models for Orchestrating Bimanual Robots

    Authors: Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Wenhao Lu, Stefan Wermter

    Abstract: Although there has been rapid progress in endowing robots with the ability to solve complex manipulation tasks, generating control policies for bimanual robots to solve tasks involving two hands is still challenging because of the difficulties in effective temporal and spatial coordination. With emergent abilities in terms of step-by-step reasoning and in-context learning, Large Language Models (L…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: The project website can be found at http://labor-agent.github.io

  13. Human Impression of Humanoid Robots Mirroring Social Cues

    Authors: Di Fu, Fares Abawi, Philipp Allgeuer, Stefan Wermter

    Abstract: Mirroring non-verbal social cues such as affect or movement can enhance human-human and human-robot interactions in the real world. The robotic platforms and control methods also impact people's perception of human-robot interaction. However, limited studies have compared robot imitation across different platforms and control methods. Our research addresses this gap by conducting two experiments c…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI '24 Companion), March 11-14, 2024, Boulder, CO, USA. arXiv admin note: text overlap with arXiv:2302.09648

  14. arXiv:2401.08381  [pdf, other]

    cs.RO cs.LG

    Robotic Imitation of Human Actions

    Authors: Josua Spisak, Matthias Kerzel, Stefan Wermter

    Abstract: Imitation can allow us to quickly gain an understanding of a new task. Through a demonstration, we can gain direct knowledge about which actions need to be performed and which goals they have. In this paper, we introduce a new approach to imitation learning that tackles the challenges of a robot imitating a human, such as the change in perspective and body schema. Our approach can use a single hum…

    Submitted 3 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at the ICDL 2024

  15. arXiv:2401.00104  [pdf, other]

    cs.LG cs.AI stat.ME

    Causal State Distillation for Explainable Reinforcement Learning

    Authors: Wenhao Lu, Xufeng Zhao, Thilo Fryen, Jae Hee Lee, Mengdi Li, Sven Magg, Stefan Wermter

    Abstract: Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promi…

    Submitted 1 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: https://lukaswill.github.io/; Accepted as oral by CLeaR 2024

  16. arXiv:2312.08888  [pdf, other]

    cs.LG cs.CV

    Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models

    Authors: Kyra Ahrens, Hans Hergen Lehmann, Jae Hee Lee, Stefan Wermter

    Abstract: We address the Continual Learning (CL) problem, wherein a model must learn a sequence of tasks from non-stationary distributions while preserving prior knowledge upon encountering new experiences. With the advancement of foundation models, CL research has pivoted from the initial learning-from-scratch paradigm towards utilizing generic features from large-scale pre-training. However, existing appr…

    Submitted 5 July, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in Transactions of Machine Learning Research (TMLR) journal

  17. arXiv:2311.02379  [pdf, other]

    cs.RO cs.AI cs.LG

    Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

    Authors: Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter

    Abstract: Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: CoRL 2023 Workshop (oral)

  18. Visually Grounded Continual Language Learning with Selective Specialization

    Authors: Kyra Ahrens, Lennart Bengtson, Jae Hee Lee, Stefan Wermter

    Abstract: A desirable trait of an artificial agent acting in the visual world is to continually learn a sequence of language-informed tasks while striking a balance between sufficiently specializing in each task and building a generalized knowledge for transfer. Selective specialization, i.e., a careful selection of model components to specialize in each task, is a strategy to provide control over this trad…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2023

  19. arXiv:2310.11884  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.NE

    From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks

    Authors: Jae Hee Lee, Sergio Lanza, Stefan Wermter

    Abstract: In this paper, we review recent approaches for explaining concepts in neural networks. Concepts can act as a natural link between learning and reasoning: once the concepts are identified that a neural learning system uses, one can integrate those concepts with a reasoning system for inference or use a reasoning system to act upon them to improve or enhance the learning system. On the other hand, k…

    Submitted 3 May, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted in Neurosymbolic Artificial Intelligence

  20. arXiv:2309.13339  [pdf, other]

    cs.CL cs.AI cs.LG cs.SC

    Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic

    Authors: Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Kun Chu, Stefan Wermter

    Abstract: Recent advancements in large language models have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their reasoning often fails to effectively utilize this knowledge to…

    Submitted 25 March, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted in COLING 2024. Code see https://github.com/xf-zhao/LoT

  21. Continual Robot Learning using Self-Supervised Task Inference

    Authors: Muhammad Burhan Hafez, Stefan Wermter

    Abstract: Endowing robots with the human ability to learn a growing set of skills over the course of a lifetime as opposed to mastering single tasks is an open problem in robot learning. While multi-task learning approaches have been proposed to address this problem, they pay little attention to task inference. In order to continually learn new tasks, the robot first needs to infer the task at hand without…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: Accepted for publication in IEEE Transactions on Cognitive and Developmental Systems

  22. arXiv:2309.02145  [pdf, other]

    cs.CL cs.SD eess.AS

    Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

    Authors: Patrick Eickhoff, Matthias Möller, Theresa Pekarek Rosin, Johannes Twiefel, Stefan Wermter

    Abstract: In recent research, in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preproces…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Submitted and accepted for ICANN 2023 (32nd International Conference on Artificial Neural Networks)

  23. arXiv:2307.11554  [pdf, other]

    cs.RO cs.AI

    CycleIK: Neuro-inspired Inverse Kinematics

    Authors: Jan-Gerrit Habekost, Erik Strahl, Philipp Allgeuer, Matthias Kerzel, Stefan Wermter

    Abstract: The paper introduces CycleIK, a neuro-robotic approach that wraps two novel neuro-inspired methods for the inverse kinematics (IK) task, a Generative Adversarial Network (GAN), and a Multi-Layer Perceptron architecture. These methods can be used in a standalone fashion, but we also show how embedding these into a hybrid neuro-genetic IK pipeline allows for further optimization via sequential least…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: Accepted at ICANN 2023 (32nd International Conference on Artificial Neural Networks)

  24. arXiv:2307.08471  [pdf, other]

    cs.RO cs.AI

    Clarifying the Half Full or Half Empty Question: Multimodal Container Classification

    Authors: Josua Spisak, Matthias Kerzel, Stefan Wermter

    Abstract: Multimodal integration is a key component of allowing robots to perceive the world. Multimodality comes with multiple challenges that have to be considered, such as how to integrate and fuse the data. In this paper, we compare different possibilities of fusing visual, tactile and proprioceptive data. The data is directly recorded on the NICOL robot in an experimental setup in which the robot has t…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Preprint for ICANN 2023

  25. Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition

    Authors: Theresa Pekarek Rosin, Stefan Wermter

    Abstract: While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. Howe…

    Submitted 18 October, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: 13 pages, 7 figures, accepted and presented at ICANN 2023

    Journal ref: Artificial Neural Networks and Machine Learning - ICANN 2023, Lecture Notes in Computer Science, vol 14260, 489-500

  26. arXiv:2307.02924  [pdf, other]

    cs.RO cs.HC

    The Emotional Dilemma: Influence of a Human-like Robot on Trust and Cooperation

    Authors: Dennis Becker, Diana Rueda, Felix Beese, Brenda Scarleth Gutierrez Torres, Myriem Lafdili, Kyra Ahrens, Di Fu, Erik Strahl, Tom Weber, Stefan Wermter

    Abstract: Increasing anthropomorphic robot behavioral design could affect trust and cooperation positively. However, studies have shown contradicting results and suggest a task-dependent relationship between robots that display emotions and trust. Therefore, this study analyzes the effect of robots that display human-like emotions on trust, cooperation, and participants' emotions. In the between-group study…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted at 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)

  27. arXiv:2306.13410  [pdf, other]

    cs.LG

    Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion

    Authors: Chu Kiong Loo, Wei Shiung Liew, Stefan Wermter

    Abstract: Real-time on-device continual learning applications are used on mobile phones, consumer robots, and smart appliances. Such devices have limited processing and memory storage capabilities, whereas continual learning acquires data over a long period of time. By necessity, lifelong learning algorithms have to be able to operate under such constraints while delivering good performance. This study pres…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: 24 pages, 8 figures

  28. arXiv:2305.08744  [pdf, other]

    eess.AS cs.LG cs.SD

    Integrating Uncertainty into Neural Network-based Speech Enhancement

    Authors: Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann

    Abstract: Supervised masking approaches in the time-frequency domain aim to employ deep neural networks to estimate a multiplicative mask to extract clean speech. This leads to a single estimate for each input without any guarantees or measures of reliability. In this paper, we study the benefits of modeling uncertainty in clean speech estimation. Prediction uncertainty is typically categorized into aleator…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted version

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1587-1600, 2023

  29. arXiv:2305.08528  [pdf, other]

    cs.RO

    NICOL: A Neuro-inspired Collaborative Semi-humanoid Robot that Bridges Social Interaction and Reliable Manipulation

    Authors: Matthias Kerzel, Philipp Allgeuer, Erik Strahl, Nicolas Frick, Jan-Gerrit Habekost, Manfred Eppe, Stefan Wermter

    Abstract: Robotic platforms that can efficiently collaborate with humans in physical tasks constitute a major goal in robotics. However, many existing robotic platforms are either designed for social interaction or industrial object manipulation tasks. The design of collaborative robots seldom emphasizes both their social interaction and physical collaboration abilities. To bridge this gap, we present the n…

    Submitted 30 October, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Journal ref: Published in IEEE Access 2023

  30. arXiv:2305.02054  [pdf]

    cs.LG cs.AI cs.RO

    Map-based Experience Replay: A Memory-Efficient Solution to Catastrophic Forgetting in Reinforcement Learning

    Authors: Muhammad Burhan Hafez, Tilman Immisch, Tom Weber, Stefan Wermter

    Abstract: Deep Reinforcement Learning agents often suffer from catastrophic forgetting, forgetting previously found solutions in parts of the input space when training on new data. Replay Memories are a common solution to the problem, decorrelating and shuffling old and new training samples. They naively store state transitions as they come in, without regard for redundancy. We introduce a novel cognitive-i…

    Submitted 28 August, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Journal ref: Frontiers in Neurorobotics 17:1127642 (2023)

  31. arXiv:2305.01507  [pdf, other]

    cs.NE cs.LG

    A Parameter-free Adaptive Resonance Theory-based Topological Clustering Algorithm Capable of Continual Learning

    Authors: Naoki Masuyama, Takanori Takebayashi, Yusuke Nojima, Chu Kiong Loo, Hisao Ishibuchi, Stefan Wermter

    Abstract: In general, a similarity threshold (i.e., a vigilance parameter) for a node learning process in Adaptive Resonance Theory (ART)-based algorithms has a significant impact on clustering performance. In addition, an edge deletion threshold in a topological clustering algorithm plays an important role in adaptively generating well-separated clusters during a self-organizing process. In this paper, we…

    Submitted 2 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: This paper is currently under review

  32. arXiv:2304.14371  [pdf, other]

    cs.CV cs.LG eess.IV

    Neural Field Conditioning Strategies for 2D Semantic Segmentation

    Authors: Martin Gromniak, Sven Magg, Stefan Wermter

    Abstract: Neural fields are neural networks which map coordinates to a desired signal. When a neural field should jointly model multiple signals, and not memorize only one, it needs to be conditioned on a latent code which describes the signal at hand. Despite being an important aspect, there has been little research on conditioning strategies for neural fields. In this work, we explore the use of neural fi…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 13 pages, 4 figures, submitted to ICANN 2023

  33. arXiv:2304.12958  [pdf, other]

    cs.LG cs.AI cs.RO

    A Closer Look at Reward Decomposition for High-Level Robotic Explanations

    Authors: Wenhao Lu, Xufeng Zhao, Sven Magg, Martin Gromniak, Mengdi Li, Stefan Wermter

    Abstract: Explaining the behaviour of intelligent agents learned by reinforcement learning (RL) to humans is challenging yet crucial due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability. Moreover, one-step explanations for RL agents can be ambiguous as they fail to account for the agent's future behaviour at each transition, adding to the comple…

    Submitted 3 November, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: https://lukaswill.github.io

  34. arXiv:2304.07219  [pdf, other]

    cs.LG cs.AI

    Model Predictive Control with Self-supervised Representation Learning

    Authors: Jonas Matthies, Muhammad Burhan Hafez, Mostafa Kotb, Stefan Wermter

    Abstract: Over the last few years, we have not seen any major developments in model-free or model-based learning methods that would make one obsolete relative to the other. In most cases, the used technique is heavily dependent on the use case scenario or other attributes, e.g. the environment. Both approaches have their own advantages, for example, sample efficiency or computational efficiency. However, wh…

    Submitted 14 April, 2023; originally announced April 2023.

  35. arXiv:2303.15042  [pdf, other]

    eess.AS cs.LG cs.RO cs.SD

    Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

    Authors: Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann

    Abstract: Human-robot interaction relies on a noise-robust audio processing module capable of estimating target speech from audio recordings impacted by environmental noise, as well as self-induced noise, so-called ego-noise. While external ambient noise sources vary from environment to environment, ego-noise is mainly caused by the internal motors and joints of a robot. Ego-noise and environmental noise re…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

  36. arXiv:2303.14285  [pdf, other]

    cs.RO

    The Robot in the Room: Influence of Robot Facial Expressions and Gaze on Human-Human-Robot Collaboration

    Authors: Di Fu, Fares Abawi, Stefan Wermter

    Abstract: Robot facial expressions and gaze are important factors for enhancing human-robot interaction (HRI), but their effects on human collaboration and perception are not well understood, for instance, in collaborative game scenarios. In this study, we designed a collaborative triadic HRI game scenario, where two participants worked together to insert objects into a shape sorter. One participant assumed…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: 7 pages, 6 figures, 1 table

  37. arXiv:2303.08268  [pdf, other]

    cs.RO cs.AI cs.CL cs.LG cs.SD eess.AS

    Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

    Authors: Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter

    Abstract: Programming robot behavior in a complex world faces challenges on multiple levels, from dextrous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning. However, it remains challenging to ground LLMs in multimodal sensory input and continuous action output, while enabling a robot to…

    Submitted 11 October, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: IROS2023, Detroit. See the project website at https://matcha-agent.github.io

  38. arXiv:2303.03787  [pdf, other]

    cs.LG

    Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning

    Authors: Mostafa Kotb, Cornelius Weber, Stefan Wermter

    Abstract: Model-based reinforcement learning (MBRL) with real-time planning has shown great potential in locomotion and manipulation control tasks. However, the existing planning methods, such as the Cross-Entropy Method (CEM), do not scale well to complex high-dimensional environments. One of the key reasons for underperformance is the lack of exploration, as these planning methods only aim to maximize the…

    Submitted 10 September, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: 7 pages, 4 figures

  39. Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

    Authors: Leyuan Qu, Cornelius Weber, Stefan Wermter

    Abstract: Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-Of-Vocabulary (OOV) words, such as trending words and new named entities, pose problems to modern ASR systems that require long training times to adapt their large numbers of parameters. Different from most previous research focusing on language model post-proces…

    Submitted 21 February, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: Neural Networks, Volume 161, April 2023, Pages 494-504

  40. Wrapyfi: A Python Wrapper for Integrating Robots, Sensors, and Applications across Multiple Middleware

    Authors: Fares Abawi, Philipp Allgeuer, Di Fu, Stefan Wermter

    Abstract: Message oriented and robotics middleware play an important role in facilitating robot control, abstracting complex functionality, and unifying communication patterns between sensors and devices. However, using multiple middleware frameworks presents a challenge in integrating different robots within a single system. To address this challenge, we present Wrapyfi, a Python wrapper supporting multipl…

    Submitted 19 January, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: Accepted at HRI 2024

  41. arXiv:2302.00270  [pdf, other]

    cs.LG cs.AI

    Internally Rewarded Reinforcement Learning

    Authors: Mengdi Li, Xufeng Zhao, Jae Hee Lee, Cornelius Weber, Stefan Wermter

    Abstract: We study a class of reinforcement learning problems where the reward signals for policy learning are generated by an internal reward model that is dependent on and jointly optimized with the policy. This interdependence between the policy and the reward model leads to an unstable learning process because reward signals from an immature reward model are noisy and impede policy learning, and convers…

    Submitted 24 August, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023. Update: adopt the term "reward model" instead of using "critic" to prevent confusion with the term "critic" in actor-critic algorithms. Project webpage at https://ir-rl.github.io

  42. arXiv:2301.03353  [pdf, other]

    cs.CL cs.AI cs.NE cs.RO

    Learning Bidirectional Action-Language Translation with Limited Supervision and Incongruent Input

    Authors: Ozan Özdemir, Matthias Kerzel, Cornelius Weber, Jae Hee Lee, Muhammad Burhan Hafez, Patrick Bruns, Stefan Wermter

    Abstract: Human infant learning happens during exploration of the environment, by interaction with objects, and by listening to and repeating utterances casually, which is analogous to unsupervised learning. Only occasionally, a learning infant would receive a matching verbal description of an action it is committing, which is similar to supervised learning. Such a learning mechanism can be mimicked with de…

    Submitted 22 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: Published in: Applied Artificial Intelligence, 37:1, 2179167

    Journal ref: Applied Artificial Intelligence Volume 37, 2023 - Issue 1

  43. arXiv:2212.06972  [pdf, other]

    cs.SD cs.CL eess.AS

    Disentangling Prosody Representations with Unsupervised Speech Reconstruction

    Authors: Leyuan Qu, Taihao Li, Cornelius Weber, Theresa Pekarek-Rosin, Fuji Ren, Stefan Wermter

    Abstract: Human speech can be characterized by different components, including semantic content, speaker identity and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks respectively. However, it is still an open challenging research question to extract prosodi…

    Submitted 25 September, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  44. arXiv:2212.04231  [pdf, other]

    cs.CV cs.CL

    Harnessing the Power of Multi-Task Pretraining for Ground-Truth Level Natural Language Explanations

    Authors: Björn Plüster, Jakob Ambsdorf, Lukas Braach, Jae Hee Lee, Stefan Wermter

    Abstract: Natural language explanations promise to offer intuitively understandable explanations of a neural network's decision process in complex vision-language tasks, as pursued in recent VL-NLE models. While current models offer impressive performance on task accuracy and explanation plausibility, they suffer from a range of issues: Some models feature a modular design where the explanation generation m…

    Submitted 29 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Minor changes

  45. arXiv:2211.15566  [pdf, other]

    cs.AI

    Neuro-Symbolic Spatio-Temporal Reasoning

    Authors: Jae Hee Lee, Michael Sioutis, Kyra Ahrens, Marjan Alirezaie, Matthias Kerzel, Stefan Wermter

    Abstract: Knowledge about space and time is necessary to solve problems in the physical world: An AI agent situated in the physical world and interacting with objects often needs to reason about positions of and relations between objects; and as soon as the agent plans its actions to solve a task, it needs to consider the temporal aspect (e.g., what actions to perform over time). Spatio-temporal knowledge,…

    Submitted 13 January, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: Contribution to the book "A Compendium of Neuro-Symbolic Artificial Intelligence", which is to appear in the first half of 2023

  46. arXiv:2211.15377  [pdf, other]

    eess.AS cs.CV cs.LG cs.NE cs.SD

    Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge

    Authors: Hugo Carneiro, Cornelius Weber, Stefan Wermter

    Abstract: The task of emotion recognition in conversations (ERC) benefits from the availability of multiple modalities, as provided, for example, in the video-based Multimodal EmotionLines Dataset (MELD). However, only a few research approaches use both acoustic and visual information from the MELD videos. There are two reasons for this: First, label-to-video alignments in MELD are noisy, making those video…

    Submitted 15 August, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: 17 pages, 8 figures, 7 tables, Published in Neurocomputing

    MSC Class: 68T20; ACM Class: I.2.0

    Journal ref: Neurocomputing (2023); Volume 545; 126271

  47. arXiv:2211.12930  [pdf, other]

    cs.RO cs.AI

    Introspection-based Explainable Reinforcement Learning in Episodic and Non-episodic Scenarios

    Authors: Niclas Schroeter, Francisco Cruz, Stefan Wermter

    Abstract: With the increasing presence of robotic systems and human-robot environments in today's society, understanding the reasoning behind actions taken by a robot is becoming more important. To increase this understanding, users are provided with explanations as to why a specific action was taken. Among other effects, these explanations improve the trust of users in their robotic partners. One option fo…

    Submitted 23 November, 2022; originally announced November 2022.

  48. arXiv:2211.12054  [pdf, other]

    cs.CV cs.AI cs.CL

    Visually Grounded Commonsense Knowledge Acquisition

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Mengdi Li, Ruobing Xie, Cornelius Weber, Zhiyuan Liu, Hai-Tao Zheng, Stefan Wermter, Tat-Seng Chua, Maosong Sun

    Abstract: Large-scale commonsense knowledge bases empower a broad range of AI applications, where the automatic extraction of commonsense knowledge (CKE) is a fundamental and challenging problem. CKE from text is known for suffering from the inherent sparsity and reporting bias of commonsense in text. Visual perception, on the other hand, contains rich commonsense knowledge about real-world entities, e.g.,…

    Submitted 25 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI 2023

  49. arXiv:2211.08843  [pdf, other]

    cs.SD cs.AI eess.AS

    Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer

    Authors: Leyuan Qu, Wei Wang, Cornelius Weber, Pengcheng Yue, Taihao Li, Stefan Wermter

    Abstract: Humans can effortlessly modify various prosodic attributes, such as the placement of stress and the intensity of sentiment, to convey a specific emotion while maintaining consistent linguistic content. Motivated by this capability, we propose EmoAug, a novel style transfer model designed to enhance emotional expression and tackle the data scarcity issue in speech emotion recognition tasks. EmoAug…

    Submitted 28 December, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2024

  50. arXiv:2210.07851  [pdf, other]

    cs.RO cs.AI

    Learning to Autonomously Reach Objects with NICO and Grow-When-Required Networks

    Authors: Nima Rahrakhshan, Matthias Kerzel, Philipp Allgeuer, Nicolas Duczek, Stefan Wermter

    Abstract: The act of reaching for an object is a fundamental yet complex skill for a robotic agent, requiring a high degree of visuomotor control and coordination. In consideration of dynamic environments, a robot capable of autonomously adapting to novel situations is desired. In this paper, a developmental robotics approach is used to autonomously learn visuomotor coordination on the NICO (Neuro-Inspired…

    Submitted 17 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted at the 2022 IEEE-RAS International Conference on Humanoid Robots (Humanoids 2022)