-
MuTT: A Multimodal Trajectory Transformer for Robot Skills
Authors:
Claudius Kienle,
Benjamin Alt,
Onur Celik,
Philipp Becker,
Darko Katic,
Rainer Jäkel,
Gerhard Neumann
Abstract:
High-level robot skills represent an increasingly popular paradigm in robot programming. However, configuring the skills' parameters for a specific task remains a manual and time-consuming endeavor. Existing approaches for learning or optimizing these parameters often require numerous real-world executions or do not work in dynamic environments. To address these challenges, we propose MuTT, a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills by integrating vision, trajectory, and robot skill parameters. Notably, we pioneer the fusion of vision and trajectory, introducing a novel trajectory projection. Furthermore, we illustrate MuTT's efficacy as a predictor when combined with a model-based robot skill optimizer. This approach facilitates the optimization of robot skill parameters for the current environment, without the need for real-world executions during optimization. Designed for compatibility with any representation of robot skills, MuTT demonstrates its versatility across three comprehensive experiments, showcasing superior performance across two different skill representations.
Submitted 22 August, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
Variational Distillation of Diffusion Policies into Mixture of Experts
Authors:
Hongyi Zhou,
Denis Blessing,
Ge Li,
Onur Celik,
Xiaogang Jia,
Gerhard Neumann,
Rudolf Lioutikov
Abstract:
This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows the training of each expert separately, resulting in a robust optimization scheme for MoEs. VDD demonstrates across nine complex behavior learning tasks, that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE.
Submitted 18 June, 2024;
originally announced June 2024.
-
MaIL: Improving Imitation Learning with Mamba
Authors:
Xiaogang Jia,
Qian Wang,
Atalay Donat,
Bowen Xing,
Ge Li,
Hongyi Zhou,
Onur Celik,
Denis Blessing,
Rudolf Lioutikov,
Gerhard Neumann
Abstract:
This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability in handling human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the drawback of large models that complicate effective training. While state space models (SSMs) have been known for their efficiency, they were not able to match the performance of Transformers. Mamba significantly improves the performance of SSMs and rivals against Transformers, positioning it as an appealing alternative for IL policies. MaIL leverages Mamba as a backbone and introduces a formalism that allows using Mamba in the encoder-decoder structure. This formalism makes it a versatile architecture that can be used as a standalone policy or as part of a more advanced architecture, such as a diffuser in the diffusion process. Extensive evaluations on the LIBERO IL benchmark and three real robot experiments show that MaIL: i) outperforms Transformers in all LIBERO tasks, ii) achieves good performance even with small datasets, iii) is able to effectively process multi-modal sensory inputs, iv) is more robust to input noise compared to Transformers.
Submitted 12 June, 2024;
originally announced June 2024.
-
Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
Authors:
Onur Celik,
Aleksandar Taranovic,
Gerhard Neumann
Abstract:
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy. However, learning diverse skills is challenging in RL due to the commonly used Gaussian policy parameterization. We propose Diverse Skill Learning (Di-SkilL), an RL method for learning diverse skills using Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associated context distribution to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curriculum learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment's unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills. Videos and code are available on the project webpage: https://alrhub.github.io/di-skill-website/
Submitted 10 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
MP3: Movement Primitive-Based (Re-)Planning Policy
Authors:
Fabian Otto,
Hongyi Zhou,
Onur Celik,
Ge Li,
Rudolf Lioutikov,
Gerhard Neumann
Abstract:
We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 enables the generation of smooth trajectories throughout the whole learning process while effectively learning from sparse and non-Markovian rewards. Additionally, MP3 maintains the capability to adapt to changes in the environment during execution. Although many early successes in robot RL have been achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions, lacking the ability to adapt to task variations or adjust motions during execution. Building upon our previous work, which introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations, this paper extends the approach to incorporating replanning strategies. This allows adaptation of the MP parameters throughout motion execution, addressing the lack of online motion adaptation in stochastic domains requiring feedback. We compared our approach against state-of-the-art deep RL and RL with MPs methods. The results demonstrated improved performance in sophisticated, sparse reward settings and in domains requiring replanning.
Submitted 2 July, 2023; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Curriculum-Based Imitation of Versatile Skills
Authors:
Maximilian Xiling Li,
Onur Celik,
Philipp Becker,
Denis Blessing,
Rudolf Lioutikov,
Gerhard Neumann
Abstract:
Learning skills by imitation is a promising concept for the intuitive teaching of robots. A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations. Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways which is a major challenge for most imitation learning methods that are based on such a maximum likelihood (ML) objective. The ML objective forces the model to cover all data, it prevents specialization in the context space and can cause mode-averaging in the behavior space, leading to suboptimal or potentially catastrophic behavior. Here, we alleviate those issues by introducing a curriculum using a weight for each data point, allowing the model to specialize on data it can represent while incentivizing it to cover as much data as possible by an entropy bonus. We extend our algorithm to a Mixture of (linear) Experts (MoE) such that the single components can specialize on local context regions, while the MoE covers all data points. We evaluate our approach in complex simulated and real robot control tasks and show it learns from versatile human demonstrations and significantly outperforms current SOTA methods. A reference implementation can be found at https://github.com/intuitive-robots/ml-cur
Submitted 11 April, 2023;
originally announced April 2023.
-
Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills
Authors:
Denis Blessing,
Onur Celik,
Xiaogang Jia,
Moritz Reuss,
Maximilian Xiling Li,
Rudolf Lioutikov,
Gerhard Neumann
Abstract:
Imitation learning uses data for training policies to solve complex tasks. However, when the training data is collected from human demonstrators, it often leads to multimodal distributions because of the variability in human actions. Most imitation learning methods rely on a maximum likelihood (ML) objective to learn a parameterized policy, but this can result in suboptimal or unsafe behavior due to the mode-averaging property of the ML objective. In this work, we propose Information Maximizing Curriculum, a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent, effectively mitigating the mode-averaging problem by allowing the model to ignore data from modes it cannot represent. To cover all modes and thus, enable diverse behavior, we extend our approach to a mixture of experts (MoE) policy, where each mixture component selects its own subset of the training data for learning. A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution. We demonstrate the effectiveness of our approach on complex simulated control tasks using diverse human demonstrations, achieving superior performance compared to state-of-the-art methods.
Submitted 31 October, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Deep Black-Box Reinforcement Learning with Movement Primitives
Authors:
Fabian Otto,
Onur Celik,
Hongyi Zhou,
Hanna Ziesche,
Ngo Anh Vien,
Gerhard Neumann
Abstract:
Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem where we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited for solving sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables policy learning with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks. In doing so, we investigate different reward formulations: dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that the sparse and the non-Markovian rewards are also often better suited to define the desired behavior, allowing us to obtain considerably higher quality policies compared to step-based RL.
Submitted 18 October, 2022;
originally announced October 2022.
-
Mission Design of DESTINY+: Toward Active Asteroid (3200) Phaethon and Multiple Small Bodies
Authors:
Naoya Ozaki,
Takayuki Yamamoto,
Ferran Gonzalez-Franquesa,
Roger Gutierrez-Ramon,
Nishanth Pushparaj,
Takuya Chikazawa,
Diogene Alessandro Dei Tos,
Onur Çelik,
Nicola Marmo,
Yasuhiro Kawakatsu,
Tomoko Arai,
Kazutaka Nishiyama,
Takeshi Takashima
Abstract:
DESTINY+ is an upcoming JAXA Epsilon medium-class mission to fly by the Geminids meteor shower parent body (3200) Phaethon. It will be the world's first spacecraft to escape from a near-geostationary transfer orbit into deep space using a low-thrust propulsion system. In doing so, DESTINY+ will demonstrate a number of technologies that include a highly efficient ion engine system, lightweight solar array panels, and advanced asteroid flyby observation instruments. These demonstrations will pave the way for JAXA's envisioned low-cost, high-frequency space exploration plans. Following the Phaethon flyby observation, DESTINY+ will visit additional asteroids as its extended mission. The mission design is divided into three phases: a spiral-shaped apogee-raising phase, a multi-lunar-flyby phase to escape Earth, and an interplanetary and asteroids flyby phase. The main challenges include the optimization of the many-revolution low-thrust spiral phase under operational constraints; the design of a multi-lunar-flyby sequence in a multi-body environment; and the design of multiple asteroid flybys connected via Earth gravity assists. This paper shows a novel, practical approach to tackle these complex problems, and presents feasible solutions found within the mass budget and mission constraints. Among them, the baseline solution is shown and discussed in depth; DESTINY+ will spend two years raising its apogee with ion engines, followed by four lunar gravity assists, and a flyby of asteroids (3200) Phaethon and (155140) 2005 UD. Finally, the flight operations plan for the spiral phase and the asteroid flyby phase are presented in detail.
Submitted 14 April, 2022; v1 submitted 6 January, 2022;
originally announced January 2022.
-
Specializing Versatile Skill Libraries using Local Mixture of Experts
Authors:
Onur Celik,
Dongzhuoran Zhou,
Ge Li,
Philipp Becker,
Gerhard Neumann
Abstract:
A long-cherished vision in robotics is to equip robots with skills that match the versatility and precision of humans. For example, when playing table tennis, a robot should be capable of returning the ball in various ways while precisely placing it at the desired location. A common approach to model such versatile behavior is to use a Mixture of Experts (MoE) model, where each expert is a contextual motion primitive. However, learning such MoEs is challenging as most objectives force the model to cover the entire context space, which prevents specialization of the primitives resulting in rather low-quality components. Starting from maximum entropy reinforcement learning (RL), we decompose the objective into optimizing an individual lower bound per mixture component. Further, we introduce a curriculum by allowing the components to focus on a local context region, enabling the model to learn highly accurate skill representations. To this end, we use local context distributions that are adapted jointly with the expert primitives. Our lower bound advocates an iterative addition of new components, where new components will concentrate on local context regions not covered by the current MoE. This local and incremental learning results in a modular MoE model of high accuracy and versatility, where both properties can be scaled by adding more components on the fly. We demonstrate this by an extensive ablation and on two challenging simulated robot skill learning tasks. We compare our achieved performance to LaDiPS and HiREPS, a known hierarchical policy search method for learning diverse skills.
Submitted 10 January, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
VR-Caps: A Virtual Environment for Capsule Endoscopy
Authors:
Kagan Incetan,
Ibrahim Omer Celik,
Abdulhamid Obeid,
Guliz Irem Gokceler,
Kutsev Bengisu Ozyoruk,
Yasin Almalioglu,
Richard J. Chen,
Faisal Mahmood,
Hunter Gilbert,
Nicholas J. Durr,
Mehmet Turan
Abstract:
Current capsule endoscopes and next-generation robotic capsules for diagnosis and treatment of gastrointestinal diseases are complex cyber-physical platforms that must orchestrate complex software and hardware functions. The desired tasks for these systems include visual localization, depth estimation, 3D mapping, disease detection and segmentation, automated navigation, active control, path realization and optional therapeutic modules such as targeted drug delivery and biopsy sampling. Data-driven algorithms promise to enable many advanced functionalities for capsule endoscopes, but real-world data is challenging to obtain. Physically-realistic simulations providing synthetic data have emerged as a solution to the development of data-driven algorithms. In this work, we present a comprehensive simulation platform for capsule endoscopy operations and introduce VR-Caps, a virtual active capsule environment that simulates a range of normal and abnormal tissue conditions (e.g., inflated, dry, wet, etc.) and varied organ types, capsule endoscope designs (e.g., mono, stereo, dual and 360° camera), and the type, number, strength, and placement of internal and external magnetic sources that enable active locomotion. VR-Caps makes it possible to both independently or jointly develop, optimize, and test medical imaging and analysis software for the current and next-generation endoscopic capsule systems. To validate this approach, we train state-of-the-art deep neural networks to accomplish various medical image analysis tasks using simulated data from VR-Caps and evaluate the performance of these models on real medical data. Results demonstrate the usefulness and effectiveness of the proposed virtual platform in developing algorithms that quantify fractional coverage, camera trajectory, 3D map reconstruction, and disease classification.
Submitted 14 January, 2021; v1 submitted 29 August, 2020;
originally announced August 2020.
-
Testing of Support Tools for Plagiarism Detection
Authors:
Tomáš Foltýnek,
Dita Dlabolová,
Alla Anohina-Naumeca,
Salim Razı,
Július Kravjar,
Laima Kamzola,
Jean Guerrero-Dib,
Özgür Çelik,
Debora Weber-Wulff
Abstract:
There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.
Submitted 11 February, 2020;
originally announced February 2020.
-
Tritangents and Their Space Sextics
Authors:
Turku Ozlum Celik,
Avinash Kulkarni,
Yue Ren,
Mahsa Sayyary Namin
Abstract:
Two classical results in algebraic geometry are that the branch curve of a del Pezzo surface of degree 1 can be embedded as a space sextic curve and that every space sextic curve has exactly 120 tritangents corresponding to its odd theta characteristics. In this paper we revisit both results from the computational perspective. Specifically, we give an algorithm to construct space sextic curves that arise from blowing up the projective plane at eight points and provide algorithms to compute the 120 tritangents and their Steiner system of any space sextic. Furthermore, we develop efficient inverses to the aforementioned methods. We present an algorithm to either reconstruct the original eight points in the projective plane from a space sextic or certify that this is not possible. Moreover, we extend a construction of Lehavi which recovers a space sextic from its tritangents and Steiner system. All algorithms in this paper have been implemented in Magma.
Submitted 29 May, 2018;
originally announced May 2018.