Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering

Authors: Patrik Vacek, David Hurych, Tomáš Svoboda, Karel Zimmermann

Abstract: We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences, which is crucial to various tasks like trajectory prediction or instance segmentation. In the absence of ground truth scene flow labels, contemporary approaches concentrate on optimizing flow across sequential pairs of point clouds by incorporating structure-based regularization on flow and object rigidity. The rigid objects are estimated by a variety of 3D spatial clustering methods. While state-of-the-art methods successfully capture overall scene motion using the Neural Prior structure, they encounter challenges in discerning multi-object motions. We identify the structural constraints and the use of large and strict rigid clusters as the main pitfalls of the current approaches, and we propose a novel clustering approach that allows for a combination of overlapping soft clusters and non-overlapping rigid clusters. Flow is then jointly estimated with progressively growing non-overlapping rigid clusters together with fixed-size overlapping soft clusters. We evaluate our method on multiple datasets with LiDAR point clouds, demonstrating superior performance over the self-supervised baselines and reaching new state-of-the-art results. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other, including pedestrians, cyclists and other vulnerable road users. Our code is publicly available at https://github.com/ctu-vras/let-it-flow.

Submitted 20 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

arXiv:2312.08879

Regularizing Self-supervised 3D Scene Flows with Surface Awareness and Cyclic Consistency

Authors: Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomas Svoboda

Abstract: Learning without supervision how to predict 3D scene flows from point clouds is essential to many perception systems. We propose a novel learning framework for this task which improves the necessary regularization. Relying on the assumption that scene elements are mostly rigid, current smoothness losses are built on the definition of "rigid clusters" in the input point clouds. The definition of these clusters is challenging and has a significant impact on the quality of predicted flows. We introduce two new consistency losses that enlarge clusters while preventing them from spreading over distinct objects. In particular, we enforce temporal consistency with a forward-backward cyclic loss and spatial consistency by considering surface orientation similarity in addition to spatial proximity. The proposed losses are model-independent and can thus be used in a plug-and-play fashion to significantly improve the performance of existing models, as demonstrated on two of the most widely used architectures. We also showcase the effectiveness and generalization capability of our framework on four standard sensor-unique driving datasets, achieving state-of-the-art performance in 3D scene flow estimation. Our code is available at https://github.com/ctu-vras/sac-flow.

Submitted 26 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.16712

Onedata4Sci: Life science data management solution based on Onedata

Authors: Tomáš Svoboda, Tomáš Raček, Josef Handl, Jozef Sabo, Adrián Rošinec, Łukasz Opioła, Wojciech Jesionek, Milan Ešner, Markéta Pernisová, Natallia Madzia Valasevich, Aleš Křenek, Radka Svobodová

Abstract: Life-science experimental methods generate vast and ever-increasing volumes of data, which provide highly valuable research resources. However, management of these data is nontrivial and applicable software solutions are currently subject to intensive development. The solutions mainly fall into one of two groups: general data management systems (e.g. Onedata, iRODS, B2SHARE, CERNBox) or very specialised data management solutions (e.g. solutions for biomolecular simulation data, biological imaging data, genomic data). To bridge the gap between them, we provide Onedata4Sci, a prototype data management solution, which is focused on the management of life science data and covers four key steps of the data life cycle, i.e. data acquisition, user access, computational processing and archiving. Onedata4Sci is based on the Onedata data management system. It is written in Python and fully containerised, with support for processing the stored data in Kubernetes. The applicability of Onedata4Sci is shown in three distinct use cases: plant imaging data, cellular imaging data, and cryo-electron microscopy data. Despite the use cases covering very different types of data and user patterns, Onedata4Sci demonstrated an ability to successfully handle all these conditions. The complete source code of Onedata4Sci is available on GitHub (https://github.com/CERIT-SC/onedata4sci), and its documentation and installation manual are also provided.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2309.09007

MonoForce: Self-supervised Learning of Physics-aware Model for Predicting Robot-terrain Interaction

Authors: Ruslan Agishev, Karel Zimmermann, Vladimír Kubelka, Martin Pecka, Tomáš Svoboda

Abstract: While autonomous navigation of mobile robots on rigid terrain is a well-explored problem, navigating on deformable terrain such as tall grass or bushes remains a challenge. To address it, we introduce an explainable, physics-aware and end-to-end differentiable model which predicts the outcome of robot-terrain interaction from camera images, both on rigid and non-rigid terrain. The proposed MonoForce model consists of a black-box module which predicts robot-terrain interaction forces from onboard cameras, followed by a white-box module, which transforms these forces and control signals into predicted trajectories, using only the laws of classical mechanics. The differentiable white-box module allows backpropagating the predicted trajectory errors into the black-box module, serving as a self-supervised loss that measures consistency between the predicted forces and ground-truth trajectories of the robot. Experimental evaluation on a public dataset and our data has shown that while the prediction capabilities are comparable to state-of-the-art algorithms on rigid terrain, MonoForce shows superior accuracy on non-rigid terrain such as tall grass or bushes. To facilitate the reproducibility of our results, we release both the code and datasets.

Submitted 27 April, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

MSC Class: 68T40 ACM Class: I.2.9

arXiv:2309.08302

T-UDA: Temporal Unsupervised Domain Adaptation in Sequential Point Clouds

Authors: Awet Haileslassie Gebrehiwot, David Hurych, Karel Zimmermann, Patrick Pérez, Tomáš Svoboda

Abstract: Deep perception models have to reliably cope with an open-world setting of domain shifts induced by different geographic regions, sensor properties, mounting positions, and several other factors. Since covering all domains with annotated data is technically intractable due to the endless possible variations, researchers focus on unsupervised domain adaptation (UDA) methods that adapt models trained on one (source) domain with annotations available to another (target) domain for which only unannotated data are available. Current predominant methods either leverage semi-supervised approaches, e.g., teacher-student setup, or exploit privileged data, such as other sensor modalities or temporal data consistency. We introduce a novel domain adaptation method that leverages the best of both trends. Our approach combines input data's temporal and cross-sensor geometric consistency with the mean teacher method. Dubbed T-UDA for "temporal UDA", such a combination yields massive performance gains for the task of 3D semantic segmentation of driving scenes. Experiments are conducted on Waymo Open Dataset, nuScenes and SemanticKITTI, for two popular 3D point cloud architectures, Cylinder3D and MinkowskiNet. Our code is publicly available at https://github.com/ctu-vras/T-UDA.

Submitted 15 September, 2023; originally announced September 2023.

Comments: Will appear at IEEE/RSJ International Conference on Intelligent Robots and Systems 2023 (IROS 2023)

arXiv:2207.06079

doi 10.1109/LRA.2022.3226029

Teachers in concordance for pseudo-labeling of 3D sequential data

Authors: Awet Haileslassie Gebrehiwot, Patrik Vacek, David Hurych, Karel Zimmermann, Patrick Perez, Tomáš Svoboda

Abstract: Automatic pseudo-labeling is a powerful tool to tap into large amounts of sequential unlabeled data. It is especially appealing in safety-critical applications of autonomous driving, where performance requirements are extreme, datasets are large, and manual labeling is very challenging. We propose to leverage sequences of point clouds to boost the pseudo-labeling technique in a teacher-student setup via training multiple teachers, each with access to different temporal information. This set of teachers, dubbed Concordance, provides higher quality pseudo-labels for student training than standard methods. The output of multiple teachers is combined via a novel pseudo-label confidence-guided criterion. Our experimental evaluation focuses on the 3D point cloud domain and urban driving scenarios. We show the performance of our method applied to 3D semantic segmentation and 3D object detection on three benchmark datasets. Our approach, which uses only 20% manual labels, outperforms some fully supervised methods. A notable performance boost is achieved for classes rarely appearing in training data.

Submitted 5 July, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: This work has been submitted to the IEEE for publication

MSC Class: 68T07 ACM Class: I.4.6; I.4.8

Journal ref: in IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 536-543, Feb. 2023

arXiv:2206.08185

doi 10.55417/fr.2023001

UAVs Beneath the Surface: Cooperative Autonomy for Subterranean Search and Rescue in DARPA SubT

Authors: Matej Petrlik, Pavel Petracek, Vit Kratky, Tomas Musil, Yurii Stasinchuk, Matous Vrba, Tomas Baca, Daniel Hert, Martin Pecka, Tomas Svoboda, Martin Saska

Abstract: This paper presents a novel approach for autonomous cooperating UAVs in search and rescue operations in subterranean domains with complex topology. The proposed system was ranked second in the Virtual Track of the DARPA SubT Finals as part of the team CTU-CRAS-NORLAB. In contrast to the winning solution that was developed specifically for the Virtual Track, the proposed solution also proved to be a robust system for deployment onboard physical UAVs flying in the extremely harsh and confined environment of the real-world competition. The proposed approach enables fully autonomous and decentralized deployment of a UAV team with seamless simulation-to-world transfer, and proves its advantage over less mobile UGV teams in the flyable space of diverse environments. The main contributions of the paper lie in the mapping and navigation pipelines. The mapping approach employs novel map representations: SphereMap for efficient risk-aware long-distance planning, FacetMap for surface coverage, and the compressed topological-volumetric LTVMap for allowing multi-robot cooperation under low-bandwidth communication. These representations are used in navigation together with novel methods for visibility-constrained informed search in a general 3D environment with no assumptions about the environment structure, while balancing deep exploration with sensor-coverage exploitation. The proposed solution also includes a visual-perception pipeline for on-board detection and localization of objects of interest in four RGB streams at 5 Hz each without a dedicated GPU. Apart from participation in the DARPA SubT, the performance of the UAV system is supported by extensive experimental verification in diverse environments with both qualitative and quantitative evaluation.

Submitted 3 February, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Submitted to Field Robotics Special Issue: DARPA Subterranean Challenge, Advancement and Lessons Learned from the Finals

Journal ref: Field Robotics, vol. 3, no. 1, pp. 1-68, January 2023

arXiv:2206.07634

Real3D-Aug: Point Cloud Augmentation by Placing Real Objects with Occlusion Handling for 3D Detection and Segmentation

Authors: Petr Šebek, Šimon Pokorný, Patrik Vacek, Tomáš Svoboda

Abstract: Object detection and semantic segmentation with 3D lidar point cloud data require expensive annotation. We propose a data augmentation method that takes advantage of already annotated data multiple times. We propose an augmentation framework that reuses real data, automatically finds suitable placements in the scene to be augmented, and handles occlusions explicitly. Because real data are used, the scan points of newly inserted objects retain the physical characteristics of the lidar, such as intensity and raydrop. The pipeline proves competitive in training top-performing models for 3D object detection and semantic segmentation. The new augmentation provides a significant performance gain in rare and essential classes, notably a 6.65% average precision gain for the "Hard" pedestrian class in KITTI object detection or a 2.14 mean IoU gain in the SemanticKITTI segmentation challenge over the state of the art.

Submitted 11 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Submitted on 15th June 2022 to IEEE RA-L journal

Journal ref: Computer Vision Winter Workshop 2023

arXiv:2110.05911

System for multi-robotic exploration of underground environments CTU-CRAS-NORLAB in the DARPA Subterranean Challenge

Authors: Tomáš Rouček, Martin Pecka, Petr Čížek, Tomáš Petříček, Jan Bayer, Vojtěch Šalanský, Teymur Azayev, Daniel Heřt, Matěj Petrlík, Tomáš Báča, Vojtěch Spurný, Vít Krátký, Pavel Petráček, Dominic Baril, Maxime Vaidis, Vladimír Kubelka, François Pomerleau, Jan Faigl, Karel Zimmermann, Martin Saska, Tomáš Svoboda, Tomáš Krajník

Abstract: We present a field report of the CTU-CRAS-NORLAB team from the Subterranean Challenge (SubT) organised by the Defense Advanced Research Projects Agency (DARPA). The contest seeks to advance technologies that would improve the safety and efficiency of search-and-rescue operations in GPS-denied environments. During the contest rounds, teams of mobile robots have to find specific objects while operating in environments with limited radio communication, e.g. mining tunnels, underground stations or natural caverns. We present a heterogeneous exploration robotic system of the CTU-CRAS-NORLAB team, which achieved the third rank at the SubT Tunnel and Urban Circuit rounds and surpassed the performance of all other non-DARPA-funded teams. The field report describes the team's hardware, sensors, algorithms and strategies, and discusses the lessons learned by participating in the DARPA SubT contest.

Submitted 12 October, 2021; originally announced October 2021.

Comments: This paper has already been accepted for publication in the Field Robotics special issue on the DARPA SubT Challenge

arXiv:2004.14878

doi 10.1109/TNNLS.2023.3240857

PreCNet: Next-Frame Video Prediction Based on Predictive Coding

Authors: Zdenek Straka, Tomas Svoboda, Matej Hoffmann

Abstract: Predictive coding, currently a highly influential theory in neuroscience, has not been widely adopted in machine learning yet. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, SSIM) was further improved when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully based on a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.

Submitted 8 February, 2023; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

arXiv:1804.01953

doi 10.1109/LRA.2018.2857927

Data-driven Policy Transfer with Imprecise Perception Simulation

Authors: Martin Pecka, Karel Zimmermann, Matěj Petrlík, Tomáš Svoboda

Abstract: The paper presents a complete pipeline for learning continuous motion control policies for a mobile robot when only a non-differentiable physics simulator of robot-terrain interactions is available. The multi-modal state estimation of the robot is also complex and difficult to simulate, so we simultaneously learn a generative model which refines simulator outputs. We propose a coarse-to-fine learning paradigm, where the coarse motion planning is alternated with imitation learning and policy transfer to the real robot. The policy is jointly optimized with the generative model. We evaluate the method on a real-world platform in a batch of experiments.

Submitted 21 June, 2022; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

ACM Class: I.6.3

Journal ref: in IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3916-3921, Oct. 2018

arXiv:1708.02074

Learning for Active 3D Mapping

Authors: Karel Zimmermann, Tomas Petricek, Vojtech Salansky, Tomas Svoboda

Abstract: We propose an active 3D mapping method for depth sensors, which allow individual control of depth-measuring rays, such as the newly emerging solid-state lidars. The method simultaneously (i) learns to reconstruct a dense 3D occupancy map from sparse depth measurements, and (ii) optimizes the reactive control of depth-measuring rays. To make the first step towards the online control optimization, we propose a fast prioritized greedy algorithm, which needs to update its cost function in only a small fraction of possible rays. The approximation ratio of the greedy algorithm is derived. An experimental evaluation on a subset of the KITTI dataset demonstrates significant improvement in the 3D map accuracy when learning-to-reconstruct from sparse measurements is coupled with the optimization of depth-measuring rays.

Submitted 7 August, 2017; originally announced August 2017.

Comments: ICCV 2017 (oral). See video: https://www.youtube.com/watch?v=KNex0zjeGYE

arXiv:1703.04316

doi 10.1109/IROS.2017.8206546

Fast Simulation of Vehicles with Non-deformable Tracks

Authors: Martin Pecka, Karel Zimmermann, Tomáš Svoboda

Abstract: This paper presents a novel technique that allows for both computationally fast and sufficiently plausible simulation of vehicles with non-deformable tracks. The method is based on an effect we have called Contact Surface Motion. A comparison with several other methods for simulation of tracked vehicle dynamics is presented with the aim to evaluate methods that are available off-the-shelf or with minimum effort in general-purpose robotics simulators. The proposed method is implemented as a plugin for the open-source physics-based simulator Gazebo using the Open Dynamics Engine.

Submitted 21 June, 2022; v1 submitted 13 March, 2017; originally announced March 2017.

Comments: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

ACM Class: I.6.3

Journal ref: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 6414-6419

arXiv:1612.02739

doi 10.1109/TIE.2016.2580125

Controlling Robot Morphology from Incomplete Measurements

Authors: Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda

Abstract: Mobile robots with complex morphology are essential for traversing rough terrains in Urban Search & Rescue (USAR) missions. Since teleoperation of the complex morphology places a high cognitive load on the operator, the morphology is controlled autonomously. The autonomous control measures the robot state and surrounding terrain, which is usually only partially observable, and thus the data are often incomplete. We marginalize the control over the missing measurements and evaluate an explicit safety condition. If the safety condition is violated, tactile terrain exploration by the body-mounted robotic arm gathers the missing data.

Submitted 8 December, 2016; originally announced December 2016.

Comments: Accepted to IEEE Transactions on Industrial Electronics, Special Section on Motion Control for Novel Emerging Robotic Devices and Systems

ACM Class: I.2.9

Showing 1–14 of 14 results for author: Svoboda, T