Search | arXiv e-print repository

Jointly Learning Cost and Constraints from Demonstrations for Safe Trajectory Generation

Authors: Shivam Chaubey, Francesco Verdoja, Ville Kyrki

Abstract: Learning from Demonstration allows robots to mimic human actions. However, these methods do not model constraints crucial to ensure safety of the learned skill. Moreover, even when explicitly modelling constraints, they rely on the assumption of a known cost function, which limits their practical usability for task with unknown cost. In this work we propose a two-step optimization process that all… ▽ More Learning from Demonstration allows robots to mimic human actions. However, these methods do not model constraints crucial to ensure safety of the learned skill. Moreover, even when explicitly modelling constraints, they rely on the assumption of a known cost function, which limits their practical usability for task with unknown cost. In this work we propose a two-step optimization process that allow to estimate cost and constraints by decoupling the learning of cost functions from the identification of unknown constraints within the demonstrated trajectories. Initially, we identify the cost function by isolating the effect of constraints on parts of the demonstrations. Subsequently, a constraint leaning method is used to identify the unknown constraints. Our approach is validated both on simulated trajectories and a real robotic manipulation task. Our experiments show the impact that incorrect cost estimation has on the learned constraints and illustrate how the proposed method is able to infer unknown constraints, such as obstacles, from demonstrated trajectories without any initial knowledge of the cost. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2403.10117 [pdf, other]

Do Visual-Language Maps Capture Latent Semantics?

Authors: Matti Pekkanen, Tsvetomila Mihaylova, Francesco Verdoja, Ville Kyrki

Abstract: Visual-language models (VLMs) have recently been introduced in robotic mapping by using the latent representations, i.e., embeddings, of the VLMs to represent the natural language semantics in the map. The main benefit is moving beyond a small set of human-created labels toward open-vocabulary scene understanding. While there is anecdotal evidence that maps built this way support downstream tasks,… ▽ More Visual-language models (VLMs) have recently been introduced in robotic mapping by using the latent representations, i.e., embeddings, of the VLMs to represent the natural language semantics in the map. The main benefit is moving beyond a small set of human-created labels toward open-vocabulary scene understanding. While there is anecdotal evidence that maps built this way support downstream tasks, such as navigation, rigorous analysis of the quality of the maps using these embeddings is lacking. We investigate two critical properties of map quality: queryability and consistency. The evaluation of queryability addresses the ability to retrieve information from the embeddings. We investigate two aspects of consistency: intra-map consistency and inter-map consistency. Intra-map consistency captures the ability of the embeddings to represent abstract semantic classes, and inter-map consistency captures the generalization properties of the representation. In this paper, we propose a way to analyze the quality of maps created using VLMs, which forms an open-source benchmark to be used when proposing new open-vocabulary map representations. We demonstrate the benchmark by evaluating the maps created by two state-of-the-art methods, VLMaps and OpenScene, using two encoders, LSeg and OpenSeg, using real-world data from the Matterport3D data set. We find that OpenScene outperforms VLMaps with both encoders, and LSeg outperforms OpenSeg with both methods. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: Sumitted to IEEE-IROS-2024

arXiv:2309.08324 [pdf, other]

Object-oriented mapping in dynamic environments

Authors: Matti Pekkanen, Francesco Verdoja, Ville Kyrki

Abstract: Grid maps, especially occupancy grid maps, are ubiquitous in many mobile robot applications. To simplify the process of learning the map, grid maps subdivide the world into a grid of cells, whose occupancies are independently estimated using only measurements in the perceptual field of the particular cell. However, the world consists of objects that span multiple cells, which means that measuremen… ▽ More Grid maps, especially occupancy grid maps, are ubiquitous in many mobile robot applications. To simplify the process of learning the map, grid maps subdivide the world into a grid of cells, whose occupancies are independently estimated using only measurements in the perceptual field of the particular cell. However, the world consists of objects that span multiple cells, which means that measurements falling onto a cell provide evidence on the occupancy of other cells belonging to the same object. This correlation is not captured by current models. In this work, we present a way to generalize the update of grid maps relaxing the assumption of independence by modeling the relationship between the measurements and the occupancy of each cell as a set of latent variables, and jointly estimating those variables and the posterior of the map. Additionally, we propose a method to estimate the latent variables by clustering based on semantic labels and an extension to the Normal Distributions Transfer Occupancy Map (NDT-OM) to facilitate the proposed map update method. We perform comprehensive experiments of map creation and localization with real world data sets, and show that the proposed method creates better maps in highly dynamic environments compared to state-of-the-art methods. Finally, we demonstrate the ability of the proposed method to remove occluded objects from the map in a lifelong map update scenario. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: Submitted to IEEE-ICRA 2024

arXiv:2305.16702 [pdf, other]

Localization under consistent assumptions over dynamics

Authors: Matti Pekkanen, Francesco Verdoja, Ville Kyrki

Abstract: Accurate maps are a prerequisite for virtually all autonomous vehicle tasks. Most state-of-the-art maps assume a static world, and therefore dynamic objects are filtered out of the measurements. However, this division ignores movable but non-moving, i.e. semi-static, objects, which are usually recorded in the map and treated as static objects, violating the static world assumption, causing error i… ▽ More Accurate maps are a prerequisite for virtually all autonomous vehicle tasks. Most state-of-the-art maps assume a static world, and therefore dynamic objects are filtered out of the measurements. However, this division ignores movable but non-moving, i.e. semi-static, objects, which are usually recorded in the map and treated as static objects, violating the static world assumption, causing error in the localization. In this paper, we present a method for modeling moving and movable objects for matching the map and the measurements consistently. This reduces the error resulting from inconsistent categorization and treatment of non-static measurements. A semantic segmentation network is used to categorize the measurements into static and semi-static classes, and a background subtraction-based filtering method is used to remove dynamic measurements. Experimental comparison against a state-of-the-art baseline solution using real-world data from Oxford Radar RobotCar data set shows that consistent assumptions over dynamics increase localization accuracy. △ Less

Submitted 14 September, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Submitted to IEEE-ICRA-2024

arXiv:2302.10745 [pdf, other]

Constrained Generative Sampling of 6-DoF Grasps

Authors: Jens Lundell, Francesco Verdoja, Tran Nguyen Le, Arsalan Mousavian, Dieter Fox, Ville Kyrki

Abstract: Most state-of-the-art data-driven grasp sampling methods propose stable and collision-free grasps uniformly on the target object. For bin-picking, executing any of those reachable grasps is sufficient. However, for completing specific tasks, such as squeezing out liquid from a bottle, we want the grasp to be on a specific part of the object's body while avoiding other locations, such as the cap. T… ▽ More Most state-of-the-art data-driven grasp sampling methods propose stable and collision-free grasps uniformly on the target object. For bin-picking, executing any of those reachable grasps is sufficient. However, for completing specific tasks, such as squeezing out liquid from a bottle, we want the grasp to be on a specific part of the object's body while avoiding other locations, such as the cap. This work presents a generative grasp sampling network, VCGS, capable of constrained 6 Degrees of Freedom (DoF) grasp sampling. In addition, we also curate a new dataset designed to train and evaluate methods for constrained grasping. The new dataset, called CONG, consists of over 14 million training samples of synthetically rendered point clouds and grasps at random target areas on 2889 objects. VCGS is benchmarked against GraspNet, a state-of-the-art unconstrained grasp sampler, in simulation and on a real robot. The results demonstrate that VCGS achieves a 10-15% higher grasp success rate than the baseline while being 2-3 times as sample efficient. Supplementary material is available on our project website. △ Less

Submitted 17 August, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted at the International Conference on Intelligent Robots and Systems (IROS 2023)

arXiv:2212.03581 [pdf, other]

LSVL: Large-scale season-invariant visual localization for UAVs

Authors: Jouko Kinnari, Riccardo Renzulli, Francesco Verdoja, Ville Kyrki

Abstract: Localization of autonomous unmanned aerial vehicles (UAVs) relies heavily on Global Navigation Satellite Systems (GNSS), which are susceptible to interference. Especially in security applications, robust localization algorithms independent of GNSS are needed to provide dependable operations of autonomous UAVs also in interfered conditions. Typical non-GNSS visual localization approaches rely on kn… ▽ More Localization of autonomous unmanned aerial vehicles (UAVs) relies heavily on Global Navigation Satellite Systems (GNSS), which are susceptible to interference. Especially in security applications, robust localization algorithms independent of GNSS are needed to provide dependable operations of autonomous UAVs also in interfered conditions. Typical non-GNSS visual localization approaches rely on known starting pose, work only on a small-sized map, or require known flight paths before a mission starts. We consider the problem of localization with no information on initial pose or planned flight path. We propose a solution for global visual localization on a map at scale up to 100 km2, based on matching orthoprojected UAV images to satellite imagery using learned season-invariant descriptors. We show that the method is able to determine heading, latitude and longitude of the UAV at 12.6-18.7 m lateral translation error in as few as 23.2-44.4 updates from an uninformed initialization, also in situations of significant seasonal appearance difference (winter-summer) between the UAV image and the map. We evaluate the characteristics of multiple neural network architectures for generating the descriptors, and likelihood estimation methods that are able to provide fast convergence and low localization error. We also evaluate the operation of the algorithm using real UAV data and evaluate running time on a real-time embedded platform. We believe this is the first work that is able to recover the pose of an UAV at this scale and rate of convergence, while allowing significant seasonal difference between camera observations and map. △ Less

Submitted 7 December, 2022; originally announced December 2022.

arXiv:2208.10851 [pdf, other]

Bayesian Floor Field: Transferring people flow predictions across environments

Authors: Francesco Verdoja, Tomasz Piotr Kucner, Ville Kyrki

Abstract: Mapping people dynamics is a crucial skill for robots, because it enables them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time consuming process which requires observation of large amount of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model is only a… ▽ More Mapping people dynamics is a crucial skill for robots, because it enables them to coexist in human-inhabited environments. However, learning a model of people dynamics is a time consuming process which requires observation of large amount of people moving in an environment. Moreover, approaches for mapping dynamics are unable to transfer the learned models across environments: each model is only able to describe the dynamics of the environment it has been built in. However, the impact of architectural geometry on people's movement can be used to anticipate their patterns of dynamics, and recent work has looked into learning maps of dynamics from occupancy. So far however, approaches based on trajectories and those based on geometry have not been combined. In this work we propose a novel Bayesian approach to learn people dynamics able to combine knowledge about the environment geometry with observations from human trajectories. An occupancy-based deep prior is used to build an initial transition model without requiring any observations of pedestrian; the model is then updated when observations become available using Bayesian inference. We demonstrate the ability of our model to increase data efficiency and to generalize across real large-scale environments, which is unprecedented for maps of dynamics. △ Less

Submitted 2 April, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2110.01967 [pdf, other]

doi 10.1109/LRA.2022.3191038

Season-invariant GNSS-denied visual localization for UAVs

Authors: Jouko Kinnari, Francesco Verdoja, Ville Kyrki

Abstract: Localization without Global Navigation Satellite Systems (GNSS) is a critical functionality in autonomous operations of unmanned aerial vehicles (UAVs). Vision-based localization on a known map can be an effective solution, but it is burdened by two main problems: places have different appearance depending on weather and season, and the perspective discrepancy between the UAV camera image and the… ▽ More Localization without Global Navigation Satellite Systems (GNSS) is a critical functionality in autonomous operations of unmanned aerial vehicles (UAVs). Vision-based localization on a known map can be an effective solution, but it is burdened by two main problems: places have different appearance depending on weather and season, and the perspective discrepancy between the UAV camera image and the map make matching hard. In this work, we propose a localization solution relying on matching of UAV camera images to georeferenced orthophotos with a trained convolutional neural network model that is invariant to significant seasonal appearance difference (winter-summer) between the camera image and map. We compare the convergence speed and localization accuracy of our solution to six reference methods. The results show major improvements with respect to reference methods, especially under high seasonal variation. We finally demonstrate the ability of the method to successfully localize a real UAV, showing that the proposed method is robust to perspective changes. △ Less

Submitted 3 August, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Published in IEEE Robotics and Automation Letters (Volume: 7, Issue: 4, October 2022)

arXiv:2103.14381 [pdf, other]

GNSS-denied geolocalization of UAVs by visual matching of onboard camera images with orthophotos

Authors: Jouko Kinnari, Francesco Verdoja, Ville Kyrki

Abstract: Localization of low-cost Unmanned Aerial Vehicles (UAVs) often relies on Global Navigation Satellite Systems (GNSS). GNSS are susceptible to both natural disruptions to radio signal and intentional jamming and spoofing by an adversary. A typical way to provide georeferenced localization without GNSS for small UAVs is to have a downward-facing camera and match camera images to a map. The downward-f… ▽ More Localization of low-cost Unmanned Aerial Vehicles (UAVs) often relies on Global Navigation Satellite Systems (GNSS). GNSS are susceptible to both natural disruptions to radio signal and intentional jamming and spoofing by an adversary. A typical way to provide georeferenced localization without GNSS for small UAVs is to have a downward-facing camera and match camera images to a map. The downward-facing camera adds cost, size, and weight to the UAV platform and the orientation limits its usability for other purposes. In this work, we propose a Monte-Carlo localization method for georeferenced localization of an UAV requiring no infrastructure using only inertial measurements, a camera facing an arbitrary direction, and an orthoimage map. We perform orthorectification of the UAV image, relying on a local planarity assumption of the environment, relaxing the requirement of downward-pointing camera. We propose a measure of goodness for the matching score of an orthorectified UAV image and a map. We demonstrate that the system is able to localize globally an UAV with modest requirements for initialization and map resolution. △ Less

Submitted 19 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: Accepted for publication at 20th International Conference on Advanced Robotics (ICAR 2021)

arXiv:2103.07298 [pdf, other]

Augmented Environment Representations with Complete Object Models

Authors: Krishnananda Prabhu Sivananda, Francesco Verdoja, Ville Kyrki

Abstract: While 2D occupancy maps commonly used in mobile robotics enable safe navigation in indoor environments, in order for robots to understand and interact with their environment and its inhabitants representing 3D geometry and semantic environment information is required. Semantic information is crucial in effective interpretation of the meanings humans attribute to different parts of a space, while 3… ▽ More While 2D occupancy maps commonly used in mobile robotics enable safe navigation in indoor environments, in order for robots to understand and interact with their environment and its inhabitants representing 3D geometry and semantic environment information is required. Semantic information is crucial in effective interpretation of the meanings humans attribute to different parts of a space, while 3D geometry is important for safety and high-level understanding. We propose a pipeline that can generate a multi-layer representation of indoor environments for robotic applications. The proposed representation includes 3D metric-semantic layers, a 2D occupancy layer, and an object instance layer where known objects are replaced with an approximate model obtained through a novel model-matching approach. The metric-semantic layer and the object instance layer are combined to form an augmented representation of the environment. Experiments show that the proposed shape matching method outperforms a state-of-the-art deep learning method when tasked to complete unseen parts of objects in the scene. The pipeline performance translates well from simulation to real world as shown by F1-score analysis, with semantic segmentation accuracy using Mask R-CNN acting as the major bottleneck. Finally, we also demonstrate on a real robotic platform how the multi-layer map can be used to improve navigation safety. △ Less

Submitted 17 August, 2022; v1 submitted 12 March, 2021; originally announced March 2021.

Comments: Accepted for publication in the 31st IEEE International Conference on Robot & Human Interactive Communication (RO-MAN 2022)

arXiv:2103.04783 [pdf, other]

DDGC: Generative Deep Dexterous Grasping in Clutter

Authors: Jens Lundell, Francesco Verdoja, Ville Kyrki

Abstract: Recent advances in multi-fingered robotic grasping have enabled fast 6-Degrees-Of-Freedom (DOF) single object grasping. Multi-finger grasping in cluttered scenes, on the other hand, remains mostly unexplored due to the added difficulty of reasoning over obstacles which greatly increases the computational time to generate high-quality collision-free grasps. In this work we address such limitations… ▽ More Recent advances in multi-fingered robotic grasping have enabled fast 6-Degrees-Of-Freedom (DOF) single object grasping. Multi-finger grasping in cluttered scenes, on the other hand, remains mostly unexplored due to the added difficulty of reasoning over obstacles which greatly increases the computational time to generate high-quality collision-free grasps. In this work we address such limitations by introducing DDGC, a fast generative multi-finger grasp sampling method that can generate high quality grasps in cluttered scenes from a single RGB-D image. DDGC is built as a network that encodes scene information to produce coarse-to-fine collision-free grasp poses and configurations. We experimentally benchmark DDGC against the simulated-annealing planner in GraspIt! on 1200 simulated cluttered scenes and 7 real world scenes. The results show that DDGC outperforms the baseline on synthesizing high-quality grasps and removing clutter while being 5 times faster. This, in turn, opens up the door for using multi-finger grasps in practical applications which has so far been limited due to the excessive computation time needed by other methods. △ Less

Submitted 12 August, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Accepted to IEEE Robotics and Automation Letters 2021 and IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:2012.09696 [pdf, other]

Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps

Authors: Jens Lundell, Enric Corona, Tran Nguyen Le, Francesco Verdoja, Philippe Weinzaepfel, Gregory Rogez, Francesc Moreno-Noguer, Ville Kyrki

Abstract: While there exists many methods for manipulating rigid objects with parallel-jaw grippers, grasping with multi-finger robotic hands remains a quite unexplored research topic. Reasoning and planning collision-free trajectories on the additional degrees of freedom of several fingers represents an important challenge that, so far, involves computationally costly and slow processes. In this work, we p… ▽ More While there exists many methods for manipulating rigid objects with parallel-jaw grippers, grasping with multi-finger robotic hands remains a quite unexplored research topic. Reasoning and planning collision-free trajectories on the additional degrees of freedom of several fingers represents an important challenge that, so far, involves computationally costly and slow processes. In this work, we present Multi-FinGAN, a fast generative multi-finger grasp sampling method that synthesizes high quality grasps directly from RGB-D images in about a second. We achieve this by training in an end-to-end fashion a coarse-to-fine model composed of a classification network that distinguishes grasp types according to a specific taxonomy and a refinement network that produces refined grasp poses and joint angles. We experimentally validate and benchmark our method against a standard grasp-sampling method on 790 grasps in simulation and 20 grasps on a real Franka Emika Panda. All experimental results using our method show consistent improvements both in terms of grasp quality metrics and grasp success rate. Remarkably, our approach is up to 20-30 times faster than the baseline, a significant improvement that opens the door to feedback-based grasp re-planning and task informative grasping. Code is available at https://irobotics.aalto.fi/multi-fingan/. △ Less

Submitted 15 March, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: Accepted to IEEE Conference on Robotics and Automation 2021 (ICRA). Code is available at https://irobotics.aalto.fi/multi-fingan/

arXiv:2011.06895 [pdf, other]

Online Object-Oriented Semantic Mapping and Map Updating

Authors: Nils Dengler, Tobias Zaenker, Francesco Verdoja, Maren Bennewitz

Abstract: Creating and maintaining an accurate representation of the environment is an essential capability for every service robot. Especially for household robots acting in indoor environments, semantic information is important. In this paper, we present a semantic mapping framework with modular map representations. Our system is capable of online mapping and object updating given object detections from R… ▽ More Creating and maintaining an accurate representation of the environment is an essential capability for every service robot. Especially for household robots acting in indoor environments, semantic information is important. In this paper, we present a semantic mapping framework with modular map representations. Our system is capable of online mapping and object updating given object detections from RGB-D data and provides various 2D and 3D~representations of the mapped objects. To undo wrong data associations, we perform a refinement step when updating object shapes. Furthermore, we maintain an existence likelihood for each object to deal with false positive and false negative detections and keep the map updated. Our mapping system is highly efficient and achieves a run time of more than 10 Hz. We evaluated our approach in various environments using two different robots, i.e., a Toyota HSR and a Fraunhofer Care-O-Bot-4. As the experimental results demonstrate, our system is able to generate maps that are close to the ground truth and outperforms an existing approach in terms of intersection over union, different distance metrics, and the number of correct object mappings △ Less

Submitted 18 August, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

Comments: Accepted for IEEE European Conference on Mobile Robots(ECMR) 2021

arXiv:2010.08277 [pdf, other]

doi 10.1109/LRA.2021.3062585

Probabilistic Surface Friction Estimation Based on Visual and Haptic Measurements

Authors: Tran Nguyen Le, Francesco Verdoja, Fares J. Abu-Dakka, Ville Kyrki

Abstract: Accurately modeling local surface properties of objects is crucial to many robotic applications, from grasping to material recognition. Surface properties like friction are however difficult to estimate, as visual observation of the object does not convey enough information over these properties. In contrast, haptic exploration is time consuming as it only provides information relevant to the expl… ▽ More Accurately modeling local surface properties of objects is crucial to many robotic applications, from grasping to material recognition. Surface properties like friction are however difficult to estimate, as visual observation of the object does not convey enough information over these properties. In contrast, haptic exploration is time consuming as it only provides information relevant to the explored parts of the object. In this work, we propose a joint visuo-haptic object model that enables the estimation of surface friction coefficient over an entire object by exploiting the correlation of visual and haptic information, together with a limited haptic exploration by a robotic arm. We demonstrate the validity of the proposed method by showing its ability to estimate varying friction coefficients on a range of real multi-material objects. Furthermore, we illustrate how the estimated friction coefficients can improve grasping success rate by guiding a grasp planner toward high friction areas. △ Less

Submitted 12 March, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: Published in RA-L and ICRA2021. 8 pages, 8 figures. Tin IEEE Robotics and Automation Letters, 2021

arXiv:2008.02627 [pdf, other]

Notes on the Behavior of MC Dropout

Authors: Francesco Verdoja, Ville Kyrki

Abstract: Among the various options to estimate uncertainty in deep neural networks, Monte-Carlo dropout is widely popular for its simplicity and effectiveness. However the quality of the uncertainty estimated through this method varies and choices in architecture design and in training procedures have to be carefully considered and tested to obtain satisfactory results. In this paper we present a study off… ▽ More Among the various options to estimate uncertainty in deep neural networks, Monte-Carlo dropout is widely popular for its simplicity and effectiveness. However the quality of the uncertainty estimated through this method varies and choices in architecture design and in training procedures have to be carefully considered and tested to obtain satisfactory results. In this paper we present a study offering a different point of view on the behavior of Monte-Carlo dropout, which enables us to observe a few interesting properties of the technique to keep in mind when considering its use for uncertainty estimation. △ Less

Submitted 13 July, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

Comments: Presented at the ICML 2021 Workshop on "Uncertainty and Robustness in Deep Learning"

arXiv:2005.11094 [pdf, other]

On the Potential of Smarter Multi-layer Maps

Authors: Francesco Verdoja, Ville Kyrki

Abstract: The most common way for robots to handle environmental information is by using maps. At present, each kind of data is hosted on a separate map, which complicates planning because a robot attempting to perform a task needs to access and process information from many different maps. Also, most often correlation among the information contained in maps obtained from different sources is not evaluated… ▽ More The most common way for robots to handle environmental information is by using maps. At present, each kind of data is hosted on a separate map, which complicates planning because a robot attempting to perform a task needs to access and process information from many different maps. Also, most often correlation among the information contained in maps obtained from different sources is not evaluated or exploited. In this paper, we argue that in robotics a shift from single-source maps to a multi-layer mapping formalism has the potential to revolutionize the way robots interact with knowledge about their environment. This observation stems from the raise in metric-semantic mapping research, but expands to include in its formulation also layers containing other information sources, e.g., people flow, room semantic, or environment topology. Such multi-layer maps, here named hypermaps, not only can ease processing spatial data information but they can bring added benefits arising from the interaction between maps. We imagine that a new research direction grounded in such multi-layer mapping formalism for robots can use artificial intelligence to process the information it stores to present to the robot task-specific information simplifying planning and bringing us one step closer to high-level reasoning in robots. △ Less

Submitted 22 May, 2020; originally announced May 2020.

Comments: Presented at the "Perception, Action, Learning (PAL)" workshop at the 2020 IEEE International Conference on Robotics and Automation (ICRA 2020)

arXiv:1909.12908 [pdf, other]

doi 10.1109/ICRA40945.2020.9197320

Beyond Top-Grasps Through Scene Completion

Authors: Jens Lundell, Francesco Verdoja, Ville Kyrki

Abstract: Current end-to-end grasp planning methods propose grasps in the order of seconds that attain high grasp success rates on a diverse set of objects, but often by constraining the workspace to top-grasps. In this work, we present a method that allows end-to-end top-grasp planning methods to generate full six-degree-of-freedom grasps using a single RGB-D view as input. This is achieved by estimating t… ▽ More Current end-to-end grasp planning methods propose grasps in the order of seconds that attain high grasp success rates on a diverse set of objects, but often by constraining the workspace to top-grasps. In this work, we present a method that allows end-to-end top-grasp planning methods to generate full six-degree-of-freedom grasps using a single RGB-D view as input. This is achieved by estimating the complete shape of the object to be grasped, then simulating different viewpoints of the object, passing the simulated viewpoints to an end-to-end grasp generation method, and finally executing the overall best grasp. The method was experimentally validated on a Franka Emika Panda by comparing 429 grasps generated by the state-of-the-art Fully Convolutional Grasp Quality CNN, both on simulated and real camera images. The results show statistically significant improvements in terms of grasp success rate when using simulated images over real camera images, especially when the real camera viewpoint is angled. Code and video are available at https://irobotics.aalto.fi/beyond-top-grasps-through-scene-completion/. △ Less

Submitted 2 March, 2020; v1 submitted 15 September, 2019; originally announced September 2019.

Comments: Accepted to 2020 IEEE Conference on Robotics and Automation (ICRA)

Journal ref: 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020, pp. 545-551

arXiv:1909.09526 [pdf, other]

doi 10.1109/MFI49285.2020.9235231

Hypermap Mapping Framework and its Application to Autonomous Semantic Exploration

Authors: Tobias Zaenker, Francesco Verdoja, Ville Kyrki

Abstract: Modern intelligent and autonomous robotic applications often require robots to have more information about their environment than that provided by traditional occupancy grid maps. For example, a robot tasked to perform autonomous semantic exploration has to label objects in the environment it is traversing while autonomously navigating. To solve this task the robot needs to at least maintain an oc… ▽ More Modern intelligent and autonomous robotic applications often require robots to have more information about their environment than that provided by traditional occupancy grid maps. For example, a robot tasked to perform autonomous semantic exploration has to label objects in the environment it is traversing while autonomously navigating. To solve this task the robot needs to at least maintain an occupancy map of the environment for navigation, an exploration map keeping track of which areas have already been visited, and a semantic map where locations and labels of objects in the environment are recorded. As the number of maps required grows, an application has to know and handle different map representations, which can be a burden. We present the Hypermap framework, which can manage multiple maps of different types. In this work, we explore the capabilities of the framework to handle occupancy grid layers and semantic polygonal layers, but the framework can be extended with new layer types in the future. Additionally, we present an algorithm to automatically generate semantic layers from RGB-D images. We demonstrate the utility of the framework using the example of autonomous exploration for semantic mapping. △ Less

Submitted 3 September, 2020; v1 submitted 20 September, 2019; originally announced September 2019.

Comments: In 2020 IEEE International Conference on Multisensor Fusion and Integration (MFI)

Journal ref: 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Karlsruhe, Germany, 2020, pp. 133-139

arXiv:1903.00645 [pdf, other]

doi 10.1109/IROS40897.2019.8967816

Robust Grasp Planning Over Uncertain Shape Completions

Authors: Jens Lundell, Francesco Verdoja, Ville Kyrki

Abstract: We present a method for planning robust grasps over uncertain shape completed objects. For shape completion, a deep neural network is trained to take a partial view of the object as input and outputs the completed shape as a voxel grid. The key part of the network is dropout layers which are enabled not only during training but also at run-time to generate a set of shape samples representing the s… ▽ More We present a method for planning robust grasps over uncertain shape completed objects. For shape completion, a deep neural network is trained to take a partial view of the object as input and outputs the completed shape as a voxel grid. The key part of the network is dropout layers which are enabled not only during training but also at run-time to generate a set of shape samples representing the shape uncertainty through Monte Carlo sampling. Given the set of shape completed objects, we generate grasp candidates on the mean object shape but evaluate them based on their joint performance in terms of analytical grasp metrics on all the shape candidates. We experimentally validate and benchmark our method against another state-of-the-art method with a Barrett hand on 90000 grasps in simulation and 200 grasps on a real Franka Emika Panda. All experimental results show statistically significant improvements both in terms of grasp quality metrics and grasp success rate, demonstrating that planning shape-uncertainty-aware grasps brings significant advantages over solely planning on a single shape estimate, especially when dealing with complex or unknown objects. △ Less

Submitted 12 August, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

Comments: 7 pages, 5 figures. Accepted to 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)

Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019, pp. 1526-1532

arXiv:1809.04891 [pdf, other]

doi 10.1109/Humanoids43949.2019.9035016

Deep Network Uncertainty Maps for Indoor Navigation

Authors: Francesco Verdoja, Jens Lundell, Ville Kyrki

Abstract: Most mobile robots for indoor use rely on 2D laser scanners for localization, mapping and navigation. These sensors, however, cannot detect transparent surfaces or measure the full occupancy of complex objects such as tables. Deep Neural Networks have recently been proposed to overcome this limitation by learning to estimate object occupancy. These estimates are nevertheless subject to uncertainty… ▽ More Most mobile robots for indoor use rely on 2D laser scanners for localization, mapping and navigation. These sensors, however, cannot detect transparent surfaces or measure the full occupancy of complex objects such as tables. Deep Neural Networks have recently been proposed to overcome this limitation by learning to estimate object occupancy. These estimates are nevertheless subject to uncertainty, making the evaluation of their confidence an important issue for these measures to be useful for autonomous navigation and mapping. In this work we approach the problem from two sides. First we discuss uncertainty estimation in deep models, proposing a solution based on a fully convolutional neural network. The proposed architecture is not restricted by the assumption that the uncertainty follows a Gaussian model, as in the case of many popular solutions for deep model uncertainty estimation, such as Monte-Carlo Dropout. We present results showing that uncertainty over obstacle distances is actually better modeled with a Laplace distribution. Then, we propose a novel approach to build maps based on Deep Neural Network uncertainty models. In particular, we present an algorithm to build a map that includes information over obstacle distance estimates while taking into account the level of uncertainty in each estimate. We show how the constructed map can be used to increase global navigation safety by planning trajectories which avoid areas of high uncertainty, enabling higher autonomy for mobile robots in indoor settings. △ Less

Submitted 4 September, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: Accepted for publication in "2019 IEEE-RAS International Conference on Humanoid Robots (Humanoids)"

Journal ref: 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), Toronto, ON, Canada, 2019, pp. 112-119

arXiv:1805.12338 [pdf, other]

doi 10.1109/IROS.2018.8594399

Hallucinating robots: Inferring Obstacle Distances from Partial Laser Measurements

Authors: Jens Lundell, Francesco Verdoja, Ville Kyrki

Abstract: Many mobile robots rely on 2D laser scanners for localization, mapping, and navigation. However, those sensors are unable to correctly provide distance to obstacles such as glass panels and tables whose actual occupancy is invisible at the height the sensor is measuring. In this work, instead of estimating the distance to obstacles from richer sensor readings such as 3D lasers or RGBD sensors, we… ▽ More Many mobile robots rely on 2D laser scanners for localization, mapping, and navigation. However, those sensors are unable to correctly provide distance to obstacles such as glass panels and tables whose actual occupancy is invisible at the height the sensor is measuring. In this work, instead of estimating the distance to obstacles from richer sensor readings such as 3D lasers or RGBD sensors, we present a method to estimate the distance directly from raw 2D laser data. To learn a mapping from raw 2D laser distances to obstacle distances we frame the problem as a learning task and train a neural network formed as an autoencoder. A novel configuration of network hyperparameters is proposed for the task at hand and is quantitatively validated on a test set. Finally, we qualitatively demonstrate in real time on a Care-O-bot 4 that the trained network can successfully infer obstacle distances from partial 2D laser readings. △ Less

Submitted 29 July, 2018; v1 submitted 31 May, 2018; originally announced May 2018.

Comments: In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Journal ref: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018, pp. 4781-4787

arXiv:1802.09843 [pdf, other]

doi 10.1007/s00138-020-01059-4

Graph Laplacian for Image Anomaly Detection

Authors: Francesco Verdoja, Marco Grangetto

Abstract: Reed-Xiaoli detector (RXD) is recognized as the benchmark algorithm for image anomaly detection; however, it presents known limitations, namely the dependence over the image following a multivariate Gaussian model, the estimation and inversion of a high-dimensional covariance matrix, and the inability to effectively include spatial awareness in its evaluation. In this work, a novel graph-based sol… ▽ More Reed-Xiaoli detector (RXD) is recognized as the benchmark algorithm for image anomaly detection; however, it presents known limitations, namely the dependence over the image following a multivariate Gaussian model, the estimation and inversion of a high-dimensional covariance matrix, and the inability to effectively include spatial awareness in its evaluation. In this work, a novel graph-based solution to the image anomaly detection problem is proposed; leveraging the graph Fourier transform, we are able to overcome some of RXD's limitations while reducing computational cost at the same time. Tests over both hyperspectral and medical images, using both synthetic and real anomalies, prove the proposed technique is able to obtain significant gains over performance by other algorithms in the state of the art. △ Less

Submitted 10 February, 2020; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: Published in Machine Vision and Applications (Springer)

Journal ref: Machine Vision and Applications, vol. 31, no. 1, Feb. 2020

Showing 1–22 of 22 results for author: Verdoja, F