Robotics
See recent articles
- [1] arXiv:2407.20439 [pdf, other]
-
Title: Haptic feedback of front car motion can improve driving controlSubjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)
This study investigates the role of haptic feedback in a car-following scenario, where information about the motion of the front vehicle is provided through a virtual elastic connection with it. Using a robotic interface in a simulated driving environment, we examined the impact of varying levels of such haptic feedback on the driver's ability to follow the road while avoiding obstacles. The results of an experiment with 15 subjects indicate that haptic feedback from the front car's motion can significantly improve driving control (i.e., reduce motion jerk and deviation from the road) and reduce mental load (evaluated via questionnaire). This suggests that haptic communication, as observed between physically interacting humans, can be leveraged to improve safety and efficiency in automated driving systems, warranting further testing in real driving scenarios.
- [2] arXiv:2407.20465 [pdf, other]
-
Title: A flexible framework for accurate LiDAR odometry, map manipulation, and localizationComments: 41 pages, 30 figuresSubjects: Robotics (cs.RO)
LiDAR-based SLAM is a core technology for autonomous vehicles and robots. Despite the intense research activity in this field, each proposed system uses a particular sensor post-processing pipeline and a single map representation format. The present work aims at introducing a revolutionary point of view for 3D LiDAR SLAM and localization: (1) using view-based maps as the fundamental representation of maps ("simple-maps"), which can then be used to generate arbitrary metric maps optimized for particular tasks; and (2) by introducing a new framework in which mapping pipelines can be defined without coding, defining the connections of a network of reusable blocks much like deep-learning networks are designed by connecting layers of standardized elements. Moreover, the idea of including the current linear and angular velocity vectors as variables to be optimized within the ICP loop is also introduced, leading to superior robustness against aggressive motion profiles without an IMU. The presented open-source ecosystem, released to ROS 2, includes tools and prebuilt pipelines covering all the way from data acquisition to map editing and visualization, real-time localization, loop-closure detection, or map georeferencing from consumer-grade GNSS receivers. Extensive experimental validation reveals that the proposal compares well to, or improves, former state-of-the-art (SOTA) LiDAR odometry systems, while also successfully mapping some hard sequences where others diverge. A proposed self-adaptive configuration has been used, without parameter changes, for all 3D LiDAR datasets with sensors between 16 and 128 rings, extensively tested on 83 sequences over more than 250~km of automotive, hand-held, airborne, and quadruped LiDAR datasets, both indoors and outdoors. The open-sourced implementation is available online at this https URL
- [3] arXiv:2407.20556 [pdf, other]
-
Title: Survey of Design Paradigms for Social RobotsSubjects: Robotics (cs.RO); Computation and Language (cs.CL); Computers and Society (cs.CY)
The demand for social robots in fields like healthcare, education, and entertainment increases due to their emotional adaptation features. These robots leverage multimodal communication, incorporating speech, facial expressions, and gestures to enhance user engagement and emotional support. The understanding of design paradigms of social robots is obstructed by the complexity of the system and the necessity to tune it to a specific task. This article provides a structured review of social robot design paradigms, categorizing them into cognitive architectures, role design models, linguistic models, communication flow, activity system models, and integrated design models. By breaking down the articles on social robot design and application based on these paradigms, we highlight the strengths and areas for improvement in current approaches. We further propose our original integrated design model that combines the most important aspects of the design of social robots. Our approach shows the importance of integrating operational, communicational, and emotional dimensions to create more adaptive and empathetic interactions between robots and humans.
- [4] arXiv:2407.20619 [pdf, other]
-
Title: ATI-CTLO:Adaptive Temporal Interval-based Continuous-Time LiDAR-Only OdometrySubjects: Robotics (cs.RO)
The motion distortion in LiDAR scans caused by aggressive robot motion and varying terrain features significantly impacts the positioning and mapping performance of 3D LiDAR odometry. Existing distortion correction solutions often struggle to balance computational complexity and accuracy. In this work, we propose an Adaptive Temporal Interval-based Continuous-Time LiDAR-only Odometry, utilizing straightforward and efficient linear interpolation. Our method flexibly adjusts the temporal intervals between control nodes according to the dynamics of motion and environmental characteristics. This adaptability enhances performance across various motion states and improves robustness in challenging, feature-sparse environments. We validate the effectiveness of our method on multiple datasets across different platforms, achieving accuracy comparable to state-of-the-art LiDAR-only odometry methods. Notably, in scenarios involving aggressive motion and sparse features, our method outperforms existing solutions.
- [5] arXiv:2407.20635 [pdf, other]
-
Title: Autonomous Improvement of Instruction Following Skills via Foundation ModelsSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning: instead of collecting costly teleoperated demonstration data, large-scale deployment of fleets of robots can quickly collect larger quantities of autonomous data that can collectively improve their performance. However, autonomous improvement requires solving two key problems: (i) fully automating a scalable data collection procedure that can collect diverse and semantically meaningful robot data and (ii) learning from non-optimal, autonomous data with no human annotations. To this end, we propose a novel approach that addresses these challenges, allowing instruction-following policies to improve from autonomously collected data without human supervision. Our framework leverages vision-language models to collect and evaluate semantically meaningful experiences in new environments, and then utilizes a decomposition of instruction following tasks into (semantic) language-conditioned image generation and (non-semantic) goal reaching, which makes it significantly more practical to improve from this autonomously collected data without any human annotations. We carry out extensive experiments in the real world to demonstrate the effectiveness of our approach, and find that in a suite of unseen environments, the robot policy can be improved significantly with autonomously collected data. We open-source the code for our semantic autonomous improvement pipeline, as well as our autonomous dataset of 30.5K trajectories collected across five tabletop environments.
- [6] arXiv:2407.20709 [pdf, other]
-
Title: A Case Study on Visual-Audio-Tactile Cross-Modal RetrievalComments: 7 pages, 6 figures, accepted to the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)Subjects: Robotics (cs.RO)
Cross-Modal Retrieval (CMR), which retrieves relevant items from one modality (e.g., audio) given a query in another modality (e.g., visual), has undergone significant advancements in recent years. This capability is crucial for robots to integrate and interpret information across diverse sensory inputs. However, the retrieval space in existing robotic CMR approaches often consists of only one modality, which limits the robot's performance. In this paper, we propose a novel CMR model that incorporates three different modalities, i.e., visual, audio and tactile, for enhanced multi-modal object retrieval, named as VAT-CMR. In this model, multi-modal representations are first fused to provide a holistic view of object features. To mitigate the semantic gaps between representations of different modalities, a dominant modality is then selected during the classification training phase to improve the distinctiveness of the representations, so as to improve the retrieval performance. To evaluate our proposed approach, we conducted a case study and the results demonstrate that our VAT-CMR model surpasses competing approaches. Further, our proposed dominant modality selection significantly enhances cross-retrieval accuracy.
New submissions for Wednesday, 31 July 2024 (showing 6 of 6 entries )
- [7] arXiv:2407.20242 (cross-list from cs.CY) [pdf, other]
-
Title: BadRobot: Jailbreaking LLM-based Embodied AI in the Physical WorldComments: Preliminary version (15 pages, 4 figures). Work in progress, revisions ongoing. Appreciate understanding and welcome any feedbackSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Embodied artificial intelligence (AI) represents an artificial intelligence system that interacts with the physical world through sensors and actuators, seamlessly integrating perception and action. This design enables AI to learn from and operate within complex, real-world environments. Large Language Models (LLMs) deeply explore language instructions, playing a crucial role in devising plans for complex tasks. Consequently, they have progressively shown immense potential in empowering embodied AI, with LLM-based embodied AI emerging as a focal point of research within the community. It is foreseeable that, over the next decade, LLM-based embodied AI robots are expected to proliferate widely, becoming commonplace in homes and industries. However, a critical safety issue that has long been hiding in plain sight is: could LLM-based embodied AI perpetrate harmful behaviors? Our research investigates for the first time how to induce threatening actions in embodied AI, confirming the severe risks posed by these soon-to-be-marketed robots, which starkly contravene Asimov's Three Laws of Robotics and threaten human safety. Specifically, we formulate the concept of embodied AI jailbreaking and expose three critical security vulnerabilities: first, jailbreaking robotics through compromised LLM; second, safety misalignment between action and language spaces; and third, deceptive prompts leading to unaware hazardous behaviors. We also analyze potential mitigation measures and advocate for community awareness regarding the safety of embodied AI applications in the physical world.
- [8] arXiv:2407.20391 (cross-list from cs.CV) [pdf, other]
-
Title: Alignment Scores: Robust Metrics for Multiview Pose Accuracy EvaluationSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
We propose three novel metrics for evaluating the accuracy of a set of estimated camera poses given the ground truth: Translation Alignment Score (TAS), Rotation Alignment Score (RAS), and Pose Alignment Score (PAS). The TAS evaluates the translation accuracy independently of the rotations, and the RAS evaluates the rotation accuracy independently of the translations. The PAS is the average of the two scores, evaluating the combined accuracy of both translations and rotations. The TAS is computed in four steps: (1) Find the upper quartile of the closest-pair-distances, $d$. (2) Align the estimated trajectory to the ground truth using a robust registration method. (3) Collect all distance errors and obtain the cumulative frequencies for multiple thresholds ranging from $0.01d$ to $d$ with a resolution $0.01d$. (4) Add up these cumulative frequencies and normalize them such that the theoretical maximum is 1. The TAS has practical advantages over the existing metrics in that (1) it is robust to outliers and collinear motion, and (2) there is no need to adjust parameters on different datasets. The RAS is computed in a similar manner to the TAS and is also shown to be more robust against outliers than the existing rotation metrics. We verify our claims through extensive simulations and provide in-depth discussion of the strengths and weaknesses of the proposed metrics.
- [9] arXiv:2407.20732 (cross-list from cs.CV) [pdf, other]
-
Title: Scene-Specific Trajectory Sets: Maximizing Representation in Motion ForecastingSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Representing diverse and plausible future trajectories of actors is crucial for motion forecasting in autonomous driving. However, efficiently capturing the true trajectory distribution with a compact set is challenging. In this work, we propose a novel approach for generating scene-specific trajectory sets that better represent the diversity and admissibility of future actor behavior. Our method constructs multiple trajectory sets tailored to different scene contexts, such as intersections and non-intersections, by leveraging map information and actor dynamics. We introduce a deterministic goal sampling algorithm that identifies relevant map regions and generates trajectories conditioned on the scene layout. Furthermore, we empirically investigate various sampling strategies and set sizes to optimize the trade-off between coverage and diversity. Experiments on the Argoverse 2 dataset demonstrate that our scene-specific sets achieve higher plausibility while maintaining diversity compared to traditional single-set approaches. The proposed Recursive In-Distribution Subsampling (RIDS) method effectively condenses the representation space and outperforms metric-driven sampling in terms of trajectory admissibility. Our work highlights the benefits of scene-aware trajectory set generation for capturing the complex and heterogeneous nature of actor behavior in real-world driving scenarios.
- [10] arXiv:2407.20798 (cross-list from cs.LG) [pdf, other]
-
Title: Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer LearningComments: Published at 3rd Conference on Lifelong Learning Agents (CoLLAs), 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
We introduce Diffusion Augmented Agents (DAAG), a novel framework that leverages large language models, vision language models, and diffusion models to improve sample efficiency and transfer learning in reinforcement learning for embodied agents. DAAG hindsight relabels the agent's past experience by using diffusion models to transform videos in a temporally and geometrically consistent way to align with target instructions with a technique we call Hindsight Experience Augmentation. A large language model orchestrates this autonomous process without requiring human supervision, making it well-suited for lifelong learning scenarios. The framework reduces the amount of reward-labeled data needed to 1) finetune a vision language model that acts as a reward detector, and 2) train RL agents on new tasks. We demonstrate the sample efficiency gains of DAAG in simulated robotics environments involving manipulation and navigation. Our results show that DAAG improves learning of reward detectors, transferring past experience, and acquiring new tasks - key abilities for developing efficient lifelong learning agents. Supplementary material and visualizations are available on our website this https URL
Cross submissions for Wednesday, 31 July 2024 (showing 4 of 4 entries )
- [11] arXiv:2203.03374 (replaced) [pdf, other]
-
Title: A Unified Formulation of Geometry-aware Dynamic Movement PrimitivesComments: 18 pages, 12 figures, journalSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
Learning from demonstration (LfD) is considered as an efficient way to transfer skills from humans to robots. Traditionally, LfD has been used to transfer Cartesian and joint positions and forces from human demonstrations. The traditional approach works well for some robotic tasks, but for many tasks of interest, it is necessary to learn skills such as orientation, impedance, and/or manipulability that have specific geometric characteristics. An effective encoding of such skills can be only achieved if the underlying geometric structure of the skill manifold is considered and the constrains arising from this structure are fulfilled during both learning and execution. However, typical learned skill models such as dynamic movement primitives (DMPs) are limited to Euclidean data and fail in correctly embedding quantities with geometric constraints. In this paper, we propose a novel and mathematically principled framework that uses concepts from Riemannian geometry to allow DMPs to properly embed geometric constrains. The resulting DMP formulation can deal with data sampled from any Riemannian manifold including, but not limited to, unit quaternions and symmetric and positive definite matrices. The proposed approach has been extensively evaluated both on simulated data and real robot experiments. The performed evaluation demonstrates that beneficial properties of DMPs, such as convergence to a given goal and the possibility to change the goal during operation, apply also to the proposed formulation.
- [12] arXiv:2305.06869 (replaced) [pdf, other]
-
Title: An Adaptive Graduated Nonconvexity Loss Function for Robust Nonlinear Least Squares SolutionsComments: Accepted to IEEE Transaction on Robotics 15 pages 11 figuresSubjects: Robotics (cs.RO)
Many problems in robotics, such as estimating the state from noisy sensor data or aligning two point clouds, can be posed and solved as least-squares problems. Unfortunately, vanilla nonminimal solvers for least-squares problems are notoriously sensitive to outliers. As such, various robust loss functions have been proposed to reduce the sensitivity to outliers. Examples of loss functions include pseudo-Huber, Cauchy, and Geman-McClure. Recently, these loss functions have been generalized into a single loss function that enables the best loss function to be found adaptively based on the distribution of the residuals. However, even with the generalized robust loss function, most nonminimal solvers can only be solved locally given a prior state estimate due to the nonconvexity of the problem. The first contribution of this paper is to combine graduated nonconvexity (GNC) with the generalized robust loss function to solve least-squares problems without a prior state estimate and without the need to specify a loss function. Moreover, existing loss functions, including the generalized loss function, are based on Gaussian-like distribution. However, residuals are often defined as the squared norm of a multivariate error and distributed in a Chi-like fashion. The second contribution of this paper is to apply a norm-aware adaptive robust loss function within a GNC framework. The proposed approach enables a GNC formulation of a generalized loss function such that GNC can be readily applied to a wider family of loss functions. Furthermore, simulations and experiments demonstrate that the proposed method is more robust compared to non-GNC counterparts, and yields faster convergence times compared to other GNC formulations.
- [13] arXiv:2310.03676 (replaced) [pdf, other]
-
Title: PV-OSIMr: A Lowest Order Complexity Algorithm for Computing the Delassus MatrixComments: 8 pages, improved clarity of exposition and resolved ambiguities in algorithm. Under peer reviewSubjects: Robotics (cs.RO)
We present PV-OSIMr, an efficient algorithm for computing the Delassus matrix (also known as the inverse operational space inertia matrix) for a kinematic tree, with the lowest order computational complexity known in literature. PV-OSIMr is derived by optimizing the Popov-Vereshchagin (PV) solver computations using the compositionality of the force and motion propagators. It has a computational complexity of O(n + m^2 ) compared to O(n + m^2d) of the original PV-OSIM algorithm and O(n+md+m^2 ) of the extended force propagator algorithm (EFPA), where n is the number of joints, m is the number of constraints and d is the depth of the kinematic tree. Since Delassus matrix computation requires constructing an m x m sized matrix and must consider all the n joints at least once, the asymptotic computational complexity of PV-OSIMr is optimal. We further benchmark our algorithm and find it to be often more efficient than the PV-OSIM and EFPA in practice.
- [14] arXiv:2310.09824 (replaced) [pdf, other]
-
Title: Overconstrained LocomotionHaoran Sun, Bangchao Huang, Zishang Zhang, Ronghan Xu, Guojing Huang, Shihao Feng, Guangyi Huang, Jiayi Yin, Nuofan Qiu, Hua Chen, Wei Zhang, Jia Pan, Fang Wan, Chaoyang SongComments: 30 pages, 20 figures, 2 tablesSubjects: Robotics (cs.RO)
This paper studies the design, control, and learning of a novel robotic limb that produces overconstrained locomotion by employing the Bennett linkage for motion generation, capable of parametric reconfiguration between a reptile- and mammal-inspired morphology within a single quadruped. In contrast to the prevailing focus on planar linkages, this research delves into adopting overconstrained linkages as the limb mechanism. The overconstrained linkages have solid theoretical foundations in advanced kinematics but are under-explored in robotic applications. This study showcases the morphological superiority of Overconstrained Robotic Limbs (ORLs) that can transform into planar or spherical limbs, exemplified using the simplest case of a Bennett linkage as an ORL. We apply Model Predictive Control (MPC) to simulate a range of overconstrained locomotion tasks, revealing its superiority in energy efficiency against planar limbs when considering foothold distances and speeds. The results are further verified in overconstrained locomotion policies optimized from Reinforcement Learning (RL). From an evolutionary biology perspective, these findings highlight the mechanism distinctions in limb design between reptiles and mammals and represent the first documented instance of ORLs outperforming planar limb designs in dynamic locomotion. Future studies will focus on deploying the model-based and learning-based overconstrained locomotion skills in the robotic hardware to close the Sim2Real gap for developing evolutionary-inspired, energy-efficient control of novel robotic limbs.
- [15] arXiv:2310.12007 (replaced) [pdf, other]
-
Title: KI-PMF: Knowledge Integrated Plausible Motion ForecastingJournal-ref: 2024 IEEE Intelligent Vehicles Symposium (IV)Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Accurately forecasting the motion of traffic actors is crucial for the deployment of autonomous vehicles at a large scale. Current trajectory forecasting approaches primarily concentrate on optimizing a loss function with a specific metric, which can result in predictions that do not adhere to physical laws or violate external constraints. Our objective is to incorporate explicit knowledge priors that allow a network to forecast future trajectories in compliance with both the kinematic constraints of a vehicle and the geometry of the driving environment. To achieve this, we introduce a non-parametric pruning layer and attention layers to integrate the defined knowledge priors. Our proposed method is designed to ensure reachability guarantees for traffic actors in both complex and dynamic situations. By conditioning the network to follow physical laws, we can obtain accurate and safe predictions, essential for maintaining autonomous vehicles' safety and efficiency in real-world this http URL summary, this paper presents concepts that prevent off-road predictions for safe and reliable motion forecasting by incorporating knowledge priors into the training process.
- [16] arXiv:2311.15852 (replaced) [pdf, other]
-
Title: Exponential Auto-Tuning Fault-Tolerant Control of N Degrees-of-Freedom Manipulators Subject to Torque ConstraintsComments: This paper has been accepted in the 63rd IEEE Conference on Decision and Control (CDC)Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents a novel auto-tuning subsystem-based fault-tolerant control (SBFC) system designed for robotic manipulator systems with n degrees of freedom (DoF). It initially proposes a novel model for joint torques, incorporating an actuator fault correction model to account for potential faults and a mathematical saturation function to mitigate issues related to unforeseen excessive torque. This model is designed to prevent the generation of excessive torques even by faulty actuators. Subsequently, a robust subsystem-based adaptive control strategy is proposed to force system states closely along desired trajectories, while tolerating various actuator faults, excessive torques, and unknown modeling errors. Furthermore, optimal SBFC gains are determined by tailoring the JAYA algorithm (JA), a high-performance swarm intelligence technique, standing out for its capacity to optimize without the need for meticulous tuning of algorithm-specific parameters, relying instead on its intrinsic principles. Notably, this control framework ensures uniform exponential stability (UES). The enhancement of accuracy and tracking time for reference trajectories, along with the validation of theoretical assertions, is demonstrated through the presentation of simulation outcomes.
- [17] arXiv:2312.04533 (replaced) [pdf, other]
-
Title: Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language ModelsComments: ICRA 2024. Project webpage with robot videos: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We introduce Dream2Real, a robotics framework which integrates vision-language models (VLMs) trained on 2D data into a 3D object rearrangement pipeline. This is achieved by the robot autonomously constructing a 3D representation of the scene, where objects can be rearranged virtually and an image of the resulting arrangement rendered. These renders are evaluated by a VLM, so that the arrangement which best satisfies the user instruction is selected and recreated in the real world with pick-and-place. This enables language-conditioned rearrangement to be performed zero-shot, without needing to collect a training dataset of example arrangements. Results on a series of real-world tasks show that this framework is robust to distractors, controllable by language, capable of understanding complex multi-object relations, and readily applicable to both tabletop and 6-DoF rearrangement tasks.
- [18] arXiv:2403.08109 (replaced) [pdf, other]
-
Title: VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-TrainingComments: Extended version of the paper accepted at IROS 2024. Code: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Humans excel at efficiently navigating through crowds without collision by focusing on specific visual regions relevant to navigation. However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects -- not necessarily relevant to navigation and potentially misleading. Alternative approaches train specialized navigation models from scratch, requiring significant computation. On the other hand, self-supervised learning has revolutionized computer vision and natural language processing, but its application to robotic navigation remains underexplored due to the difficulty of defining effective self-supervision signals. Motivated by these observations, in this work, we propose a Self-Supervised Vision-Action Model for Visual Navigation Pre-Training (VANP). Instead of detecting salient objects that are beneficial for tasks such as classification or detection, VANP learns to focus only on specific visual regions that are relevant to the navigation task. To achieve this, VANP uses a history of visual observations, future actions, and a goal image for self-supervision, and embeds them using two small Transformer Encoders. Then, VANP maximizes the information between the embeddings by using a mutual information maximization objective function. We demonstrate that most VANP-extracted features match with human navigation intuition. VANP achieves comparable performance as models learned end-to-end with half the training time and models trained on a large-scale, fully supervised dataset, i.e., ImageNet, with only 0.08% data.
- [19] arXiv:2404.09765 (replaced) [pdf, other]
-
Title: Hilti SLAM Challenge 2023: Benchmarking Single + Multi-session SLAM across Sensor Constellations in ConstructionSubjects: Robotics (cs.RO); Image and Video Processing (eess.IV)
Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM Systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as an indicator of robustness and ease of deployment in real-world applications. There exists no dataset plus benchmark combination publicly available, which considers these factors combined. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design for a pre-surveyed point on the ground to be observable from an off-the-shelf LiDAR mounted on a robot, and an algorithm to estimate its position at mm-level accuracy. Results from the challenge show an increase in overall participation, single-session SLAM systems getting increasingly accurate, successfully operating across varying sensor suites, but relatively few participants performing multi-session SLAM. Dataset URL: this https URL
- [20] arXiv:2407.02664 (replaced) [pdf, other]
-
Title: The path towards contact-based physical human-robot interactionSubjects: Robotics (cs.RO)
With the advancements in human-robot interaction (HRI), robots are now capable of operating in close proximity and engaging in physical interactions with humans (pHRI). Likewise, contact-based pHRI is becoming increasingly common as robots are equipped with a range of sensors to perceive human motions. Despite the presence of surveys exploring various aspects of HRI and pHRI, there is presently a gap in comprehensive studies that collect, organize and relate developments across all aspects of contact-based pHRI. It has become challenging to gain a comprehensive understanding of the current state of the field, thoroughly analyze the aspects that have been covered, and identify areas needing further attention. Hence, the present survey. While it includes key developments in pHRI, a particular focus is placed on contact-based interaction, which has numerous applications in industrial, rehabilitation and medical robotics. Across the literature, a common denominator is the importance to establish a safe, compliant and human intention-oriented interaction. This endeavour encompasses aspects of perception, planning and control, and how they work together to enhance safety and reliability. Notably, the survey highlights the application of data-driven techniques: backed by a growing body of literature demonstrating their effectiveness, approaches like reinforcement learning and learning from demonstration have become key to improving robot perception and decision-making within complex and uncertain pHRI scenarios. As the field is yet in its early stage, these observations may help guide future developments and steer research towards the responsible integration of physically interactive robots into workplaces, public spaces, and elements of private life.
- [21] arXiv:2407.09242 (replaced) [pdf, other]
-
Title: An Adaptive Indoor Localization Approach Using WiFi RSSI Fingerprinting with SLAM-Enabled Robotic Platform and Deep Neural NetworksSeyed Alireza Rahimi Azghadi, Atah Nuh Mih, Asfia Kawnine, Monica Wachowicz, Francis Palma, Hung CaoComments: Fingerprinting dataset; Robotic platform; Indoor localization; Signals strength indicator; Location-based servicesSubjects: Robotics (cs.RO); Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI)
Indoor localization plays a vital role in the era of the IoT and robotics, with WiFi technology being a prominent choice due to its ubiquity. We present a method for creating WiFi fingerprinting datasets to enhance indoor localization systems and address the gap in WiFi fingerprinting dataset creation. We used the Simultaneous Localization And Mapping (SLAM) algorithm and employed a robotic platform to construct precise maps and localize robots in indoor environments. We developed software applications to facilitate data acquisition, fingerprinting dataset collection, and accurate ground truth map building. Subsequently, we aligned the spatial information generated via the SLAM with the WiFi scans to create a comprehensive WiFi fingerprinting dataset. The created dataset was used to train a deep neural network (DNN) for indoor localization, which can prove the usefulness of grid density. We conducted experimental validation within our office environment to demonstrate the proposed method's effectiveness, including a heatmap from the dataset showcasing the spatial distribution of WiFi signal strengths for the testing access points placed within the environment. Notably, our method offers distinct advantages over existing approaches as it eliminates the need for a predefined map of the environment, requires no preparatory steps, lessens human intervention, creates a denser fingerprinting dataset, and reduces the WiFi fingerprinting dataset creation time. Our method achieves 26% more accurate localization than the other methods and can create a six times denser fingerprinting dataset in one-third of the time compared to the traditional method. In summary, using WiFi RSSI Fingerprinting data surveyed by the SLAM-Enabled Robotic Platform, we can adapt our trained DNN model to indoor localization in any dynamic environment and enhance its scalability and applicability in real-world scenarios.
- [22] arXiv:2407.14783 (replaced) [pdf, other]
-
Title: VisFly: An Efficient and Versatile Simulator for Training Vision-based FlightSubjects: Robotics (cs.RO)
We present VisFly, a quadrotor simulator designed to efficiently train vision-based flight policies using reinforcement learning algorithms. VisFly offers a user-friendly framework and interfaces, leveraging Habitat-Sim's rendering engines to achieve frame rates exceeding 10,000 frames per second for rendering motion and sensor data. The simulator incorporates differentiable physics and seamlessly integrates with the Gym environment, facilitating the straightforward implementation of various learning algorithms. It supports the direct import of all open-source scene datasets compatible with Habitat-Sim, enabling training on diverse real-world environments and ensuring fair comparisons of learned flight policies. We also propose a general policy architecture for three typical flight tasks relying on visual observations, which have been validated in our simulator using reinforcement learning. The simulator will be available at [this https URL].
- [23] arXiv:2210.08731 (replaced) [pdf, other]
-
Title: Evaluation of Pedestrian Safety in a High-Fidelity Simulation Environment FrameworkLin Ma, Longrui Chen, Yan Zhang, Mengdi Chu, Wenjie Jiang, Jiahao Shen, Chuxuan Li, Yifeng Shi, Nairui Luo, Jirui Yuan, Guyue Zhou, Jiangtao GongComments: Resubmit to ITSC 2024; Accept by ITSC 2024Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Pedestrians' safety is a crucial factor in assessing autonomous driving scenarios. However, pedestrian safety evaluation is rarely considered by existing autonomous driving simulation platforms. This paper proposes a pedestrian safety evaluation method for autonomous driving, in which not only the collision events but also the conflict events together with the characteristics of pedestrians are fully considered. Moreover, to apply the pedestrian safety evaluation system, we construct a high-fidelity simulation framework embedded with pedestrian safety-critical characteristics. We demonstrate our simulation framework and pedestrian safety evaluation with a comparative experiment with two kinds of autonomous driving perception algorithms -- single-vehicle perception and vehicle-to-infrastructure (V2I) cooperative perception. The results show that our framework can evaluate different autonomous driving algorithms with detailed and quantitative pedestrian safety indexes. To this end, the proposed simulation method and framework can be used to access different autonomous driving algorithms and evaluate pedestrians' safety performance in future autonomous driving simulations, which can inspire more pedestrian-friendly autonomous driving algorithms.
- [24] arXiv:2312.04474 (replaced) [pdf, other]
-
Title: Chain of Code: Reasoning with a Language Model-Augmented Code EmulatorChengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, Brian IchterComments: ICML 2024 Oral; Project webpage: this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for semantic ones (and in particular, those that are a mix of both). For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable). However, LMs may still produce a valid solution if they not only write code, but also selectively "emulate" the interpreter by generating the expected output of "detect_sarcasm(string)". In this work, we propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning. The key idea is to encourage LMs to format semantic sub-tasks in a program as flexible pseudocode that the interpreter can explicitly catch undefined behaviors and hand off to simulate with an LM (as an "LMulator"). Experiments demonstrate that Chain of Code outperforms Chain of Thought and other baselines across a variety of benchmarks; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought. In a nutshell, CoC broadens the scope of reasoning questions that LMs can answer by "thinking in code".
- [25] arXiv:2402.16174 (replaced) [pdf, other]
-
Title: GenNBV: Generalizable Next-Best-View Policy for Active 3D ReconstructionComments: CVPR 2024 accepted paper. Project page: this http URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
While recent advances in neural radiance field enable realistic digitization for large-scale scenes, the image-capturing process is still time-consuming and labor-intensive. Previous works attempt to automate this process using the Next-Best-View (NBV) policy for active 3D reconstruction. However, the existing NBV policies heavily rely on hand-crafted criteria, limited action space, or per-scene optimized representations. These constraints limit their cross-dataset generalizability. To overcome them, we propose GenNBV, an end-to-end generalizable NBV policy. Our policy adopts a reinforcement learning (RL)-based framework and extends typical limited action space to 5D free space. It empowers our agent drone to scan from any viewpoint, and even interact with unseen geometries during training. To boost the cross-dataset generalizability, we also propose a novel multi-source state embedding, including geometric, semantic, and action representations. We establish a benchmark using the Isaac Gym simulator with the Houses3K and OmniObject3D datasets to evaluate this NBV policy. Experiments demonstrate that our policy achieves a 98.26% and 97.12% coverage ratio on unseen building-scale objects from these datasets, respectively, outperforming prior solutions.
- [26] arXiv:2403.11057 (replaced) [pdf, other]
-
Title: Large Language Models Powered Context-aware Motion Prediction in Autonomous DrivingComments: 6 pages,4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Motion prediction is among the most fundamental tasks in autonomous driving. Traditional methods of motion forecasting primarily encode vector information of maps and historical trajectory data of traffic participants, lacking a comprehensive understanding of overall traffic semantics, which in turn affects the performance of prediction tasks. In this paper, we utilized Large Language Models (LLMs) to enhance the global traffic context understanding for motion prediction tasks. We first conducted systematic prompt engineering, visualizing complex traffic environments and historical trajectory information of traffic participants into image prompts -- Transportation Context Map (TC-Map), accompanied by corresponding text prompts. Through this approach, we obtained rich traffic context information from the LLM. By integrating this information into the motion prediction model, we demonstrate that such context can enhance the accuracy of motion predictions. Furthermore, considering the cost associated with LLMs, we propose a cost-effective deployment strategy: enhancing the accuracy of motion prediction tasks at scale with 0.7\% LLM-augmented datasets. Our research offers valuable insights into enhancing the understanding of traffic scenes of LLMs and the motion prediction performance of autonomous driving. The source code is available at \url{this https URL} and \url{this https URL}.
- [27] arXiv:2403.13443 (replaced) [pdf, other]
-
Title: Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object TrackingComments: 1st on the NuScenes Tracking benchmark with 75.8 AMOTA and 34.2 FPSSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
3D Multi-Object Tracking (MOT) captures stable and comprehensive motion states of surrounding obstacles, essential for robotic perception. However, current 3D trackers face issues with accuracy and latency consistency. In this paper, we propose Fast-Poly, a fast and effective filter-based method for 3D MOT. Building upon our previous work Poly-MOT, Fast-Poly addresses object rotational anisotropy in 3D space, enhances local computation densification, and leverages parallelization technique, improving inference speed and precision. Fast-Poly is extensively tested on two large-scale tracking benchmarks with Python implementation. On the nuScenes dataset, Fast-Poly achieves new state-of-the-art performance with 75.8% AMOTA among all methods and can run at 34.2 FPS on a personal CPU. On the Waymo dataset, Fast-Poly exhibits competitive accuracy with 63.6% MOTA and impressive inference speed (35.5 FPS). The source code is publicly available at this https URL.
- [28] arXiv:2407.08260 (replaced) [pdf, other]
-
Title: SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Large-scale LiDAR mappings and localization leverage place recognition techniques to mitigate odometry drifts, ensuring accurate mapping. These techniques utilize scene representations from LiDAR point clouds to identify previously visited sites within a database. Local descriptors, assigned to each point within a point cloud, are aggregated to form a scene representation for the point cloud. These descriptors are also used to re-rank the retrieved point clouds based on geometric fitness scores. We propose SALSA, a novel, lightweight, and efficient framework for LiDAR place recognition. It consists of a Sphereformer backbone that uses radial window attention to enable information aggregation for sparse distant points, an adaptive self-attention layer to pool local descriptors into tokens, and a multi-layer-perceptron Mixer layer for aggregating the tokens to generate a scene descriptor. The proposed framework outperforms existing methods on various LiDAR place recognition datasets in terms of both retrieval and metric localization while operating in real-time.
- [29] arXiv:2407.17673 (replaced) [pdf, other]
-
Title: CRASAR-U-DROIDs: A Large Scale Benchmark Dataset for Building Alignment and Damage Assessment in Georectified sUAS ImageryComments: 16 Pages, 7 Figures, 6 TablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
This document presents the Center for Robot Assisted Search And Rescue - Uncrewed Aerial Systems - Disaster Response Overhead Inspection Dataset (CRASAR-U-DROIDs) for building damage assessment and spatial alignment collected from small uncrewed aerial systems (sUAS) geospatial imagery. This dataset is motivated by the increasing use of sUAS in disaster response and the lack of previous work in utilizing high-resolution geospatial sUAS imagery for machine learning and computer vision models, the lack of alignment with operational use cases, and with hopes of enabling further investigations between sUAS and satellite imagery. The CRASAR-U-DRIODs dataset consists of fifty-two (52) orthomosaics from ten (10) federally declared disasters (Hurricane Ian, Hurricane Ida, Hurricane Harvey, Hurricane Idalia, Hurricane Laura, Hurricane Michael, Musset Bayou Fire, Mayfield Tornado, Kilauea Eruption, and Champlain Towers Collapse) spanning 67.98 square kilometers (26.245 square miles), containing 21,716 building polygons and damage labels, and 7,880 adjustment annotations. The imagery was tiled and presented in conjunction with overlaid building polygons to a pool of 130 annotators who provided human judgments of damage according to the Joint Damage Scale. These annotations were then reviewed via a two-stage review process in which building polygon damage labels were first reviewed individually and then again by committee. Additionally, the building polygons have been aligned spatially to precisely overlap with the imagery to enable more performant machine learning models to be trained. It appears that CRASAR-U-DRIODs is the largest labeled dataset of sUAS orthomosaic imagery.