
Search Results (88)

Search Parameters:
Keywords = 6-DOF pose estimation

31 pages, 8127 KiB  
Article
Data-Driven Kinematic Model for the End-Effector Pose Control of a Manipulator Robot
by Josué Goméz-Casas, Carlos A. Toro-Arcila, Nelly Abigaíl Rodríguez-Rosales, Jonathan Obregón-Flores, Daniela E. Ortíz-Ramos, Jesús Fernando Martínez-Villafañe and Oziel Gómez-Casas
Processes 2024, 12(12), 2831; https://doi.org/10.3390/pr12122831 - 10 Dec 2024
Viewed by 528
Abstract
This paper presents a data-driven kinematic model for end-effector pose control applied to a variety of manipulator robots, focusing on the entire end-effector pose (position and orientation). The measured signals of the full pose and their computed derivatives, along with a linear combination of an estimated Jacobian matrix and a vector of joint velocities, generate a model estimation error. The Jacobian matrix is estimated using the Pseudo Jacobian Matrix (PJM) algorithm, which requires tuning only the step and weight parameters that scale the convergence of the model estimation error. The proposed control law is derived in two stages: the first is part of an objective function minimization, and the second is a constraint in a quasi-Lagrangian function. The control design parameters guarantee control error convergence in a closed-loop configuration, with adaptive behavior in terms of the dynamics of the estimated Jacobian matrix. The novelty of the approach lies in its ability to achieve superior tracking performance across different manipulator robots, validated through simulations. Quantitative results show that, compared to a classical inverse-kinematics approach, the proposed method achieves rapid convergence of performance indices (e.g., Root Mean Square Error (RMSE) reduced to near zero in two cycles vs. a steady-state RMSE of 20 under the classical approach). Additionally, the proposed method minimizes joint drift, maintaining an RMSE of approximately 0.3 compared to 1.5 under the classical scheme. The control was validated by means of simulations featuring a UR5e manipulator with six Degrees of Freedom (DOF), a KUKA Youbot with eight DOF, and a KUKA Youbot Dual with thirteen DOF. The stability of the closed-loop controller is demonstrated by means of the Lyapunov stability conditions.
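The abstract leaves the PJM update unspecified; as a hedged illustration, a Broyden-style online Jacobian estimate driven by the model estimation error, with "step" and "weight" standing in for the two tuning parameters described above (a sketch, not the authors' exact algorithm):

```python
import numpy as np

def pjm_update(J_hat, q_dot, x_dot, step=0.1, weight=1.0):
    """One Broyden-style correction of an estimated Jacobian.

    J_hat: (m, n) current Jacobian estimate
    q_dot: (n,)   measured joint velocities
    x_dot: (m,)   measured end-effector pose velocity
    The update drives the model estimation error e = x_dot - J_hat @ q_dot
    toward zero; 'weight' regularizes the correction for small |q_dot|.
    """
    e = x_dot - J_hat @ q_dot
    return J_hat + step * np.outer(e, q_dot) / (weight + q_dot @ q_dot)

# Toy usage: recover a constant 2x2 Jacobian from noiseless motion data.
J_true = np.array([[1.0, 0.5], [-0.3, 2.0]])
J_hat = np.zeros((2, 2))
rng = np.random.default_rng(0)
for _ in range(500):
    q_dot = rng.standard_normal(2)
    J_hat = pjm_update(J_hat, q_dot, J_true @ q_dot)
print(np.round(J_hat, 3))  # approaches J_true
```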

20 pages, 3018 KiB  
Article
Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2024, 16(22), 4187; https://doi.org/10.3390/rs16224187 - 10 Nov 2024
Viewed by 564
Abstract
Robust and highly accurate localization using a camera is a challenging task when appearance varies significantly. In indoor environments, changes in illumination and object occlusion can have a significant impact on visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop a CNN-based (Convolutional Neural Network) ellipse prediction network, DEllipse-Net, which integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using 3D (three-dimensional) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and 6-DoF (Degree of Freedom) pose estimation is performed using the ellipse-ellipsoid model. In the bounding box noise experiment, DEllipse-Net demonstrates higher robustness than other methods, achieving the highest prediction accuracy for 11 of 23 objects in ellipse prediction. In the localization test with 15 pixels of noise, we achieve an ATE (Absolute Translation Error) of 0.077 m and an ARE (Absolute Rotation Error) of 2.70° in the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable, with a model size of only 18.6 MB, and a single model can handle all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly enhance the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization than the classical bag-of-words-based localization method and other ellipse-ellipsoid localization methods.
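At the core of any ellipse-ellipsoid model is the classical dual-quadric projection C* ~ P Q* P^T, which maps a 3D ellipsoid to its image ellipse; a minimal numpy sketch with toy intrinsics and an axis-aligned ellipsoid (rotation omitted for brevity):

```python
import numpy as np

def ellipsoid_dual_quadric(center, semi_axes):
    """Dual quadric Q* of an axis-aligned ellipsoid at a given center."""
    Q = np.diag([semi_axes[0]**2, semi_axes[1]**2, semi_axes[2]**2, -1.0])
    T = np.eye(4)
    T[:3, 3] = center                        # translate to the ellipsoid center
    return T @ Q @ T.T

def project_ellipsoid(Q_star, K, R, t):
    """Image of the ellipsoid outline: the dual conic C* ~ P Q* P^T."""
    P = K @ np.hstack([R, t.reshape(3, 1)])  # 3x4 projection matrix
    return P @ Q_star @ P.T

# Toy usage: camera at the origin looking down +z at an ellipsoid 5 m ahead.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
C_star = project_ellipsoid(ellipsoid_dual_quadric([0.0, 0.0, 5.0], [0.3, 0.2, 0.1]),
                           K, np.eye(3), np.zeros(3))
center = C_star[:2, 2] / C_star[2, 2]        # ellipse center from the dual conic
print(center)                                # ~ [320. 240.]: the principal point
```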

23 pages, 8425 KiB  
Article
Enhancing Inter-AUV Perception: Adaptive 6-DOF Pose Estimation with Synthetic Images for AUV Swarm Sensing
by Qingbo Wei, Yi Yang, Xingqun Zhou, Zhiqiang Hu, Yan Li, Chuanzhi Fan, Quan Zheng and Zhichao Wang
Drones 2024, 8(9), 486; https://doi.org/10.3390/drones8090486 - 14 Sep 2024
Cited by 1 | Viewed by 838
Abstract
The capabilities of AUV mutual perception and localization are crucial for the development of AUV swarm systems. We propose the AUV6D model, a synthetic image-based approach to enhance inter-AUV perception through 6D pose estimation. Due to the challenge of acquiring accurate 6D pose data, a dataset of simulated underwater images with precise pose labels was generated using Unity3D. Mask-CycleGAN technology was introduced to transform these simulated images into realistic synthetic images, addressing the scarcity of available underwater data. Furthermore, the Color Intermediate Domain Mapping strategy is proposed to ensure alignment across different image styles at pixel and feature levels, enhancing the adaptability of the pose estimation model. Additionally, the Salient Keypoint Vector Voting Mechanism was developed to improve the accuracy and robustness of underwater pose estimation, enabling precise localization even in the presence of occlusions. The experimental results demonstrated that our AUV6D model achieved millimeter-level localization precision and pose estimation errors within five degrees, showing exceptional performance in complex underwater environments. Navigation experiments with two AUVs further verified the model's reliability for mutual 6D pose estimation. This research provides substantial technical support for more complex and precise collaborative operations for AUV swarms in the future.
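The Salient Keypoint Vector Voting Mechanism is not detailed in the abstract; pixel-wise vector voting of the PVNet family, on which such mechanisms build, can be sketched as a RANSAC vote over line intersections (a generic illustration, not the authors' exact scheme):

```python
import numpy as np

def vote_keypoint(pixels, dirs, n_hyp=200, cos_thresh=0.99, rng=None):
    """Each pixel carries a unit vector toward a keypoint; intersections of
    random pixel pairs become hypotheses, and the hypothesis that the most
    directions agree with (within cos_thresh) wins."""
    if rng is None:
        rng = np.random.default_rng(0)
    best, best_score = None, -1
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), 2, replace=False)
        A = np.column_stack([dirs[i], -dirs[j]])   # solve p_i + s*d_i = p_j + t*d_j
        if abs(np.linalg.det(A)) < 1e-8:
            continue                               # near-parallel pair, skip
        s, _ = np.linalg.solve(A, pixels[j] - pixels[i])
        h = pixels[i] + s * dirs[i]                # hypothesized keypoint
        to_h = h - pixels
        to_h /= np.linalg.norm(to_h, axis=1, keepdims=True) + 1e-12
        score = np.sum(np.sum(to_h * dirs, axis=1) > cos_thresh)
        if score > best_score:
            best, best_score = h, score
    return best

# Toy usage: 100 pixels all pointing (noisily) at the keypoint (50, 60).
rng = np.random.default_rng(1)
px = rng.uniform(0, 100, (100, 2))
d = np.array([50.0, 60.0]) - px + rng.normal(0, 0.01, (100, 2))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(np.round(vote_keypoint(px, d), 1))           # ~ [50. 60.]
```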

24 pages, 5021 KiB  
Article
A Robust Tri-Electromagnet-Based 6-DoF Pose Tracking System Using an Error-State Kalman Filter
by Shuda Dong and Heng Wang
Sensors 2024, 24(18), 5956; https://doi.org/10.3390/s24185956 - 13 Sep 2024
Viewed by 706
Abstract
Magnetic pose tracking is a non-contact, accurate, and occlusion-free method that has been increasingly employed to track intra-corporeal medical devices such as endoscopes in computer-assisted medical interventions. In magnetic pose-tracking systems, a nonlinear estimation algorithm is needed to recover the pose information from magnetic measurements. In existing pose estimation algorithms such as the extended Kalman filter (EKF), the 3-DoF orientation in the S³ manifold is normally parametrized as a unit quaternion and simply treated as a vector in Euclidean space, which violates the unity constraint of quaternions and reduces pose tracking accuracy. In this paper, a pose estimation algorithm based on the error-state Kalman filter (ESKF) is proposed to improve the accuracy and robustness of electromagnetic tracking systems. The proposed system consists of three electromagnetic coils for magnetic field generation and a tri-axial magnetic sensor attached to the target object for field measurement. A strategy of sequential coil excitation is developed to separate the magnetic fields from different coils and reject magnetic disturbances. Simulation and experiments are conducted to evaluate the pose tracking performance of the proposed ESKF algorithm, which is also compared with the standard EKF and a constrained EKF. It is shown that the ESKF can effectively maintain the quaternion unity and thus achieve better tracking accuracy, i.e., a Euclidean position error of 2.23 mm and an average orientation angle error of 0.45°. The disturbance rejection performance of the electromagnetic tracking system is also experimentally validated.
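The ESKF's advantage comes from keeping the orientation error in a minimal 3-DoF rotation-vector state and folding it back into the nominal unit quaternion after each update; a minimal sketch of that injection step (quaternions stored as [w, x, y, z]):

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def inject_error(q_nominal, dtheta):
    """ESKF injection: fold the estimated 3-DoF orientation error dtheta
    (a rotation vector) back into the nominal unit quaternion. The error
    state lives in R^3, so no unity constraint is ever violated."""
    angle = np.linalg.norm(dtheta)
    if angle < 1e-12:
        dq = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        axis = dtheta / angle
        dq = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
    q = quat_mul(q_nominal, dq)
    return q / np.linalg.norm(q)   # renormalize only against round-off drift

# Toy usage: correct a 0.1 rad error about x after a measurement update.
q = np.array([1.0, 0.0, 0.0, 0.0])
print(inject_error(q, np.array([0.1, 0.0, 0.0])))  # stays unit-norm
```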

25 pages, 19272 KiB  
Article
6DoF Object Pose and Focal Length Estimation from Single RGB Images in Uncontrolled Environments
by Mayura Manawadu and Soon-Yong Park
Sensors 2024, 24(17), 5474; https://doi.org/10.3390/s24175474 - 23 Aug 2024
Viewed by 1327
Abstract
Accurate 6DoF (degrees of freedom) pose and focal length estimation are important in extended reality (XR) applications, enabling precise object alignment and projection scaling, thereby enhancing user experiences. This study focuses on improving 6DoF pose estimation using single RGB images with unknown camera metadata. Estimating the 6DoF pose and focal length from an uncontrolled RGB image, obtained from the internet, is challenging because it often lacks crucial metadata. Existing methods such as FocalPose and FocalPose++ have made progress in this domain but still face challenges due to the projection scale ambiguity between the translation of an object along the z-axis (tz) and the camera's focal length. To overcome this, we propose a two-stage strategy that decouples the projection scaling ambiguity in the estimation of z-axis translation and focal length. In the first stage, tz is set arbitrarily, and we predict all the other pose parameters and the focal length relative to the fixed tz. In the second stage, we predict the true value of tz while scaling the focal length based on the tz update. The proposed two-stage method reduces projection scale ambiguity in RGB images and improves pose estimation accuracy. The iterative update rules constrained to the first stage and tailored loss functions, including a Huber loss in the second stage, enhance the accuracy of both 6DoF pose and focal length estimation. Experimental results using benchmark datasets show significant improvements in median rotation and translation errors, as well as better projection accuracy, compared to the existing state-of-the-art methods. In an evaluation across the Pix3D datasets (chair, sofa, table, and bed), the proposed two-stage method improves projection accuracy by approximately 7.19%. Additionally, the incorporation of the Huber loss resulted in significant reductions in translation and focal length errors of 20.27% and 6.65%, respectively, in comparison to the FocalPose++ method.
(This article belongs to the Special Issue Computer Vision and Virtual Reality: Technologies and Applications)
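The f–tz ambiguity is easy to reproduce numerically: under the pinhole model, doubling both the focal length and the object depth leaves the image of a shallow object nearly unchanged. A small demo with made-up intrinsics:

```python
import numpy as np

# Points on an object whose own depth extent (0.1 m) is small relative to
# its distance from the camera; this is where the ambiguity bites.
rng = np.random.default_rng(0)
obj = rng.uniform([-0.2, -0.2, -0.05], [0.2, 0.2, 0.05], (100, 3))

def project(points, f, tz, cx=320.0, cy=240.0):
    """Pinhole projection with the object placed at depth tz."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2] + tz
    return np.column_stack([f * X / Z + cx, f * Y / Z + cy])

u1 = project(obj, f=600.0, tz=2.0)
u2 = project(obj, f=1200.0, tz=4.0)   # focal length and depth both doubled
print(np.abs(u1 - u2).max())          # < 1 px: the two are nearly identical
```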

18 pages, 4498 KiB  
Article
Selective Grasping for Complex-Shaped Parts Using Topological Skeleton Extraction
by Andrea Pennisi, Monica Sileo, Domenico Daniele Bloisi and Francesco Pierri
Electronics 2024, 13(15), 3021; https://doi.org/10.3390/electronics13153021 - 31 Jul 2024
Viewed by 680
Abstract
To enhance the autonomy and flexibility of robotic systems, a crucial role is played by the capacity to perceive and grasp objects. More specifically, robot manipulators must detect the presence of objects within their workspace, identify the grasping point, and compute a trajectory for approaching the objects with an end-effector pose suitable for performing the task. These can be challenging tasks in the presence of complex geometries, where multiple grasping-point candidates can be detected. In this paper, we present a novel approach for dealing with complex-shaped automotive parts consisting of a deep-learning-based method for topological skeleton extraction and an active grasping pose selection mechanism. In particular, we use a modified version of the well-known Lightweight OpenPose algorithm to estimate the topological skeleton of real-world automotive parts. The estimated skeleton is used to select the best grasping pose for the object at hand. Our approach is designed to be more computationally efficient than other existing grasping pose detection methods. Quantitative experiments conducted with a 7-DoF manipulator on different real-world automotive components demonstrate the effectiveness of the proposed approach, with a success rate of 87.04%.
(This article belongs to the Special Issue Applications of Machine Vision in Robotics)
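The paper predicts skeletons with a modified Lightweight OpenPose; as a classical stand-in for the same skeleton-to-grasp-point idea, a morphological skeleton plus a clearance criterion (toy mask; the grasp-selection rule is an assumption for illustration):

```python
import numpy as np
from skimage.morphology import skeletonize
from scipy.ndimage import distance_transform_edt

# A toy binary mask of an elongated part; the skeleton reduces it to a
# one-pixel-wide curve from which grasp candidates can be sampled.
mask = np.zeros((40, 120), dtype=bool)
mask[15:25, 10:110] = True                  # a 10-px-thick bar

skel = skeletonize(mask)
ys, xs = np.nonzero(skel)

# Illustrative selection rule: the skeleton point with the largest
# clearance from the part boundary, i.e. the widest spot to grip.
clearance = distance_transform_edt(mask)
idx = np.argmax(clearance[ys, xs])
print("grasp point (row, col):", ys[idx], xs[idx])
```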

27 pages, 3382 KiB  
Article
DOT-SLAM: A Stereo Visual Simultaneous Localization and Mapping (SLAM) System with Dynamic Object Tracking Based on Graph Optimization
by Yuan Zhu, Hao An, Huaide Wang, Ruidong Xu, Zhipeng Sun and Ke Lu
Sensors 2024, 24(14), 4676; https://doi.org/10.3390/s24144676 - 18 Jul 2024
Cited by 2 | Viewed by 1228
Abstract
Most visual simultaneous localization and mapping (SLAM) systems are based on the assumption of a static environment in autonomous vehicles. However, when dynamic objects, particularly vehicles, occupy a large portion of the image, the localization accuracy of the system decreases significantly. To mitigate this challenge, this paper presents DOT-SLAM, a novel stereo visual SLAM system that integrates dynamic object tracking through graph optimization. By integrating dynamic object pose estimation into the SLAM system, the system can effectively utilize both foreground and background points for ego vehicle localization and obtain a map of static feature points. To rectify the inaccuracies in depth estimation from stereo disparity on the foreground points of dynamic objects, caused by their self-similarity, a coarse-to-fine depth estimation method based on camera–road plane geometry is presented. This method uses rough depth to guide fine stereo matching, thereby obtaining the three-dimensional (3D) spatial positions of feature points on dynamic objects. Subsequently, constraints on the dynamic object's pose from the road plane and the non-holonomic constraints (NHCs) of the vehicle reduce the initial pose uncertainty of dynamic objects, leading to more accurate initialization. Finally, by treating foreground points, background points, the local road plane, the ego vehicle pose, and dynamic object poses as optimization nodes, and jointly optimizing a nonlinear model based on graph optimization, accurate six-degrees-of-freedom (DoF) pose estimates are obtained for both the ego vehicle and dynamic objects. Experimental validation on the KITTI-360 dataset demonstrates that DOT-SLAM effectively utilizes features from the background and dynamic objects in the environment, resulting in more accurate vehicle trajectory estimation and a static environment map. Results on a real-world dataset reinforce its effectiveness.
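The coarse depth step follows from camera–road plane geometry: a pixel ray scaled to meet the road plane yields a rough depth that can seed the fine stereo search. A sketch with KITTI-like numbers (all values illustrative):

```python
import numpy as np

def road_plane_depth(u, v, K, n, h):
    """Coarse depth of a pixel assumed to lie on the road plane.

    n: unit upward normal of the road in camera coordinates
    h: camera height above the road (m). A road point X = lam * d satisfies
       n . X = -h (the road is h below the camera), so lam = -h / (n . d).
    Returns the z-depth, usable to guide fine stereo matching.
    """
    d = np.linalg.solve(K, np.array([u, v, 1.0]))   # ray through the pixel
    lam = -h / (n @ d)
    return (lam * d)[2]

# Toy usage: KITTI-like intrinsics, camera 1.65 m above a flat road.
K = np.array([[718.0, 0.0, 607.0], [0.0, 718.0, 185.0], [0.0, 0.0, 1.0]])
n = np.array([0.0, -1.0, 0.0])   # camera y-axis points down, so "up" is -y
print(road_plane_depth(607.0, 300.0, K, n, 1.65))   # ~10.3 m
```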

21 pages, 3782 KiB  
Article
Globally Optimal Relative Pose and Scale Estimation from Only Image Correspondences with Known Vertical Direction
by Zhenbao Yu, Shirong Ye, Changwei Liu, Ronghe Jin, Pengfei Xia and Kang Yan
ISPRS Int. J. Geo-Inf. 2024, 13(7), 246; https://doi.org/10.3390/ijgi13070246 - 9 Jul 2024
Viewed by 899
Abstract
Installing multi-camera systems and inertial measurement units (IMUs) in self-driving cars, micro aerial vehicles, and robots is becoming increasingly common. An IMU provides the vertical direction, allowing coordinate frames to be aligned in a common direction, which reduces the degrees of freedom (DOFs) of the rotation matrix from 3 to 1. In this paper, we propose a globally optimal solver to calculate the relative poses and scale of generalized cameras with a known vertical direction. First, a cost function is established to minimize the algebraic error in the least-squares sense. Then, the cost function is transformed into two polynomials with only two unknowns. Finally, the eigenvalue method is used to solve for the relative rotation angle. The performance of the proposed method is verified on both simulated data and the KITTI dataset. Experiments show that our method is more accurate than the existing state-of-the-art solver in estimating relative pose and scale. Compared to the best of the comparison methods, the proposed method reduces the rotation matrix error, translation vector error, and scale error by 53%, 67%, and 90%, respectively.
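Why the rotation drops from 3 DOFs to 1: once each frame is rotated so that the IMU's gravity vector maps to a fixed axis, only a yaw angle about that axis remains unknown. A sketch of the gravity alignment via Rodrigues' formula:

```python
import numpy as np

def align_to_vertical(g):
    """Smallest rotation taking the unit gravity vector g to the -z axis
    (Rodrigues' formula; the antipodal case g = +z is omitted for brevity).
    After applying it, only a rotation Rz(theta) remains unknown."""
    z = np.array([0.0, 0.0, -1.0])
    v = np.cross(g, z)
    c = g @ z
    if np.isclose(c, 1.0):
        return np.eye(3)                     # already aligned
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def Rz(theta):
    """The single remaining unknown: a rotation about the vertical axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

g = np.array([0.3, 0.1, -0.95])              # IMU gravity reading, frame 1
g /= np.linalg.norm(g)
R1 = align_to_vertical(g)
print(np.round(R1 @ g, 6))                   # -> [0, 0, -1]: one DOF left
```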

25 pages, 5039 KiB  
Article
Test Platform for Developing New Optical Position Tracking Technology towards Improved Head Motion Correction in Magnetic Resonance Imaging
by Marina Silic, Fred Tam and Simon J. Graham
Sensors 2024, 24(12), 3737; https://doi.org/10.3390/s24123737 - 8 Jun 2024
Viewed by 832
Abstract
Optical tracking of head pose via fiducial markers has been proven to enable effective correction of motion artifacts in the brain during magnetic resonance imaging but remains difficult to implement in the clinic due to lengthy calibration and set-up times. Advances in deep learning for markerless head pose estimation have yet to be applied to this problem because of the sub-millimetre spatial resolution required for motion correction. In the present work, two optical tracking systems are described for the development and training of a neural network: one marker-based system (a testing platform for measuring ground-truth head pose) with high tracking fidelity to provide the training labels, and one markerless deep-learning-based system using images of the markerless head as input to the network. The markerless system has the potential to overcome issues of marker occlusion, insufficient rigid attachment of the marker, lengthy calibration times, and unequal performance across degrees of freedom (DOF), all of which hamper the adoption of marker-based solutions in the clinic. Detail is provided on the development of a custom moiré-enhanced fiducial marker for use as ground truth and on the calibration procedure for both optical tracking systems. Additionally, the development of a synthetic head pose dataset is described for the proof of concept and initial pre-training of a simple convolutional neural network. Results indicate that the ground-truth system has been sufficiently calibrated and can track head pose with an error of <1 mm and <1°. Tracking data of a healthy adult participant are shown. Pre-training results show that the average root-mean-squared error across the 6 DOF is 0.13 and 0.36 (mm or degrees) on a head model included in and excluded from the training dataset, respectively. Overall, this work indicates excellent feasibility of the deep-learning-based approach and will enable future work in training and testing on a real dataset in the MRI environment.
(This article belongs to the Section Biomedical Sensors)
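The abstract describes a simple convolutional network regressing the 6 DOF against marker-based labels; a hedged PyTorch sketch of such a regressor and one training step (architecture and sizes are illustrative, not the authors'):

```python
import torch
import torch.nn as nn

class HeadPose6DoF(nn.Module):
    """A small convolutional regressor from a grayscale head image to six
    pose parameters (tx, ty, tz, roll, pitch, yaw) -- a stand-in for the
    'simple convolutional neural network' pre-trained on synthetic data."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 6)   # 3 translations + 3 rotations

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# One training step against marker-based ground-truth labels (MSE, since
# the RMSE figures above suggest a plain regression formulation).
model, loss_fn = HeadPose6DoF(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
img, pose_gt = torch.randn(8, 1, 128, 128), torch.randn(8, 6)
loss = loss_fn(model(img), pose_gt)
opt.zero_grad(); loss.backward(); opt.step()
```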

13 pages, 2958 KiB  
Article
Research on Six-Degree-of-Freedom Refueling Robotic Arm Positioning and Docking Based on RGB-D Visual Guidance
by Mingbo Yang and Jiapeng Liu
Appl. Sci. 2024, 14(11), 4904; https://doi.org/10.3390/app14114904 - 5 Jun 2024
Cited by 1 | Viewed by 1383
Abstract
The main contribution of this paper is the proposal of a six-degree-of-freedom (6-DoF) refueling robotic arm positioning and docking technology guided by an RGB-D camera, together with in-depth research on and experimental validation of the technology. We integrate the YOLOv8 algorithm with the Perspective-n-Point (PnP) algorithm to achieve precise detection and pose estimation of the target refueling interface. The focus is on resolving the recognition and positioning challenges faced by the 6-DoF robotic arm with a specialized refueling interface during the automated refueling process. To capture the unique characteristics of the refueling interface, we developed a dedicated dataset for the specialized refueling connectors, ensuring the YOLO algorithm's accurate identification of the target interfaces. Subsequently, the detected interface information is converted into precise 6-DoF pose data using the PnP algorithm. These data are used to determine the desired end-effector pose of the robotic arm. The robotic arm's movements are controlled through a trajectory planning algorithm to complete the refueling gun docking process. An experimental setup was established in the laboratory to validate the accuracy of the visual recognition and the applicability of the robotic arm's docking posture. The experimental results demonstrate that, under general lighting conditions, the recognition accuracy of this docking-interface method meets the docking requirements. Compared to traditional vision-guided methods based on OpenCV, this visual guidance algorithm exhibits better adaptability and effectively provides pose information for the robotic arm.
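The detection-to-pose handoff in such pipelines typically feeds known 3D keypoints of the interface and their detected pixel locations to a PnP solver; a minimal OpenCV sketch (keypoint coordinates and intrinsics are made-up values):

```python
import numpy as np
import cv2

# Hypothetical 3D keypoints of the refueling interface in its own frame (m),
# e.g. bolt centers on the flange; the 2D pixel locations would come from
# the YOLOv8 detection stage.
object_pts = np.array([[0.00, 0.00, 0.0], [0.06, 0.00, 0.0],
                       [0.06, 0.06, 0.0], [0.00, 0.06, 0.0]], dtype=np.float32)
image_pts = np.array([[410.0, 305.0], [475.0, 302.0],
                      [478.0, 368.0], [412.0, 370.0]], dtype=np.float32)
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None,
                              flags=cv2.SOLVEPNP_IPPE)   # planar-points solver
R, _ = cv2.Rodrigues(rvec)       # rotation matrix for the end-effector target
print(ok, tvec.ravel())          # interface pose in the camera frame
```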

21 pages, 5094 KiB  
Article
TQU-SLAM Benchmark Dataset for Comparative Study to Build Visual Odometry Based on Extracted Features from Feature Descriptors and Deep Learning
by Thi-Hao Nguyen, Van-Hung Le, Huu-Son Do, Trung-Hieu Te and Van-Nam Phan
Future Internet 2024, 16(5), 174; https://doi.org/10.3390/fi16050174 - 17 May 2024
Cited by 1 | Viewed by 1250
Abstract
The problem of data enrichment for training visual SLAM and visual odometry (VO) construction models using deep learning (DL) is an urgent problem in computer vision. DL requires a large amount of data to train a model, and more data, covering more varied contexts and conditions, yields a more accurate visual SLAM and VO construction model. In this paper, we introduce the TQU-SLAM benchmark dataset, which includes 160,631 RGB-D frame pairs. It was collected from the corridors of three interconnected buildings with a total length of about 230 m. The ground-truth data of the TQU-SLAM benchmark dataset were prepared manually, including 6-DOF camera poses, 3D point cloud data, intrinsic parameters, and the transformation matrix between the camera coordinate system and the real world. We also tested the TQU-SLAM benchmark dataset using the PySLAM framework with traditional features such as SHI_TOMASI, SIFT, SURF, ORB, ORB2, AKAZE, KAZE, and BRISK, as well as features extracted with DL methods such as VGG, DPVO, and TartanVO. The camera pose estimation results are evaluated, and we show that the ORB2 features give the best results (Err_d = 5.74 mm), while the SHI_TOMASI feature has the best ratio of frames with detected keypoints (r_d = 98.97%). At the same time, we also present and analyze the challenges of the TQU-SLAM benchmark dataset for building visual SLAM and VO systems.
(This article belongs to the Special Issue Machine Learning Techniques for Computer Vision)
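A feature-based VO step of the kind evaluated above (ORB detection, matching, essential-matrix pose recovery) can be sketched in a few OpenCV calls; PySLAM adds depth handling, keyframing, and map maintenance on top of this:

```python
import numpy as np
import cv2

def orb_relative_pose(img1, img2, K):
    """One monocular VO step: ORB keypoints, brute-force Hamming matching,
    essential-matrix estimation with RANSAC, and pose recovery."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    p1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    p2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t   # rotation and unit-scale translation between the frames
```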

17 pages, 4069 KiB  
Article
A Lightweight 6D Pose Estimation Network Based on Improved Atrous Spatial Pyramid Pooling
by Fupan Wang, Xiaohang Tang, Yadong Wu, Yinfan Wang, Huarong Chen, Guijuan Wang and Jing Liao
Electronics 2024, 13(7), 1321; https://doi.org/10.3390/electronics13071321 - 1 Apr 2024
Viewed by 1270
Abstract
It is difficult for lightweight neural networks to produce accurate 6DoF pose estimates because their accuracy degrades under scale changes. To solve this problem, we propose a method with good performance and robustness, building on previous research. The enhanced PVNet-based method uses depth-wise convolution to build a lightweight network. In addition, coordinate attention and atrous spatial pyramid pooling are used to ensure accuracy and robustness. The method effectively reduces network size and computational complexity, yielding a lightweight 6DoF pose estimation method based on monocular RGB images. Experiments on public and self-built datasets show that the improved method raises both the average ADD(-S) estimation accuracy and the 2D projection metric. For datasets with large changes in object scale, the average ADD(-S) accuracy improves substantially.
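The two ingredients named above are standard blocks: depth-wise separable convolutions shrink the network, and atrous spatial pyramid pooling aggregates several dilation rates to counter scale changes. A PyTorch sketch of both (channel sizes illustrative):

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Depthwise + pointwise convolution: the lightweight building block."""
    def __init__(self, cin, cout):
        super().__init__()
        self.dw = nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False)
        self.pw = nn.Conv2d(cin, cout, 1, bias=False)
        self.bn = nn.BatchNorm2d(cout)

    def forward(self, x):
        return torch.relu(self.bn(self.pw(self.dw(x))))

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated branches see the
    input at multiple effective scales, then a 1x1 conv fuses them."""
    def __init__(self, cin, cout, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(cin, cout, 3, padding=r, dilation=r, bias=False)
            for r in rates)
        self.fuse = nn.Conv2d(cout * len(rates), cout, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 60, 80)
y = ASPP(32, 64)(DepthwiseSeparable(32, 32)(x))
print(y.shape)   # torch.Size([1, 64, 60, 80]): spatial size preserved
```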

21 pages, 11283 KiB  
Article
A Method for Unseen Object Six Degrees of Freedom Pose Estimation Based on Segment Anything Model and Hybrid Distance Optimization
by Li Xin, Hu Lin, Xinjun Liu and Shiyu Wang
Electronics 2024, 13(4), 774; https://doi.org/10.3390/electronics13040774 - 16 Feb 2024
Viewed by 1556
Abstract
Six-degrees-of-freedom pose estimation technology constitutes the cornerstone of precise robotic control and similar tasks. Addressing the limitations of current 6-DoF pose estimation methods in handling object occlusions and unknown objects, we have developed a novel two-stage 6-DoF pose estimation method that integrates RGB-D data with CAD models. Initially, targeting high-quality zero-shot object instance segmentation, we developed the CAE-SAM model based on the SAM framework. To address the SAM model's boundary blur, mask voids, and over-segmentation issues, this paper introduces strategies such as local spatial-feature-enhancement modules, global context markers, and a bounding box generator. Subsequently, we propose a registration method optimized through a hybrid distance metric to reduce the dependency of point cloud registration algorithms on sensitive hyperparameters. Experimental results on the HQSeg-44K dataset substantiate the notable improvements in instance segmentation accuracy and robustness rendered by the CAE-SAM model. Moreover, the efficacy of this two-stage method is further corroborated using a 6-DoF pose dataset of workpieces constructed with CloudCompare and RealSense. For unseen targets, the ADD metric reached 2.973 mm, and the ADD-S metric reached 1.472 mm. These results significantly enhance pose estimation performance and streamline the algorithm's deployment and maintenance.
(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)
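The abstract does not define the hybrid distance metric; one plausible reading, sketched here as an assumption, is a weighted blend of the two classical ICP costs (point-to-point and point-to-plane), with the blend weight replacing a sensitive either/or choice:

```python
import numpy as np

def hybrid_residuals(src, dst, dst_normals, w=0.5):
    """Residuals mixing point-to-point and point-to-plane distances for
    already-matched correspondences. This is an illustrative blend, not
    the paper's metric; w trades off the two classical ICP costs."""
    diff = src - dst
    pt2pt = np.linalg.norm(diff, axis=1)                 # point-to-point
    pt2pl = np.abs(np.sum(diff * dst_normals, axis=1))   # point-to-plane
    return w * pt2pt + (1.0 - w) * pt2pl

# Toy usage: two correspondences offset 0.1 m along the surface normal.
src = np.array([[0.0, 0.0, 1.1], [1.0, 0.0, 0.9]])
dst = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
nrm = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
print(hybrid_residuals(src, dst, nrm))   # [0.1 0.1] -> feeds a robust solver
```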

33 pages, 3013 KiB  
Review
A Survey of 6DoF Object Pose Estimation Methods for Different Application Scenarios
by Jian Guan, Yingming Hao, Qingxiao Wu, Sicong Li and Yingjian Fang
Sensors 2024, 24(4), 1076; https://doi.org/10.3390/s24041076 - 7 Feb 2024
Cited by 5 | Viewed by 9221
Abstract
Recently, 6DoF object pose estimation has become increasingly important for a broad range of applications in the fields of virtual reality, augmented reality, autonomous driving, and robotic operations. This task involves extracting the target area from the input data and subsequently determining the position and orientation of the objects. In recent years, many new advances have been made in pose estimation. However, existing reviews tend to summarize only category-level or instance-level methods and do not comprehensively cover deep learning methods. This paper provides a comprehensive review of the latest progress in 6DoF pose estimation to help researchers better understand this area. In this study, current 6DoF object pose estimation methods are categorized into two groups, instance-level and category-level, based on whether it is necessary to acquire a CAD model of the object. Recent advancements in learning-based 6DoF pose estimation methods are comprehensively reviewed. The study systematically explores the innovations and applicable scenarios of the various methods. It provides an overview of widely used datasets, task metrics, and diverse application scenarios. Furthermore, state-of-the-art methods are compared across publicly accessible datasets, taking into account differences in input data types. Finally, we summarize the challenges of current tasks, methods for different applications, and future development directions.
(This article belongs to the Section Sensing and Imaging)

15 pages, 7258 KiB  
Article
Active Object Detection and Tracking Using Gimbal Mechanisms for Autonomous Drone Applications
by Jakob Grimm Hansen and Rui Pimentel de Figueiredo
Drones 2024, 8(2), 55; https://doi.org/10.3390/drones8020055 - 6 Feb 2024
Cited by 2 | Viewed by 4456
Abstract
Object recognition, localization, and tracking play a role of primary importance in computer vision applications. However, they remain extremely difficult tasks, particularly in scenarios where objects are attended to using fast-moving UAVs that need to operate robustly in real time. Typically, the performance of these vision-based systems is affected by motion blur and geometric distortions, to name but two issues. Gimbal systems are thus essential to compensate for motion blur and ensure visual streams are stable. In this work, we investigate the advantages of active tracking approaches using a three-degrees-of-freedom (DoF) gimbal system mounted on UAVs. A method that utilizes joint movement and visual information for actively tracking spherical and planar objects in real time is proposed. The tracking methodologies are tested and evaluated in two different realistic Gazebo simulation environments: the first on 3D positional tracking (a sphere) and the second on tracking of 6D poses (planar fiducial markers). We show that active object tracking is advantageous for UAV applications, first, by reducing motion blur caused by fast camera motion and vibrations and, second, by fixating the object of interest at the center of the field of view, thus reducing re-projection errors due to peripheral distortion. The results demonstrate significant object pose estimation accuracy improvements of active approaches compared with traditional passive ones. More specifically, a set of experiments suggests that active gimbal tracking can increase the spatial estimation accuracy of known-size moving objects under challenging motion patterns and in the presence of image distortion.
(This article belongs to the Section Drone Design and Development)
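The fixation behaviour described above reduces, at its simplest, to a proportional controller that turns the tracked object's pixel offset from the principal point into pan/tilt rate commands; a minimal sketch (gain, intrinsics, and sign conventions are illustrative):

```python
import numpy as np

def gimbal_rates(u, v, K, kp=2.0):
    """Proportional fixation control: convert the tracked object's pixel
    offset from the principal point into pan/tilt angular-rate commands,
    using the small-angle relation angle ~ offset / focal_length."""
    ex = (u - K[0, 2]) / K[0, 0]   # horizontal angular offset (rad, approx.)
    ey = (v - K[1, 2]) / K[1, 1]   # vertical angular offset
    return -kp * ex, -kp * ey      # pan_rate, tilt_rate (rad/s)

# Toy usage: object detected right of and above the image center.
K = np.array([[900.0, 0.0, 640.0], [0.0, 900.0, 360.0], [0.0, 0.0, 1.0]])
print(gimbal_rates(700.0, 320.0, K))   # commands that re-center the object
```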
