Efficient Robot Localization Through Deep Learning-Based Natural Fiduciary Pattern Recognition
Abstract
1. Introduction
2. Background and Relevance
3. Application Case
4. Materials and Methods
- Input: None.
- Step 1: The robot starts from an initial known position.
- Step 2: The robot captures images labeled with positions of the scene.
- Output: Image dataset labeled with associated positions.
- Input: Image dataset labeled with associated positions; pre-trained CNN.
- Step 1: The CNN is trained using the image dataset labeled with associated positions.
- Step 2: Once the CNN is trained, the most significant zones are identified: a heat map of the prediction error at each position is computed, and the positions with the lowest error indicate the most significant natural patterns (a minimal training-and-evaluation sketch follows this list).
- Output: Localization error heat map; trained CNN.
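A minimal MATLAB sketch of these two procedures is given below, assuming the labeled dataset is stored as a table whose first variable holds the image file names and whose remaining variables hold the associated (x, y) positions. The file name `labeled_positions.csv`, the variable names, and the training options are illustrative assumptions, not the authors' exact configuration.

```matlab
% Sketch only (assumed file/variable names): train a regression CNN on
% images labeled with positions, then compute per-position localization
% error for the heat map.
tbl = readtable('labeled_positions.csv');        % assumed columns: imageFile, x, y

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 20, ...
    'MiniBatchSize', 32, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');

% 'layers' is a regression layer array such as the VGG16-style network
% listed later in this article (its last layer is a regressionLayer).
trainedNet = trainNetwork(tbl, layers, opts);

% Predicted vs. true positions -> Euclidean error at each captured position,
% which can be arranged over the scene grid as a heat map.
truePos = [tbl.x, tbl.y];
pred    = predict(trainedNet, imageDatastore(tbl.imageFile));
posErr  = vecnorm(pred - truePos, 2, 2);
```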
- A KUKA robot with 6 degrees of freedom (LBR iiwa 7 R800 model);
- A high-resolution camera;
- Programming language: MATLAB (desktop);
- Custom-developed software for dividing the image;
- A pre-trained VGG16 CNN model (training time: 35 min 11 s);
- An HP laptop with a Core i5 processor, 8 GB of RAM, and a 356 GB disk.
1. Convolution operation: each convolution layer performs the operation
$$z^{(p)}_{i,j,k} = \sum_{m}\sum_{n}\sum_{q} w^{(k)}_{m,n,q}\, x^{(p)}_{i+m,\,j+n,\,q} + b_{k},$$
where $x^{(p)}_{i,j,q}$ is the $p$th tensor's value at position $(i,j)$ and the $q$th channel (in the first layer, this input is the image data), $w^{(k)}_{m,n,q}$ are the weights of the $k$th filter, and $b_{k}$ is its bias.
2. Applying the activation function (ReLU):
$$a^{(p)}_{i,j,k} = \max\left(0,\ z^{(p)}_{i,j,k}\right).$$
3. Max-pooling operation:
$$y^{(p)}_{i,j,k} = \max_{(m,n)\in\mathcal{R}_{i,j}} a^{(p)}_{m,n,k},$$
where $\mathcal{R}_{i,j}$ is the pooling window associated with output position $(i,j)$.
4. Dense layers:
$$y^{(p,l)}_{j} = \sum_{i} w^{(l)}_{j,i}\, x^{(p,l-1)}_{i} + b^{(l)}_{j},$$
where $x^{(p,l-1)}_{i}$ is the $i$th value of the $p$th (flattened) input tensor at layer $l-1$, $w^{(l)}_{j,i}$ is the $(j,i)$ weight of the $l$th layer, and $b^{(l)}_{j}$ is the bias of the $l$th layer.
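As a side illustration only, the toy MATLAB snippet below evaluates these four operations on a single-channel example using basic built-in functions; the array sizes and random values are arbitrary and chosen for readability, not taken from the actual pipeline.

```matlab
% Toy evaluation of the four CNN operations formalized above
% (single channel, one 3x3 filter, arbitrary random data).
X = rand(6, 6);                 % input "image" (one channel)
W = rand(3, 3);  b = 0.1;       % one 3x3 filter and its bias

% 1. Convolution: conv2 flips the kernel, so pre-rotating W by 180 degrees
%    reproduces the cross-correlation sum written in the convolution equation.
Z = conv2(X, rot90(W, 2), 'valid') + b;   % 4x4 output

% 2. ReLU activation
A = max(Z, 0);

% 3. 2x2 max pooling with stride 2
P = zeros(2, 2);
for i = 1:2
    for j = 1:2
        blk = A(2*i-1:2*i, 2*j-1:2*j);
        P(i, j) = max(blk(:));
    end
end

% 4. Dense layer on the flattened pooled output (two outputs, e.g. x and y)
xFlat = P(:);
Wd = rand(2, numel(xFlat));  bd = rand(2, 1);
y = Wd * xFlat + bd;
```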
5. Results
- The position of the objects in the scenario: the fact that the objects remain in their original locations is a fundamental aspect of our study, as it enables accurate localization and the mapping of potential trajectories.
- The shape of natural fiducial patterns is another key aspect. Their form indicates the space occupied by these objects, which is an essential element for accurate localization and trajectory mapping.
6. Discussion
7. Conclusions
8. General Applications and Future Directions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part I. IEEE Robot. Autom. Mag. 2006, 13, 99–110.
- Thrun, S.; Burgard, W.; Fox, D. Probabilistic Robotics; MIT Press: Cambridge, MA, USA, 2005.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
- Siciliano, B. Springer Handbook of Robotics; Springer: Berlin/Heidelberg, Germany, 2008; Volume 2, pp. 15–35.
- Scaramuzza, D.; Fraundorfer, F. Visual odometry [tutorial]. IEEE Robot. Autom. Mag. 2011, 18, 80–92.
- Garrido-Jurado, S.; Muñoz-Salinas, R.; Madrid-Cuevas, F.J.; Marín-Jiménez, M.J. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognit. 2014, 47, 2280–2292.
- Muñoz-Salinas, R.; Marín-Jimenez, M.J.; Yeguas-Bolivar, E.; Medina-Carnicer, R. Mapping and localization from planar markers. Pattern Recognit. 2018, 73, 158–171.
- Olson, E. AprilTag: A robust and flexible visual fiducial system. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3400–3407.
- Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. 2016, 32, 1309–1332.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2548–2555.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006, Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
- Liu, J.; Liu, Z. The Vision-Based Target Recognition, Localization, and Control for Harvesting Robots: A Review. Int. J. Precis. Eng. Manuf. 2024, 25, 409–428.
- Zhou, H.; Yang, G.; Wang, B.; Li, X.; Wang, R.; Huang, X.; Wu, H.; Wang, X.V. An attention-based deep learning approach for inertial motion recognition and estimation in human-robot collaboration. J. Manuf. Syst. 2023, 67, 97–110.
- Zhang, C.; Li, M.; Chen, Y.; Yang, Z.; He, B.; Li, X.; Xie, J.; Xu, G. An anthropomorphic robotic hand with a soft-rigid hybrid structure and positive-negative pneumatic actuation. IEEE Robot. Autom. Lett. 2023, 8, 4346–4353.
- Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
- Schops, T.; Schönberger, J.L.; Galliani, S.; Sattler, T.; Schindler, K.; Pollefeys, M.; Geiger, A. A multi-view stereo benchmark with high-resolution images and multi-camera videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3260–3269.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 1–9.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
- Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Kaess, M.; Ranganathan, A.; Dellaert, F. iSAM: Incremental smoothing and mapping. IEEE Trans. Robot. 2008, 24, 1365–1378.
- Wang, Q.; Yuan, C.; Liu, Y. Learning deep conditional neural network for image segmentation. IEEE Trans. Multimed. 2019, 21, 1839–1852.
- Zhang, Z. Microsoft Kinect sensor and its effect. IEEE Multimedia 2012, 19, 4–10.
- Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? Adv. Neural Inf. Process. Syst. 2017, 30, 1–11.
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 834–849.
| Patterns | Short Description | Recent Advancements | Refs. |
|---|---|---|---|
| Visual SLAM | It estimates the position of a robot utilizing visual data, typically obtained from cameras, while constructing a map of the surrounding environment. It is applied in robotics, augmented reality (AR), and autonomous vehicles. | Feature-Based Methods: Identifying and tracking visual features over time. Recent advancements include improved versions of ORB-SLAM and new algorithms. Direct Methods: These operate directly on pixel intensities rather than extracting features. Examples include updated versions of DSO and new direct methods. Deep Learning-Based Approaches: Learning-based methods leveraging convolutional neural networks (CNNs) for feature extraction and end-to-end SLAM solutions. | [12,13,14] |
| Object Identification | Object identification involves recognizing and categorizing objects within an image or video. | Conventional Methods: Techniques such as improved versions of HOG and DPM. Deep Learning: CNNs have revolutionized the field, with updated models like YOLOv5, Faster R-CNN, and SSD providing state-of-the-art performance. | [15,16] |
| ARUCO Markers | ARUCO markers are binary square fiducial markers used for camera pose estimation and calibration. | Detection Algorithms: Recent improvements in ARUCO marker detection algorithms and OpenCV library implementations. They are used in augmented reality, robotics, and computer vision applications like camera calibration and 3D scene reconstruction. | [17,18] |
| AprilTag | AprilTag is another type of fiducial marker designed for robust and efficient detection. | Tag Design: Updated designs of AprilTags ensuring high detection rates and minimal false positives. Detection Performance: Recent updates in the AprilTag library offer fast and reliable detection, suitable for real-time applications in robotics and AR. | [19] |
| SURF (Speeded-Up Robust Features) | SURF is a feature detector and descriptor used for object recognition, image registration, and 3D reconstruction (see the feature-matching sketch after this table). | Feature Detection: Recent improvements in SURF for detecting points of interest. Descriptor: Updated SURF descriptors that describe the neighborhood around each detected feature, allowing for robust matching. | [20] |
| ORB (Oriented FAST and Rotated BRIEF) | ORB is an efficient alternative to SIFT and SURF, providing comparable performance at a lower computational cost. | Feature Detection: Recent improvements in ORB for detecting keypoints. Descriptor: Updated BRIEF descriptors that are rotated according to the keypoint orientation to achieve rotation invariance. | [21,22] |
| SIFT (Scale-Invariant Feature Transform) | SIFT is a robust feature detection and description algorithm widely used in computer vision. | Feature Detection: Recent advancements in SIFT for detecting scale-invariant keypoints. Descriptor: Updated SIFT descriptors based on the gradient histogram of the keypoint neighborhood, providing robustness to scale, rotation, and affine transformations. | [23] |
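As a brief illustration of the feature-based patterns summarized in the table (SURF row), the MATLAB snippet below detects and matches SURF keypoints between two views using Computer Vision Toolbox functions. The image file names are placeholders, and this snippet is background illustration only, not part of the proposed method.

```matlab
% Illustrative only: SURF keypoint detection and matching between two views
% (placeholder RGB image names; requires the Computer Vision Toolbox).
I1 = rgb2gray(imread('scene_view1.png'));
I2 = rgb2gray(imread('scene_view2.png'));

pts1 = detectSURFFeatures(I1);
pts2 = detectSURFFeatures(I2);

[f1, v1] = extractFeatures(I1, pts1);
[f2, v2] = extractFeatures(I2, pts2);

pairs = matchFeatures(f1, f2);            % indices of matched descriptors
matched1 = v1(pairs(:, 1));
matched2 = v2(pairs(:, 2));

showMatchedFeatures(I1, I2, matched1, matched2, 'montage');
```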
1. Input Layer: 128 × 128 × 3
2. Conv2D Layer: 128 × 128 × 64 (Weights: 3 × 3 × 3, Bias: 1 × 1 × 64), ReLU
3. Conv2D Layer: 128 × 128 × 64 (Weights: 3 × 3 × 64, Bias: 1 × 1 × 64), ReLU
4. MaxPooling2D: 64 × 64 × 64
5. Conv2D Layer: 64 × 64 × 128 (Weights: 3 × 3 × 64, Bias: 1 × 1 × 128), ReLU
6. Conv2D Layer: 64 × 64 × 128 (Weights: 3 × 3 × 128, Bias: 1 × 1 × 128), ReLU
7. MaxPooling2D: 32 × 32 × 128
8. Conv2D Layer: 32 × 32 × 256 (Weights: 3 × 3 × 128, Bias: 1 × 1 × 256), ReLU
9. Conv2D Layer: 32 × 32 × 256 (Weights: 3 × 3 × 256, Bias: 1 × 1 × 256), ReLU
10. MaxPooling2D: 16 × 16 × 256
11. Conv2D Layer: 16 × 16 × 512 (Weights: 3 × 3 × 256, Bias: 1 × 1 × 512), ReLU
12. Conv2D Layer: 16 × 16 × 512 (Weights: 3 × 3 × 512, Bias: 1 × 1 × 512), ReLU
13. MaxPooling2D: 8 × 8 × 512
14. Conv2D Layer: 8 × 8 × 512 (Weights: 3 × 3 × 512, Bias: 1 × 1 × 512), ReLU
15. MaxPooling2D: 4 × 4 × 512
16. Fully Connected Layer: 1 × 1 × 1000 (Weights: 1000 × 81, Bias: 1000 × 1), ReLU, Dropout (50%)
17. Fully Connected Layer: 1 × 1 × 100 (Weights: 100 × 81, Bias: 100 × 1), ReLU, Dropout (50%)
18. Fully Connected Layer: 1 × 1 × 2 (Weights: 2 × 100, Bias: 2 × 1), Mean-Squared Error Output
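As a complement to the listing above, the following MATLAB (Deep Learning Toolbox) layer array is a minimal sketch of that topology. Padding and pooling stride are assumptions not stated in the listing (3 × 3 'same' convolutions, 2 × 2 stride-2 pooling, as in the standard VGG16 design), and the pre-trained VGG16 weights would still have to be transferred into the convolutional layers separately.

```matlab
% Sketch of the VGG16-style regression network listed above. Layer sizes
% follow the listing; padding/stride choices are assumptions.
layers = [
    imageInputLayer([128 128 3])

    convolution2dLayer(3, 64, 'Padding', 'same')
    reluLayer
    convolution2dLayer(3, 64, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)      % 128x128 -> 64x64

    convolution2dLayer(3, 128, 'Padding', 'same')
    reluLayer
    convolution2dLayer(3, 128, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)      % 64x64 -> 32x32

    convolution2dLayer(3, 256, 'Padding', 'same')
    reluLayer
    convolution2dLayer(3, 256, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)      % 32x32 -> 16x16

    convolution2dLayer(3, 512, 'Padding', 'same')
    reluLayer
    convolution2dLayer(3, 512, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)      % 16x16 -> 8x8

    convolution2dLayer(3, 512, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)      % 8x8 -> 4x4

    fullyConnectedLayer(1000)
    reluLayer
    dropoutLayer(0.5)
    fullyConnectedLayer(100)
    reluLayer
    dropoutLayer(0.5)
    fullyConnectedLayer(2)                 % predicted (x, y) position
    regressionLayer];                      % mean-squared-error loss
```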
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).