research-article

Open access

SEESys: Online Pose Error Estimation System for Visual SLAM

Authors:

Maria Gorlatova

SENSYS '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems

Pages 322 - 335

https://doi.org/10.1145/3666025.3699341

Published: 04 November 2024

Abstract

In this work, we introduce SEESys, the first system to provide online pose error estimation for Simultaneous Localization and Mapping (SLAM). Unlike prior offline error estimation approaches, the SEESys framework efficiently collects real-time system features and delivers accurate pose error magnitude estimates with low latency, giving downstream applications real-time quality-of-service information. To achieve this goal, we develop a SLAM system run-time status monitor (RTS monitor) that performs feature collection with minimal overhead, along with a multi-modality attention-based Deep SLAM Error Estimator (DeepSEE) for error estimation. We train and evaluate SEESys using both public SLAM benchmarks and a diverse set of synthetic datasets, achieving a pose error estimation RMSE of 0.235 cm, which is 15.8% lower than the baseline. Additionally, we conduct a case study showcasing SEESys in a real-world scenario, where it drives a real-time audio error advisory system for human operators of a SLAM-enabled device. The results demonstrate that SEESys provides error estimates with an average end-to-end latency of 37.3 ms, and that the audio error advisory reduces pose tracking error by 25%.
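
As a rough illustration of the evaluation metric named in the abstract, the sketch below computes the RMSE between estimated and ground-truth pose error magnitudes, and the relative reduction of one estimator's RMSE against a baseline. The function names and all numeric values are hypothetical toy inputs chosen for easy arithmetic; they are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
import math


def rmse(estimated, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimated) != len(actual) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(model_rmse, baseline_rmse):
    """Fraction by which model_rmse is lower than baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical per-frame pose error magnitudes in cm (toy values only).
true_err = [1.0, 2.0, 3.0, 4.0]
model_est = [1.0, 2.0, 3.0, 6.0]      # off by 2 cm on the last frame
baseline_est = [1.0, 2.0, 3.0, 8.0]   # off by 4 cm on the last frame

model_rmse = rmse(model_est, true_err)
baseline_rmse = rmse(baseline_est, true_err)
reduction = relative_reduction(model_rmse, baseline_rmse)

print(f"model RMSE: {model_rmse:.3f} cm")
print(f"baseline RMSE: {baseline_rmse:.3f} cm")
print(f"relative reduction: {reduction:.1%}")
```

Read this way, a statement such as "15.8% lower than the baseline" corresponds to `relative_reduction` returning about 0.158 for the two estimators' RMSE values.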

Index Terms

SEESys: Online Pose Error Estimation System for Visual SLAM
1. Human-centered computing
  1. Ubiquitous and mobile computing
    1. Ubiquitous and mobile computing systems and tools

Published In

SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems

November 2024

950 pages

ISBN:9798400706974

DOI:10.1145/3666025

Chair: Jie Liu
Co-chairs: Yuanchao Shu, Jiming Chen
Program Chair: Yuan He
Program Co-chair: Rui Tan

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SenSys '24

SenSys '24: 22nd ACM Conference on Embedded Networked Sensor Systems

November 4 - 7, 2024

Hangzhou, China

Acceptance Rates

Overall Acceptance Rate 198 of 990 submissions, 20%
