DOI: 10.1145/3666025.3699341
research-article
Open access

SEESys: Online Pose Error Estimation System for Visual SLAM

Published: 04 November 2024

Abstract

In this work, we introduce SEESys, the first system to provide online pose error estimation for Simultaneous Localization and Mapping (SLAM). Unlike prior offline error estimation approaches, the SEESys framework efficiently collects real-time system features and delivers accurate pose error magnitude estimates with low latency, giving downstream applications real-time quality-of-service information. To achieve this, we develop a SLAM system run-time status monitor (RTS monitor) that collects features with minimal overhead, along with a multi-modality attention-based Deep SLAM Error Estimator (DeepSEE) that estimates the error. We train and evaluate SEESys on both public SLAM benchmarks and a diverse set of synthetic datasets, achieving a pose error estimation RMSE of 0.235 cm, which is 15.8% lower than the baseline. Additionally, we conduct a case study of SEESys in a real-world scenario, applying it to a real-time audio error advisory system for human operators of a SLAM-enabled device. The results demonstrate that SEESys provides error estimates with an average end-to-end latency of 37.3 ms, and that the audio error advisory reduces pose tracking error by 25%.
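
The abstract reports estimation quality as an RMSE and as a relative reduction against a baseline. As a minimal sketch of how such figures are typically computed (not the authors' evaluation code), the Python below defines both quantities. The function names `rmse` and `relative_reduction` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def rmse(estimated, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(candidate, baseline):
    """Fraction by which candidate is lower than baseline (0.158 means 15.8% lower)."""
    return (baseline - candidate) / baseline


# Hypothetical per-frame pose error magnitudes in cm (illustrative values only,
# not taken from the paper).
true_error = [1.0, 2.0, 3.0, 4.0]
estimated_error = [1.0, 2.0, 3.0, 6.0]

print(rmse(estimated_error, true_error))
```

Here the RMSE is taken over the estimator's predicted error magnitudes against the true pose error magnitudes, so a lower value means the estimates track the actual tracking error more closely.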

    Published In

    SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems
    November 2024
    950 pages
    ISBN:9798400706974
    DOI:10.1145/3666025
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 November 2024

    Author Tags

    1. SLAM
    2. pose tracking
    3. tracking error
    4. error estimate
    5. edge computing
    6. deep learning

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 198 of 990 submissions, 20%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 437
    • Downloads (Last 12 months): 437
    • Downloads (Last 6 weeks): 123
    Reflects downloads up to 16 Feb 2025
