Fusing Semantic Segmentation and Object Detection for Visual SLAM in Dynamic Scenes

Authors:

Huyin Zhang

VRST '21: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology

Article No.: 2, Pages 1 - 7

https://doi.org/10.1145/3489849.3489882

Published: 08 December 2021

Abstract

The static-scene assumption limits the performance of traditional visual SLAM. Many existing solutions adopt deep learning methods or geometric constraints to handle dynamic scenes, but these schemes are either inefficient or lack robustness to some extent. In this paper, we propose a solution that combines object detection and semantic segmentation to obtain the prior contours of potentially dynamic objects. With this prior information, geometric constraint techniques are used to help remove dynamic feature points. Finally, evaluation on public datasets demonstrates that the proposed method improves the pose estimation accuracy and robustness of visual SLAM with no loss of efficiency in highly dynamic scenarios.
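The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the detector and segmenter outputs are fused or which geometric constraint is applied. The sketch assumes that the prior region is the intersection of a segmentation mask with the detection boxes, and that the geometric check is the distance of a matched point to its epipolar line. The function names, the box format `(x0, y0, x1, y1)`, and the 1-pixel threshold are all hypothetical.

```python
import numpy as np

def prior_mask(seg_mask, boxes):
    """Keep segmentation pixels that also fall inside a detection box.

    Assumption: boxes are (x0, y0, x1, y1) in pixels, end-exclusive.
    """
    box_mask = np.zeros(seg_mask.shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        box_mask[y0:y1, x0:x1] = True
    return seg_mask.astype(bool) & box_mask

def epipolar_distance(F, pts_prev, pts_curr):
    """Distance from each current point to the epipolar line of its match."""
    ones = np.ones((len(pts_prev), 1))
    p1 = np.hstack([pts_prev, ones])   # homogeneous points, previous frame
    p2 = np.hstack([pts_curr, ones])   # homogeneous points, current frame
    lines = p1 @ F.T                   # each row is the line (a, b, c) = F p1
    num = np.abs(np.sum(lines * p2, axis=1))
    den = np.hypot(lines[:, 0], lines[:, 1])
    return num / np.maximum(den, 1e-12)

def static_point_filter(seg_mask, boxes, F, pts_prev, pts_curr, thresh=1.0):
    """Return True for matches to keep.

    A match is dropped only if it lies in the prior region AND violates
    the epipolar constraint (an assumed fusion rule, for illustration).
    """
    mask = prior_mask(seg_mask, boxes)
    cols = np.clip(np.round(pts_curr[:, 0]).astype(int), 0, mask.shape[1] - 1)
    rows = np.clip(np.round(pts_curr[:, 1]).astype(int), 0, mask.shape[0] - 1)
    in_prior = mask[rows, cols]
    moving = epipolar_distance(F, pts_prev, pts_curr) > thresh
    return ~(in_prior & moving)

# Toy example: pure horizontal camera translation, so epipolar lines are rows.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
seg = np.zeros((10, 10), dtype=bool)
seg[2:6, 2:6] = True
boxes = [(0, 0, 5, 5)]
pts_prev = np.array([[3.0, 3.0], [3.0, 1.0], [8.0, 5.0]])
pts_curr = np.array([[4.0, 3.0], [3.0, 4.0], [8.0, 8.0]])
keep = static_point_filter(seg, boxes, F, pts_prev, pts_curr)
```

In this toy setup the second match is discarded because it sits inside the prior region and moves off its epipolar line. The third match also violates the constraint but is kept, since under the assumed rule points outside the prior region are not tested for removal.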

Cited By

W. Xu, W. Fan, J. Li, O. Alfarraj, A. Tolba, and T. Huang. 2023. A Robust Visual SLAM Method for Additive Manufacturing of Vehicular Parts Under Dynamic Scenes. IEEE Access 11 (2023), 22114–22123. https://doi.org/10.1109/ACCESS.2023.3251733

Published In

VRST '21: Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology

December 2021

563 pages

ISBN: 9781450390927

DOI: 10.1145/3489849

Editors:
Yuichi Itoh (Aoyama Gakuin University, Japan)
Kazuki Takashima (Tohoku University, Japan)
Parinya Punpongsanon (Osaka University, Japan)
Misha Sra (University of California, Santa Barbara, USA)
Kazuyuki Fujita (Tohoku University, Japan)
Shigeo Yoshida (The University of Tokyo, Japan)
Tham Piumsomboon (University of Canterbury, New Zealand)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Wuhan Science and Technology Planning Application Foundation Frontier Project
National Key Research and Development Program of China

Conference

VRST '21: 27th ACM Symposium on Virtual Reality Software and Technology

December 8 - 10, 2021

Osaka, Japan

Acceptance Rates

Overall Acceptance Rate 66 of 254 submissions, 26%
