
Bridging the Domain Gap in Scene Flow Estimation via Hierarchical Smoothness Refinement

Published: 12 June 2024

Abstract

This article introduces SmoothFlowNet3D, an encoder-decoder architecture designed to bridge the domain gap in scene flow estimation. To this end, SmoothFlowNet3D divides scene flow estimation into two stages: initial scene flow estimation and smoothness refinement. Specifically, SmoothFlowNet3D comprises a hierarchical encoder that extracts multi-scale point cloud features from two consecutive frames, and a hierarchical decoder that predicts the initial scene flow and refines it into a smoother estimate. To generate the initial scene flow, a cross-frame nearest-neighbor search is performed between the features extracted from the two consecutive frames, yielding forward and backward flow embeddings. These embeddings are combined into a bidirectional flow embedding, which serves as input for predicting the initial scene flow. In addition, a flow smoothing module based on the self-attention mechanism is proposed to predict the smoothing error and refine the initial scene flow, producing more accurate and smoother estimates. Extensive experiments demonstrate that SmoothFlowNet3D achieves state-of-the-art performance on both synthetic datasets and real LiDAR point clouds, confirming its effectiveness in enhancing scene flow smoothness.
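The two-stage pipeline summarized above (cross-frame nearest-neighbor flow embeddings, then self-attention-based smoothing refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the tensor shapes, the `nn_flow_embedding` helper, the `SmoothingRefiner` module, and all hyperparameters are illustrative assumptions written in PyTorch.

```python
# Hypothetical sketch (not the paper's code): bidirectional flow embedding via
# cross-frame nearest-neighbor search, followed by a self-attention smoothing step.
import torch
import torch.nn as nn


def nn_flow_embedding(feat_src, feat_dst, xyz_src, xyz_dst):
    """For each source point, find its nearest neighbor in the other frame
    (nearest in feature space here, purely for illustration) and concatenate the
    feature pair with the coordinate offset to form a flow embedding."""
    dist = torch.cdist(feat_src, feat_dst)                       # (B, N, M)
    idx = dist.argmin(dim=-1)                                    # (B, N)
    nn_feat = torch.gather(feat_dst, 1, idx.unsqueeze(-1).expand(-1, -1, feat_dst.size(-1)))
    nn_xyz = torch.gather(xyz_dst, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
    return torch.cat([feat_src, nn_feat, nn_xyz - xyz_src], dim=-1)


class SmoothingRefiner(nn.Module):
    """Predicts a per-point correction (the "smoothing error") from the initial
    flow using standard multi-head self-attention over the point set."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.proj_in = nn.Linear(3 + 3, dim)         # point coords + initial flow
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj_out = nn.Linear(dim, 3)            # predicted smoothing error

    def forward(self, xyz, flow_init):
        tokens = self.proj_in(torch.cat([xyz, flow_init], dim=-1))
        attended, _ = self.attn(tokens, tokens, tokens)
        return flow_init + self.proj_out(attended)   # refined, smoother flow


if __name__ == "__main__":
    B, N, C = 2, 1024, 32
    xyz1, xyz2 = torch.rand(B, N, 3), torch.rand(B, N, 3)
    f1, f2 = torch.rand(B, N, C), torch.rand(B, N, C)
    # Forward and backward embeddings, concatenated into a bidirectional embedding.
    fwd = nn_flow_embedding(f1, f2, xyz1, xyz2)
    bwd = nn_flow_embedding(f2, f1, xyz2, xyz1)
    bidir = torch.cat([fwd, bwd], dim=-1)            # input to an initial-flow head
    flow_init = torch.rand(B, N, 3)                  # stand-in for the decoder output
    refined = SmoothingRefiner()(xyz1, flow_init)
    print(bidir.shape, refined.shape)
```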


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 8
August 2024, 726 pages
EISSN: 1551-6865
DOI: 10.1145/3618074
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 12 June 2024
Online AM: 27 April 2024
Accepted: 17 April 2024
Revised: 22 January 2024
Received: 17 July 2023
Published in TOMM Volume 20, Issue 8

Author Tags

1. Point cloud
2. 3D scene flow
3. Smoothness refinement
4. Self-attention

Qualifiers

• Research-article

Funding Sources

• The Hubei Key Laboratory of Intelligent Geo-Information Processing
• Singapore Ministry of Education (MOE) AcRF Tier 2
