
Bridging the Domain Gap in Scene Flow Estimation via Hierarchical Smoothness Refinement

Published: 12 June 2024

Abstract

This article introduces SmoothFlowNet3D, an encoder-decoder architecture designed to bridge the domain gap in scene flow estimation. To this end, SmoothFlowNet3D divides scene flow estimation into two stages: initial scene flow estimation and smoothness refinement. Specifically, SmoothFlowNet3D comprises a hierarchical encoder that extracts multi-scale point cloud features from two consecutive frames, and a hierarchical decoder that predicts the initial scene flow and refines it into a smoother estimate. To generate the initial scene flow, a cross-frame nearest-neighbor search is performed between the features extracted from the two consecutive frames, yielding forward and backward flow embeddings. These embeddings are combined into a bidirectional flow embedding, which serves as input for predicting the initial scene flow. In addition, a flow smoothing module based on the self-attention mechanism is proposed to predict the smoothing error and refine the initial scene flow, producing more accurate and smoother estimates. Extensive experiments demonstrate that SmoothFlowNet3D achieves state-of-the-art performance on both synthetic datasets and real LiDAR point clouds, confirming its effectiveness in enhancing scene flow smoothness.
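The two-stage pipeline summarized above (cross-frame nearest-neighbor flow embeddings, then self-attention-based smoothing refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the tensor shapes, the `nn_flow_embedding` helper, the `SmoothingRefiner` module, and all hyperparameters are illustrative assumptions written in PyTorch.

```python
# Hypothetical sketch (not the paper's code): bidirectional flow embedding via
# cross-frame nearest-neighbor search, followed by a self-attention smoothing step.
import torch
import torch.nn as nn


def nn_flow_embedding(feat_src, feat_dst, xyz_src, xyz_dst):
    """For each source point, find its nearest neighbor in the other frame
    (nearest in feature space here, purely for illustration) and concatenate the
    feature pair with the coordinate offset to form a flow embedding."""
    dist = torch.cdist(feat_src, feat_dst)                       # (B, N, M)
    idx = dist.argmin(dim=-1)                                    # (B, N)
    nn_feat = torch.gather(feat_dst, 1, idx.unsqueeze(-1).expand(-1, -1, feat_dst.size(-1)))
    nn_xyz = torch.gather(xyz_dst, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
    return torch.cat([feat_src, nn_feat, nn_xyz - xyz_src], dim=-1)


class SmoothingRefiner(nn.Module):
    """Predicts a per-point correction (the "smoothing error") from the initial
    flow using standard multi-head self-attention over the point set."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.proj_in = nn.Linear(3 + 3, dim)         # point coords + initial flow
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj_out = nn.Linear(dim, 3)            # predicted smoothing error

    def forward(self, xyz, flow_init):
        tokens = self.proj_in(torch.cat([xyz, flow_init], dim=-1))
        attended, _ = self.attn(tokens, tokens, tokens)
        return flow_init + self.proj_out(attended)   # refined, smoother flow


if __name__ == "__main__":
    B, N, C = 2, 1024, 32
    xyz1, xyz2 = torch.rand(B, N, 3), torch.rand(B, N, 3)
    f1, f2 = torch.rand(B, N, C), torch.rand(B, N, C)
    # Forward and backward embeddings, concatenated into a bidirectional embedding.
    fwd = nn_flow_embedding(f1, f2, xyz1, xyz2)
    bwd = nn_flow_embedding(f2, f1, xyz2, xyz1)
    bidir = torch.cat([fwd, bwd], dim=-1)            # input to an initial-flow head
    flow_init = torch.rand(B, N, 3)                  # stand-in for the decoder output
    refined = SmoothingRefiner()(xyz1, flow_init)
    print(bidir.shape, refined.shape)
```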


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 8
August 2024, 726 pages
EISSN: 1551-6865
DOI: 10.1145/3618074
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 12 June 2024
Online AM: 27 April 2024
Accepted: 17 April 2024
Revised: 22 January 2024
Received: 17 July 2023
Published in TOMM Volume 20, Issue 8

Author Tags

1. Point cloud
2. 3D scene flow
3. Smoothness refinement
4. Self-attention

Qualifiers

• Research-article

Funding Sources

• The Hubei Key Laboratory of Intelligent Geo-Information Processing
• Singapore Ministry of Education (MOE) AcRF Tier 2
