DOI: 10.1145/3474085.3475273
research-article

HDA-Net: Horizontal Deformable Attention Network for Stereo Matching

Published: 17 October 2021

Abstract

Stereo matching is a fundamental and challenging task with applications in autonomous driving, dense reconstruction, and other depth-related tasks. Contextual information with discriminative features is crucial for accurate stereo matching in ill-posed regions (textureless areas, occlusions, etc.). In this paper, we propose an efficient horizontal attention module that adaptively captures global correspondence clues. Compared with the popular non-local attention, our horizontal attention is better suited to stereo matching: it achieves better performance while consuming less computation and memory. We further introduce a deformable module to refine the contextual information in disparity-discontinuous areas such as object boundaries. A learning-based approach constructs the cost volume by concatenating the features of the two branches. To provide an explicit similarity measure that guides the learning-based volume toward a more reasonable unimodal matching-cost distribution, we additionally combine it with an improved zero-centered group-wise correlation volume. Finally, we regularize the 4D joint cost volume with a 3D CNN module and generate the final output by disparity regression. Experimental results show that the proposed HDA-Net achieves state-of-the-art performance on the Scene Flow dataset and competitive performance on the KITTI datasets compared with related networks.
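To make the cost argument concrete, the following is a minimal NumPy sketch of attention restricted to image rows, assuming a single-head, scaled dot-product formulation with a residual connection. It is not the authors' implementation: the function name `horizontal_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the tensor sizes are all hypothetical, and the deformable refinement is omitted. The intuition is that rectified stereo pairs have correspondences on the same scanline, so each pixel attends only to the W positions in its own row rather than to all H*W positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def horizontal_attention(feat, w_q, w_k, w_v):
    """Row-wise self-attention over a (C, H, W) feature map.

    w_q, w_k: (C, D) query/key projections; w_v: (C, C) value projection.
    All three are hypothetical parameters chosen for illustration.
    Returns the (C, H, W) output (input plus attended values) and the
    (H, W, W) attention weights, one W x W map per image row.
    """
    rows = feat.transpose(1, 2, 0)               # (H, W, C)
    q = rows @ w_q                               # (H, W, D)
    k = rows @ w_k                               # (H, W, D)
    v = rows @ w_v                               # (H, W, C)
    # Each row is treated as an independent sequence of W positions.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (H, W, W)
    attn = softmax(scores, axis=-1)
    out = attn @ v                               # (H, W, C)
    return feat + out.transpose(2, 0, 1), attn

rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 16, 4                         # illustrative sizes only
feat = rng.standard_normal((C, H, W))
w_q = rng.standard_normal((C, D))
w_k = rng.standard_normal((C, D))
w_v = rng.standard_normal((C, C))

out, attn = horizontal_attention(feat, w_q, w_k, w_v)
print(out.shape, attn.shape)       # (8, 4, 16) (4, 16, 16)

# Attention-map entries: row-wise attention vs. full non-local attention.
print(H * W * W, (H * W) ** 2)     # 1024 4096
```

Under these assumptions the attention map shrinks from (H*W)^2 entries to H*W^2, a factor of H, which is consistent with the abstract's claim of lower computation and memory than non-local attention.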


Cited By

  • (2025) AP-Net: Attention-fused volume and progressive aggregation for accurate stereo matching. Neurocomputing 612 (128685). https://doi.org/10.1016/j.neucom.2024.128685. Online publication date: Jan-2025
  • (2023) Horizontal Attention Based Generation Module for Unsupervised Domain Adaptive Stereo Matching. IEEE Robotics and Automation Letters 8:10 (6779-6786). https://doi.org/10.1109/LRA.2023.3313009. Online publication date: Oct-2023


Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deformable module
  2. horizontal attention
  3. zero-centered correlation


Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
