Abstract
Stereo matching is a key step in stereo vision: the disparity (parallax) it produces directly determines the three-dimensional information recovered for an object. End-to-end stereo matching algorithms derive disparity maps directly from a designed network. However, such networks are structurally complex, and their large number of parameters consumes considerable memory. This burdens the device and lengthens the time needed to obtain the disparity map, lowering the network's running efficiency. This paper therefore proposes AEDNet (adaptive end-to-end depth network for stereo matching), an improved stereo matching algorithm based on AANet (adaptive aggregation network for efficient stereo matching). In the feature extraction module, the network structure is simplified by limiting the convolution kernel size, yielding features with a low level of abstraction. In cost aggregation, the intra-scale aggregation module achieves adaptive cost aggregation through deformable convolution, and the inter-scale aggregation module uses the traditional cross-scale aggregation method to compensate, to a certain extent, for the missing global information. The network's performance is verified on the KITTI dataset. The results show that, even with the simplified network, the algorithm still performs stereo matching efficiently and accurately and obtains a better disparity map. This provides a precondition for accurate three-dimensional reconstruction.
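To make concrete why the disparity (parallax) map directly affects the recovered three-dimensional information, the following minimal sketch applies the standard rectified-stereo triangulation relation Z = f · B / d, where f is the focal length in pixels, B is the camera baseline, and d is the disparity. The function name `disparity_to_depth` and all numeric values are illustrative assumptions; they are not taken from the paper or from KITTI calibration data.

```python
def disparity_to_depth(disparity_map, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (metres) via Z = f * B / d."""
    depth_map = []
    for row in disparity_map:
        depth_row = []
        for d in row:
            # A zero or negative disparity means no valid match, so depth is undefined.
            depth_row.append(focal_length_px * baseline_m / d if d > 0 else None)
        depth_map.append(depth_row)
    return depth_map


# Hypothetical camera parameters and a tiny 2x2 disparity map, for illustration only.
focal_length_px = 700.0
baseline_m = 0.5
disparity = [[35.0, 70.0], [0.0, 7.0]]

depth = disparity_to_depth(disparity, focal_length_px, baseline_m)
print(depth)
```

Because depth is inversely proportional to disparity, the same matching error matters far more for distant points. With these assumed parameters, a one-pixel error at d = 7 moves the depth from 50 m to 43.75 m, while at d = 70 it moves it from 5 m to roughly 4.93 m.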
Data availability
The data that support the findings of this study are available from the corresponding author, upon reasonable request.
Acknowledgements
This research was financially supported by the Major Scientific Research Project for Universities of Guangdong Province (2020ZDZX3058), the Science and Technology Projects of Zhuhai in the Field of Social Development (2220004000066), and the Key Laboratory of Intelligent Multimedia Technology (201762005).
Ethics declarations
Conflicts of interest
We declare that we have no financial or personal relationships with other people or organizations that may have inappropriately influenced our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.
About this article
Cite this article
Yang, G., Liao, Y. An improved binocular stereo matching algorithm based on AANet. Multimed Tools Appl 82, 40987–41003 (2023). https://doi.org/10.1007/s11042-023-15183-6
DOI: https://doi.org/10.1007/s11042-023-15183-6