Abstract
Various approaches using neural networks have been proposed to address multi-view stereopsis, but most of them lack the capability to handle large textureless regions. We therefore construct a matching network that learns comprehensive information from stereo images in order to enforce smoothness constraints globally. Although the network is trained on binocular stereo datasets only, we show that it can directly handle the DTU multi-view stereo dataset. When merging multiple depth maps obtained using stereo matching, an additional point consolidation procedure is often needed to remove outliers and better align individual patches. We propose a second network that consolidates 3D point clouds by directly projecting individual 3D points based on the point distributions in their neighborhoods. Unlike the matching network, this network is trained on local information, scales to point clouds of any size, and can also process selected areas of interest. Quantitative evaluation on the DTU dataset demonstrates that our two networks together can generate point clouds comparable to those of existing state-of-the-art approaches.
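To make the merge-then-consolidate step concrete, the following is a minimal sketch of a classical stand-in, not the learned consolidation network described above. It assumes pinhole intrinsics `K`, a 4x4 camera-to-world matrix, and depth maps in which non-positive values mark invalid pixels; the function names `backproject` and `remove_outliers` and the parameters `k` and `std_ratio` are hypothetical. Outliers are dropped with a simple k-nearest-neighbor distance statistic rather than by projecting points with a network.

```python
import numpy as np


def backproject(depth, K, cam_to_world):
    """Lift a depth map (H, W) to world-space points (N, 3); depth <= 0 is skipped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    # Homogeneous pixel coordinates of the valid pixels, shape (3, N).
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    # Rays through each pixel, scaled by the pixel's depth (camera frame).
    pts_cam = (np.linalg.inv(K) @ pix) * depth[valid]
    # Move the points into the world frame with the (assumed) 4x4 pose.
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    return (cam_to_world @ pts_h)[:3].T


def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors is unusually large.

    Brute-force O(N^2) distances: fine for a small patch, not for a full scan.
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    dists.sort(axis=1)
    mean_knn = dists[:, 1:k + 1].mean(axis=1)  # column 0 is the point itself
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]


def merge_depth_maps(depths, intrinsics, poses):
    """Concatenate the back-projected points of several views into one cloud."""
    return np.vstack([backproject(d, K, T) for d, K, T in zip(depths, intrinsics, poses)])
```

Because the filter only looks at each point's neighborhood, it could in principle be applied to a selected region of a cloud, which mirrors the locality argument made for the consolidation network; the alignment of individual patches that the abstract mentions is not addressed by this sketch.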
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Mao, W., Wang, M., Huang, H. et al. A robust framework for multi-view stereopsis. Vis Comput 38, 1539–1551 (2022). https://doi.org/10.1007/s00371-021-02087-5
DOI: https://doi.org/10.1007/s00371-021-02087-5