Unpaired Underwater Image Synthesis with a Disentangled Representation for Underwater Depth Map Prediction
Abstract
1. Introduction
- We propose a novel end-to-end framework that applies unpaired image-to-image translation to underwater depth map estimation, advancing current research on the task.
- To enrich our synthesized underwater dataset, we propose a disentangled representation loss together with a style diversification loss to identify interpretable and meaningful representations from the unlabeled underwater dataset and to synthesize underwater images with rich diversity (a minimal sketch of both loss terms follows this list).
- Following the coarse-to-fine principle, and inspired by the work of Eigen et al. [23] and Skinner et al. [19], our approach adopts global and local generators to estimate coarse and fine depth maps, respectively (see the pipeline sketch after this list). We evaluated our model on a real underwater RGB-D dataset and achieved better results than other state-of-the-art models.
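To make the two diversity-oriented loss terms concrete, here is a minimal PyTorch-style sketch. It is an illustration under assumed interfaces, not the authors' implementation: the generator `G(x, s)` mapping a content image and a style code to a synthesized underwater image, and the style encoder `E`, are hypothetical names. The diversification term follows the formulation of StarGAN v2 [30]; the disentanglement term is sketched here as a simple latent-reconstruction constraint.

```python
import torch

def style_diversification_loss(G, x, s1, s2):
    # Push G to produce visibly different underwater renderings for two
    # random style codes; this term is maximized (i.e., subtracted from
    # the total objective), following StarGAN v2 [30].
    return torch.mean(torch.abs(G(x, s1) - G(x, s2)))

def disentangled_representation_loss(G, E, x, s):
    # Latent-reconstruction sketch of a disentanglement constraint:
    # re-encoding the synthesized image should recover the style code
    # that produced it, tying each latent dimension to an interpretable
    # factor of the underwater appearance.
    s_rec = E(G(x, s))
    return torch.mean(torch.abs(s_rec - s))
```

In training, `s1` and `s2` would be sampled from a prior or produced by a mapping network; the weighting of these terms against the adversarial loss is a tuning choice.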
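The coarse-to-fine depth pipeline can likewise be sketched as a wrapper around two generators, in the spirit of Eigen et al. [23]: a global generator predicts a low-resolution depth map from the whole image, and a local generator refines it at full resolution. The module below is a minimal, assumed-interface illustration; `coarse_net`, `fine_net`, and the 1/4-scale coarse output are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineDepth(nn.Module):
    # Global generator -> coarse depth; local generator -> refined depth.
    def __init__(self, coarse_net: nn.Module, fine_net: nn.Module):
        super().__init__()
        self.coarse_net = coarse_net  # whole-image context, low-res output
        self.fine_net = fine_net      # RGB + upsampled coarse depth -> full res

    def forward(self, rgb):
        coarse = self.coarse_net(rgb)                  # e.g., (B, 1, H/4, W/4)
        coarse_up = F.interpolate(coarse, size=rgb.shape[-2:],
                                  mode="bilinear", align_corners=False)
        fine = self.fine_net(torch.cat([rgb, coarse_up], dim=1))
        return coarse, fine                            # supervise both scales
```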
2. Methods
2.1. Overall Framework
2.2. Loss Functions
3. Results
3.1. Datasets and Implementation Details
3.1.1. Qualitative Evaluation
3.1.2. Quantitative Evaluation
3.2. Ablation Study
4. Discussion
4.1. Cross-Domain Underwater Image Synthesis with Disentangled Representation
4.2. Challenges of Underwater Scenes with Inhomogeneous Illumination
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inf. Process. Syst. 2014, 27, 2366–2374.
2. Chi, S.; Xie, Z.; Chen, W. A laser line auto-scanning system for underwater 3D reconstruction. Sensors 2016, 16, 1534.
3. Palomer, A.; Ridao, P.; Ribas, D.; Forest, J. Underwater 3D laser scanners: The deformation of the plane. In Sensing and Control for Autonomous Vehicles; Springer: Berlin/Heidelberg, Germany, 2017; pp. 73–88.
4. Xi, Q.; Rauschenbach, T.; Daoliang, L. Review of underwater machine vision technology and its applications. Mar. Technol. Soc. J. 2017, 51, 75–97.
5. Dancu, A.; Fourgeaud, M.; Franjcic, Z.; Avetisyan, R. Underwater reconstruction using depth sensors. In SIGGRAPH Asia Technical Briefs; ACM: New York, NY, USA, 2014; pp. 1–4.
6. Churnside, J.H.; Marchbanks, R.D.; Lembke, C.; Beckler, J. Optical backscattering measured by airborne lidar and underwater glider. Remote Sens. 2017, 9, 379.
7. Deris, A.; Trigonis, I.; Aravanis, A.; Stathopoulou, E. Depth cameras on UAVs: A first approach. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 231–236.
8. Ahamed, J.R.; Abas, P.E.; De Silva, L.C. Review of underwater image restoration algorithms. IET Image Process. 2019, 13, 1587–1596.
9. Massot-Campos, M.; Oliver-Codina, G. Optical sensors and methods for underwater 3D reconstruction. Sensors 2015, 15, 31525–31557.
10. Li, N.; Zheng, Z.; Zhang, S.; Yu, Z.; Zheng, H.; Zheng, B. The synthesis of unpaired underwater images using a multistyle generative adversarial network. IEEE Access 2018, 6, 54241–54257.
11. Gupta, H.; Mitra, K. Unsupervised single image underwater depth estimation. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 624–628.
12. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
13. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
14. Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8798–8807.
15. Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8789–8797.
16. Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robot. Autom. Lett. 2017, 3, 387–394.
17. Cao, K.; Peng, Y.T.; Cosman, P.C. Underwater image restoration using deep networks to estimate background light and scene depth. In Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, Las Vegas, NV, USA, 8–10 April 2018; pp. 1–4.
18. Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038.
19. Skinner, K.A.; Zhang, J.; Olson, E.A.; Johnson-Roberson, M. UWStereoNet: Unsupervised learning for depth estimation and color correction of underwater stereo imagery. In Proceedings of the International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 7947–7954.
20. Wang, N.; Zhou, Y.; Han, F.; Zhu, H.; Zheng, Y. UWGAN: Underwater GAN for real-world underwater color restoration and dehazing. arXiv 2019, arXiv:1912.10269.
21. Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. arXiv 2016, arXiv:1606.03657.
22. Spurr, A.; Aksan, E.; Hilliges, O. Guiding InfoGAN with semi-supervision. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2017; pp. 119–134.
23. Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 2650–2658.
24. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
25. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
26. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
27. Wang, C.; Xu, C.; Wang, C.; Tao, D. Perceptual adversarial networks for image-to-image transformation. IEEE Trans. Image Process. 2018, 27, 4066–4079.
28. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–13.
29. Kupyn, O.; Martyniuk, T.; Wu, J.; Wang, Z. DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8878–8887.
30. Choi, Y.; Uh, Y.; Yoo, J.; Ha, J.W. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8188–8197.
31. Hu, J.; Ozay, M.; Zhang, Y.; Okatani, T. Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1043–1051.
32. Abady, L.; Barni, M.; Garzelli, A.; Tondi, B. GAN generation of synthetic multispectral satellite images. In Image and Signal Processing for Remote Sensing XXVI; International Society for Optics and Photonics: London, UK, 2020; Volume 11533, pp. 122–133.
33. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
34. Drews, P.L.; Nascimento, E.R.; Botelho, S.S.; Campos, M.F.M. Underwater depth estimation and image restoration based on single images. IEEE Comput. Graph. Appl. 2016, 36, 24–35.
35. Berman, D.; Treibitz, T.; Avidan, S. Diving into haze-lines: Color restoration of underwater images. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 9–12 September 2017; Volume 1, pp. 1–12.
36. Ancuti, C.; Ancuti, C.O.; De Vleeschouwer, C. D-Hazy: A dataset to evaluate quantitatively dehazing algorithms. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2226–2230.
37. Xiao, J.; Hays, J.; Ehinger, K.A.; Oliva, A.; Torralba, A. SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3485–3492.
38. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–26.
39. Brock, A.; Donahue, J.; Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–35.
40. Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7354–7363.
41. Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760.
42. Chen, Y.; Li, W.; Sakaridis, C.; Dai, D.; Van Gool, L. Domain adaptive faster R-CNN for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3339–3348.
| Metric | DCP | UDCP | Berman et al. | UW-Net | UW-Net (FT) | Ours | Ours (FT) |
|---|---|---|---|---|---|---|---|
| SI-MSE ↓ | 1.3618 | 0.6966 | 0.6755 | 0.4404 | 0.3708 | 0.3199 | 0.2486 |
| ρ ↑ | 0.2968 | 0.4894 | 0.6448 | 0.6202 | 0.6451 | 0.7018 | 0.7600 |
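For reference, the two metrics in the tables can be computed as in the NumPy sketch below. Both definitions are assumptions matching common usage rather than a verbatim copy of the paper's evaluation code: SI-MSE follows the scale-invariant log-space error of Eigen et al. [1], and ρ is taken here to be the Pearson correlation coefficient between predicted and ground-truth depths (higher is better).

```python
import numpy as np

def si_mse(pred, gt, eps=1e-8):
    # Scale-invariant MSE in log space (cf. Eigen et al. [1]): the
    # subtracted mean term discounts any global scale offset between
    # the predicted and ground-truth depth maps.
    d = np.log(pred + eps) - np.log(gt + eps)
    return np.mean(d ** 2) - np.mean(d) ** 2

def pearson_rho(pred, gt):
    # Pearson correlation between flattened depth maps (higher is better).
    p = pred.ravel() - pred.mean()
    g = gt.ravel() - gt.mean()
    return float((p * g).sum() / (np.sqrt((p * p).sum() * (g * g).sum()) + 1e-12))
```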
| Metric | Proposed | w/o Disentangled Representation | w/o Coarse-to-Fine Pipeline |
|---|---|---|---|
| SI-MSE ↓ | 0.2486 | 0.2797 | 0.2707 |
| ρ ↑ | 0.7600 | 0.7136 | 0.7117 |
| Metric | DCP | UDCP | Berman et al. | UW-Net | UW-Net (FT) | Ours-C | Ours-C (FT) |
|---|---|---|---|---|---|---|---|
| SI-MSE ↓ | 1.3618 | 0.6966 | 0.6755 | 0.4404 | 0.3708 | 0.3526 | 0.2447 |
| ρ ↑ | 0.2968 | 0.4894 | 0.6448 | 0.6202 | 0.6451 | 0.6823 | 0.7423 |