A Novel Multi-Scale Attention PFE-UNet for Forest Image Segmentation
Abstract
1. Introduction
2. Related Works
2.1. Attention Mechanism
2.2. Depthwise Separable Convolutions
2.3. DropBlock
3. Methods
3.1. PFE-UNet Architecture
3.2. Pyramid Feature Extraction
3.3. Spatial Attention
3.4. Channel-Wise Attention
3.5. Loss Function
4. Experiment
4.1. Dataset and Implementation
4.2. Evaluation Metrics
5. Results and Discussion
5.1. Performance and Comparative Analysis
5.2. Ablation Study
5.2.1. Reorder the Convolutional Layers or Not
5.2.2. Different Components Combinations
5.3. Training Time and Prediction Time
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Hardware Environment | Specification | Software Environment | Version |
---|---|---|---|
CPU | Intel(R) Xeon(R) Bronze 3204 CPU 1.90 GHz | Operating System | Windows 10 |
Graphics Card | NVIDIA GeForce RTX 3090 | Python | 3.6.5 |
Memory | 128 GB | OpenCV | 3.4.2 |
| | | Framework | PyTorch-gpu-1.8.1 |
Parameter | Value | Meaning |
---|---|---|
Batch size | 16 | Number of images per training batch |
Learning rate | 0.00008 | Initial learning rate |
Epochs | 100 | Number of passes over the training set |
CUDA | Enabled | Compute Unified Device Architecture (GPU computing platform) |
cuDNN | Enabled | A GPU acceleration library for deep neural networks |
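The training parameters in the table above could be gathered into a single configuration object. The following is a minimal sketch only: the class name `TrainConfig`, its field names, and the helper `steps_per_epoch` are hypothetical and do not come from the paper, and the dataset size of 1000 images is a made-up value used purely to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the parameter table; field names are hypothetical."""

    batch_size: int = 16          # images per training batch
    learning_rate: float = 8e-5   # initial learning rate (0.00008)
    epochs: int = 100             # passes over the training set
    use_cuda: bool = True         # run on the GPU through CUDA
    use_cudnn: bool = True        # enable the cuDNN acceleration library


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (ceiling division)."""
    return -(-num_images // batch_size)


config = TrainConfig()
# 1000 is an assumed dataset size, not a figure from the paper.
print(config)
print(steps_per_epoch(1000, config.batch_size))
```

Freezing the dataclass keeps the settings immutable for the duration of a run, which makes it harder to change a hyperparameter by accident between experiments.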
Models | F1 | Precision | Recall | Accuracy |
---|---|---|---|---|
UNet | 0.8769 | 0.8971 | 0.8823 | 0.8875 |
DA-Net | 0.9023 | 0.9035 | 0.8973 | 0.9082 |
DFA-Net | 0.9004 | 0.9103 | 0.9186 | 0.9156 |
PFE-UNet (Ours) | 0.9328 | 0.9418 | 0.9386 | 0.9423 |
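The four columns of the comparison table are standard binary-classification scores. As a hedged illustration, the sketch below computes them pixel-wise from a predicted mask and a ground-truth mask using the usual textbook definitions. It assumes flat 0/1 masks with forest as the positive class, and the function names are hypothetical; the exact evaluation protocol behind the reported numbers (for example, per-image averaging) is not reproduced here.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two flat 0/1 masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return precision, recall, F1 and accuracy for binary masks."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy masks for illustration only (1 = forest pixel, 0 = background).
pred_mask = [1, 1, 1, 0, 0, 0, 1, 0]
true_mask = [1, 1, 0, 0, 0, 1, 1, 0]
print(segmentation_metrics(pred_mask, true_mask))
```

On these toy masks there are three true positives, one false positive, one false negative and three true negatives, so all four scores come out to 0.75.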
Backbone | PFE | SA | CA | Accuracy |
---|---|---|---|---|
√ | | | | 0.8932 |
√ | √ | | | 0.9156 |
√ | √ | √ | | 0.9328 |
√ | √ | √ | √ | 0.9423 |
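To make the ablation easier to read, the accuracy gained at each step can be computed directly from the table. The sketch below assumes the rows are cumulative, with PFE, SA and CA added to the backbone in that order; that reading of the table, the row labels, and the function name `incremental_gains` are assumptions made for illustration.

```python
# Accuracy values copied from the ablation table; labels are assumed.
ablation_acc = [
    ("Backbone", 0.8932),
    ("+ PFE", 0.9156),
    ("+ SA", 0.9328),
    ("+ CA", 0.9423),
]


def incremental_gains(rows):
    """Accuracy gained by each row relative to the row before it."""
    gains = []
    for (_, previous), (name, current) in zip(rows, rows[1:]):
        gains.append((name, round(current - previous, 4)))
    return gains


for name, gain in incremental_gains(ablation_acc):
    print(f"{name}: {gain:+.4f}")

# Overall improvement of the full model over the bare backbone.
total_gain = round(ablation_acc[-1][1] - ablation_acc[0][1], 4)
print(f"total: {total_gain:+.4f}")
```

Under this reading, the gain shrinks with each added component (0.0224, then 0.0172, then 0.0095), for a total improvement of 0.0491 over the backbone alone.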
Method | Training Time (s) | Prediction Time (s) |
---|---|---|
UNet | 3390.82 | 1.49 |
Attention UNet | 3931.44 | 1.62 |
DA-Net | 4027.65 | 2.01 |
DFA-Net | 3756.23 | 1.76 |
PFE-UNet (Ours) | 3667.15 | 1.58 |
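The timing table is easier to compare when each method is expressed as a percentage overhead relative to the plain UNet baseline. The following sketch does that arithmetic on the values from the table. The choice of UNet as the baseline and the function names are assumptions for illustration, not part of the paper's analysis.

```python
# (training time in s, prediction time in s), copied from the timing table.
timings = {
    "UNet": (3390.82, 1.49),
    "Attention UNet": (3931.44, 1.62),
    "DA-Net": (4027.65, 2.01),
    "DFA-Net": (3756.23, 1.76),
    "PFE-UNet (Ours)": (3667.15, 1.58),
}


def overhead_percent(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


def overhead_table(rows, baseline_name="UNet"):
    """Map each non-baseline method to (training %, prediction %) overhead."""
    base_train, base_pred = rows[baseline_name]
    return {
        name: (round(overhead_percent(train, base_train), 2),
               round(overhead_percent(pred, base_pred), 2))
        for name, (train, pred) in rows.items()
        if name != baseline_name
    }


for name, (train_pct, pred_pct) in overhead_table(timings).items():
    print(f"{name}: training +{train_pct}%, prediction +{pred_pct}%")
```

By this measure PFE-UNet costs roughly 8% more training time and 6% more prediction time than UNet, the smallest overhead among the four non-baseline methods in the table.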
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, B.; Mu, H.; Gao, M.; Ni, H.; Chen, J.; Yang, H.; Qi, D. A Novel Multi-Scale Attention PFE-UNet for Forest Image Segmentation. Forests 2021, 12, 937. https://doi.org/10.3390/f12070937