Abstract
Classifying and segmenting natural disaster images is crucial for predicting and responding to disasters. However, current convolutional networks perform poorly on natural disaster images, and few networks are tailored to this task. To address the widely varying scales of the regions of interest (ROIs) in these images, we propose the Hierarchical TSAM-CB-ViT (HTCViT) network, which builds on the ViT attention mechanism to better process natural disaster images. Because ViT excels at extracting global context but struggles with local features, our method combines the strengths of ViT and convolution, capturing the overall contextual information within each patch using the Triple-Strip Attention Mechanism (TSAM). Experiments validate that, compared to the vanilla ViT network, HTCViT improves classification by 3–4% and segmentation by 1–2% on natural disaster datasets.
Data availability
Data will be made available on request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. U21A20515, 61972459, 62172416, 62102414, U2003109, 62071157, 62171321, and 62162044), in part by the Open Research Projects of Zhejiang Lab (No. 2021KE0AB07), and in part by Project TC210H00L/42.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 131092 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, Z., Li, W., Zhang, M. et al. HTCViT: an effective network for image classification and segmentation based on natural disaster datasets. Vis Comput 39, 3285–3297 (2023). https://doi.org/10.1007/s00371-023-02954-3