
TINYCD: a (not so) deep learning model for change detection

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

In this paper, we present a lightweight and effective change detection model, called TinyCD. Motivated by industrial needs, this model was designed to be faster and smaller than current state-of-the-art change detection models. Despite being 13 to 140 times smaller than the compared change detection models, and requiring at most a third of their computational complexity, our model outperforms the current state-of-the-art models by at least \(1\%\) in both F1-score and IoU on the LEVIR-CD dataset, and by more than \(8\%\) on the WHU-CD dataset. To reach these results, TinyCD uses a Siamese U-Net architecture that exploits low-level features in a globally temporal and locally spatial way. In addition, it adopts a new strategy for mixing features in the space-time domain, which serves both to merge the embeddings obtained from the Siamese backbones and, coupled with an MLP block, to form a novel space-semantic attention mechanism, the Mix and Attention Mask Block (MAMB).
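As a rough sketch of this mixing strategy (our reconstruction for illustration only: the class name, the 3\(\times\)3 kernel size, and the channel interleaving below are assumptions, not the authors' published code), the two temporal embeddings can be interleaved channel-wise and compared locally by a grouped convolution whose depth-2 kernels each see one (pre, post) channel pair:

```python
import torch
import torch.nn as nn

class MixBlock(nn.Module):
    """Sketch of space-time feature mixing: interleave the two temporal
    embeddings channel-wise, then let a grouped convolution compare each
    (pre, post) channel pair locally."""

    def __init__(self, channels: int):
        super().__init__()
        # One group per channel pair: each kernel spans exactly two input
        # channels (a depth-2 kernel), generalizing per-pixel subtraction.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3,
                             padding=1, groups=channels)

    def forward(self, x_pre: torch.Tensor, x_post: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x_pre.shape
        # Interleave channels: [pre_0, post_0, pre_1, post_1, ...]
        mixed = torch.stack((x_pre, x_post), dim=2).reshape(b, 2 * c, h, w)
        return self.mix(mixed)

# Example: two 64-channel embeddings from the Siamese backbone
x1, x2 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(MixBlock(64)(x1, x2).shape)  # torch.Size([1, 64, 32, 32])
```

Note 6 below observes that a suitable initialization of such depth-2 kernels reduces this operation to a plain per-pixel subtraction.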


Data availability

The datasets analyzed during the current study, and all the source code to reproduce our results, are available in the GitHub repository https://github.com/AndreaCodegoni/Tiny_model_4_CD.

Notes

  1. Notice that the backbone is evaluated twice in Siamese architectures.

  2. Both the adopted datasets have been obtained from https://github.com/wgcban/SemiCD in an already pre-processed version.

  3. The whole dataset depicts the city of Christchurch, in New Zealand. The crop intended for CD tasks is a sub-area acquired at two different times.

  4. To make our results reproducible, we fixed the random seed at the beginning of each experiment.

  5. We employed the sigmoid activation on this output.

  6. In fact, if we initialize each of our depth-2 kernels with the "central" weights set to 1 and \(-1\), and all the remaining weights to 0, we recover the standard subtraction (see the sketch after these notes).

  7. The parentheses highlight the size of each kernel and the number of kernels.
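A minimal sketch of the observation in note 6, reusing the channel interleaving from the mixing sketch after the abstract (again our illustration, not the authors' code): zeroing a grouped depth-2 convolution and setting its central weights to 1 and \(-1\) reproduces the per-pixel subtraction of the two temporal features.

```python
import torch
import torch.nn as nn

c = 4  # channels per temporal embedding (arbitrary for this demo)
# Depth-2 kernels: each group sees one (pre, post) channel pair.
conv = nn.Conv2d(2 * c, c, kernel_size=3, padding=1, groups=c, bias=False)

with torch.no_grad():
    conv.weight.zero_()
    # Weight shape is (c, 2, 3, 3): the central tap of the "pre" channel
    # gets 1, the central tap of the "post" channel gets -1.
    conv.weight[:, 0, 1, 1] = 1.0
    conv.weight[:, 1, 1, 1] = -1.0

x_pre, x_post = torch.randn(1, c, 8, 8), torch.randn(1, c, 8, 8)
b = x_pre.shape[0]
interleaved = torch.stack((x_pre, x_post), dim=2).reshape(b, 2 * c, 8, 8)
print(torch.allclose(conv(interleaved), x_pre - x_post, atol=1e-6))  # True
```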

References

  1. Singh A (1989) Review article digital change detection techniques using remotely-sensed data. Int J Remote Sens 10(6):989–1003


  2. Shafique A, Cao G, Khan Z, Asad M, Aslam M (2022) Deep learning-based change detection in remote sensing images: a review. Remote Sens 14(4):871


  3. Bai T, Wang L, Yin D, Sun K, Chen Y, Li W, Li D (2022) Deep learning for change detection in remote sensing: a review. Geo-spat Inf Sci. https://doi.org/10.1080/10095020.2022.2085633


  4. Chen H, Shi Z (2020) A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens 12(10):1662


  5. De Bem PP, de Carvalho Junior OA, Fontes Guimarães R, Trancoso Gomes RA (2020) Change detection of deforestation in the Brazilian Amazon using Landsat data and convolutional neural networks. Remote Sens 12(6):901


  6. Viña A, Echavarria FR, Rundquist DC (2004) Satellite change detection analysis of deforestation rates and patterns along the Colombia–Ecuador border. AMBIO: J Hum Environ 33(3):118–125


  7. Xu JZ, Lu W, Li Z, Khaitan P, Zaytseva V (2019) Building damage detection in satellite imagery using convolutional neural networks. arXiv preprint arXiv:1910.06444

  8. Zhang C, Yue P, Tapete D, Jiang L, Shangguan B, Huang L, Liu G (2020) A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J Photogramm Remote Sens 166:183–200


  9. Ji S, Wei S, Lu M (2018) Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans Geosci Remote Sens 57(1):574–586


  10. Varghese A, Gubbi J, Ramaswamy A, Balamuralidhar P (2018) Changenet: a deep learning architecture for visual change detection. In: European conference on computer vision, pp. 129–145

  11. Khelifi L, Mignotte M (2020) Deep learning for change detection in remote sensing images: comprehensive review and meta-analysis. IEEE Access 8:126385–126400


  12. Daudt RC, Le Saux B, Boulch A (2018) Fully convolutional siamese networks for change detection. In: IEEE international conference on image processing, pp. 4063–4067

  13. Zhang M, Xu G, Chen K, Yan M, Sun X (2018) Triplet-based semantic relation learning for aerial remote sensing image change detection. IEEE Geosci Remote Sens Lett 16(2):266–270


  14. Liu Y, Pang C, Zhan Z, Zhang X, Yang X (2020) Building change detection for remote sensing images using a dual-task constrained deep siamese convolutional network model. IEEE Geosci Remote Sens Lett 18(5):811–815


  15. Peng D, Zhang Y, Guan H (2019) End-to-end change detection for high resolution satellite images using improved unet++. Remote Sens 11(11):1382


  16. Jiang H, Hu X, Li K, Zhang J, Gong J, Zhang M (2020) Pga-siamnet: pyramid feature-based attention-guided siamese network for remote sensing orthoimagery building change detection. Remote Sens 12(3):484


  17. Chen J, Yuan Z, Peng J, Chen L, Huang H, Zhu J, Liu Y, Li H (2020) Dasnet: Dual attentive fully convolutional siamese networks for change detection in high-resolution satellite images. IEEE J Sel Top Appl Earth Obs Remote Sens 14:1194–1206


  18. Chen H, Qi Z, Shi Z (2021) Remote sensing image change detection with transformers. IEEE Trans Geosci Remote Sens 60:1–14


  19. Bandara WGC, Patel VM (2022) A transformer-based siamese network for change detection. In: IEEE international geoscience and remote sensing symposium, pp. 207–210

  20. Chen S, Yang K, Stiefelhagen R (2021) Dr-tanet: dynamic receptive temporal attention network for street scene change detection. In: IEEE intelligent vehicles symposium, pp. 502–509

  21. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. Proc IEEE Conf Comput Vis Pattern Recognit 1:539–546


  22. Zagoruyko S, Komodakis N (2015) Learning to compare image patches via convolutional neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4353–4361

  23. Stent S, Gherardi R, Stenger B, Cipolla R (2015) Detecting change for multi-view, long-term surface inspection. In: Proceedings of the British machine vision conference, pp. 127.1–127.12

  24. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440

  25. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp. 234–241

  26. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European conference on computer vision, pp. 850–865

  27. Ioannidou A, Chatzilari E, Nikolopoulos S, Kompatsiaris I (2017) Deep learning advances in computer vision with 3d data: a survey. ACM Comput Surv CSUR 50(2):1–38


  28. Chu Y, Cao G, Hayat H (2016) Change detection of remote sensing image based on deep neural networks. In: International conference on artificial intelligence and industrial engineering, pp. 262–267

  29. Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R (1993) Signature verification using a "siamese" time delay neural network. Adv Neural Inform Process Syst 6:737–744


  30. Lebedev M, Vizilter YV, Vygolov O, Knyaz V, Rubis AY (2018) Change detection in remote sensing images using conditional adversarial networks. Int Arch Photogramm Remote Sens Spat Inf Sci 42(2):565–571


  31. Zhao W, Chen X, Ge X, Chen J (2020) Using adversarial network for multiple change detection in bitemporal remote sensing imagery. IEEE Geosci Remote Sens Lett 99:1–5


  32. Peng X, Zhong R, Li Z, Li Q (2020) Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans Geosci Remote Sens 59(9):7296–7307


  33. Bao T, Fu C, Fang T, Huo H (2020) Ppcnet: a combined patch-level and pixel-level end-to-end deep network for high-resolution remote sensing image change detection. IEEE Geosci Remote Sens Lett 17(10):1797–1801


  34. Hou B, Liu Q, Wang H, Wang Y (2019) From w-net to cdgan: bitemporal change detection via deep learning techniques. IEEE Trans Geosci Remote Sens 58(3):1790–1802


  35. Zhan Y, Fu K, Yan M, Sun X, Wang H, Qiu X (2017) Change detection based on deep siamese convolutional network for optical aerial images. IEEE Geosci Remote Sens Lett 14(10):1845–1849


  36. Fang B, Pan L, Kou R (2019) Dual learning-based siamese framework for change detection using bi-temporal vhr optical remote sensing images. Remote Sens 11(11):1292


  37. Chen H, Li W, Shi Z (2021) Adversarial instance augmentation for building change detection in remote sensing images. IEEE Trans Geosci Remote Sens 60:1–16


  38. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778

  39. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations, pp. 1–14

  40. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848


  41. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7794–7803

  42. Liu T, Yang L, Lunga D (2021) Change detection using deep learning approach with object-based image analysis. Remote Sens Environ 256:112308


  43. Wu B, Xu C, Dai X, Wan A, Zhang P, Yan Z, Tomizuka M, Gonzalez J, Keutzer K, Vajda P (2020) Visual transformers: token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677

  44. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inform Process Syst 30:6000–6010


  45. Chen S, Xie E, Chongjian G, Chen R, Liang D, Luo P (2022) Cyclemlp: a mlp-like architecture for dense prediction. In: International conference on learning representations. Oral Presentation

  46. Lian D, Yu Z, Sun X, Gao S (2022) As-mlp: an axial shifted mlp architecture for vision. In: International conference on learning representations. Poster presentation

  47. Zhang J, Yang K, Ma C, Reiß S, Peng K, Stiefelhagen R (2022) Bending reality: distortion-aware transformers for adapting to panoramic semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16917–16927

  48. Tolstikhin IO, Houlsby N, Kolesnikov A, Beyer L, Zhai X, Unterthiner T, Yung J, Steiner A, Keysers D, Uszkoreit J (2021) Mlp-mixer: an all-mlp architecture for vision. Adv Neural Inform Process Syst 34:24261–24272


  49. Touvron H, Bojanowski P, Caron M, Cord M, El-Nouby A, Grave E, Izacard G, Joulin A, Synnaeve G, Verbeek J et al (2022) Resmlp: feedforward networks for image classification with data-efficient training. In: IEEE transactions on pattern analysis and machine intelligence, early access, pp 1–9

  50. Liu H, Dai Z, So D, Le QV (2021) Pay attention to mlps. Adv Neural Inform Process Syst 34:9204–9215


  51. Yu T, Li X, Cai Y, Sun M, Li P (2022) S2-mlp: spatial-shift mlp architecture for vision. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 297–306

  52. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 1026–1034

  53. Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022

  54. Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J (2001) Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: A field guide to dynamical recurrent neural networks, pp 237–244

  55. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural comput 9(8):1735–1780


  56. Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning, pp. 6105–6114

  57. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 248–255

  58. Liu N, Han J, Yang M-H (2018) Picanet: Learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3089–3098

  59. Lin M, Chen Q, Yan S (2013) Network in network. In: International conference on learning representations. arXiv preprint arXiv:1312.4400

  60. Sifre L, Mallat S (2014) Rigid-motion scattering for texture classification. Comput Sci 3559:501–515


  61. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258

  62. Bandara WGC, Patel VM (2022) Revisiting consistency regularization for semi-supervised change detection in remote sensing images. arXiv preprint arXiv:2204.08454

  63. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: an imperative style, high-performance deep learning library. Adv Neural Inform Process Syst 32:8026–8037


  64. Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

  65. Microsoft (2021) Neural network intelligence (NNI). https://github.com/microsoft/nni

  66. Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: International conference on learning representations. Poster presentation

  67. Buslaev A, Iglovikov VI, Khvedchenya E, Parinov A, Druzhinin M, Kalinin AA (2020) Albumentations: fast and flexible image augmentations. Information 11(2):125


  68. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828


  69. Sankararaman KA, De S, Xu Z, Huang WR, Goldstein T (2020) The impact of neural network overparameterization on gradient confusion and stochastic gradient descent. In: International conference on machine learning, pp. 8469–8479

  70. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022

  71. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations. Oral presentation


Acknowledgements

The authors want to thank the whole ARGO Vision team, Professor Stefano Gualandi, Gabriele Loli, and Gennaro Auricchio for the useful discussions and comments. We also want to thank all those who have provided their code in an accessible and reproducible way, as well as the anonymous reviewers for the comments and suggestions that helped us improve our work. The Ph.D. scholarship of Andrea Codegoni is funded by SEA Vision.

Author information


Corresponding author

Correspondence to Andrea Codegoni.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1 Backbone comparison

In this Appendix we report the results obtained by varying the backbone adopted in the model. From each backbone we select only a few initial blocks, in order to work with features that are not very complex and not excessively aggregated from a spatial point of view. Specifically, we select all the initial blocks up to the first one with spatial resolution 32\(\times\)32. Due to the different compositions of the considered networks, the final size of the models varies in the range from a minimum of 32k parameters up to 1.3M.
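For illustration, a minimal sketch of this truncation (the timm library, the efficientnet_b4 backbone, and the choice out_indices=(0, 1, 2) are our assumptions; the index at which a given backbone reaches 32\(\times\)32 depends on its stage layout and on the input size):

```python
import timm
import torch

# Keep only the early stages of a pretrained backbone: with a 256x256
# input, the third returned stage of efficientnet_b4 has stride 8,
# i.e. 32x32 spatial resolution.
backbone = timm.create_model(
    "efficientnet_b4",
    pretrained=True,
    features_only=True,     # return intermediate feature maps
    out_indices=(0, 1, 2),  # truncate after the first three stages
)

x = torch.randn(1, 3, 256, 256)
for feat in backbone(x):
    print(feat.shape)  # e.g. (1, 24, 128, 128), (1, 32, 64, 64), (1, 56, 32, 32)
```

In a Siamese setting the same truncated backbone is applied to both images with shared weights, which is why, as note 1 points out, it is evaluated twice.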

As we can see from Table 9, the results are stable in terms of performance. The backbones of the EfficientNet family appear to be, in accordance with the experiments carried out on our proprietary dataset, those that achieve the best performance. However, the other backbone types also produce comparable results, making our approach:

  • robust with respect to the backbone used;

  • flexible with respect to the required size and computational complexity.

In this comparison we have not considered Transformer-type backbones such as [70, 71]. The reason for this choice is that Transformers follow a global philosophy, as opposed to the blocks we propose, which are instead local. As mentioned in Sect. 7, an integration of these two philosophies will be the subject of future work.

Table 9 Comparison of different backbones on LEVIR-CD dataset

Appendix 2 Hyperparameter tuning

In this Appendix we report the details of the hyperparameter tuning experiments. One of the advantages of using models with limited computational complexity is the ability to tune hyperparameters using relatively few computational resources and in a reasonable time from an industrial point of view. In our experiments we tune the learning rate, the weight decay, and whether to use the AMSGrad variant of the optimizer. The framework used to run the experiments and optimize the hyperparameters is NNI [65].
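For illustration, a minimal trial-script sketch (the search-space dictionary follows NNI's standard format with the ranges reported below; AdamW follows [64], while the placeholder model and the metric bookkeeping are ours):

```python
import nni
import torch
import torch.nn as nn

# Placeholder network standing in for TinyCD in this sketch.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Search space in NNI's standard format, to be registered in the
# experiment configuration; bounds follow the ranges reported here.
search_space = {
    "lr": {"_type": "loguniform", "_value": [1e-3, 4e-3]},
    "weight_decay": {"_type": "loguniform", "_value": [8e-3, 1e-2]},
    "amsgrad": {"_type": "choice", "_value": [True, False]},
}

# Inside an NNI experiment, each trial receives one sampled combination.
params = nni.get_next_parameter()
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=params["lr"],
    weight_decay=params["weight_decay"],
    amsgrad=params["amsgrad"],
)

# ... train for 100 epochs here, tracking the best validation F1-score ...
best_val_f1 = 0.0  # placeholder
nni.report_final_result(best_val_f1)  # the metric NNI optimizes
```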

Since we execute only 100 epochs per run, we chose a relatively high learning rate range \((10^{-3}, 4\cdot 10^{-3})\), in order to explore whether a higher-than-standard learning rate leads to faster model convergence. As for the weight decay, we made a conservative choice by setting the range between \(8\cdot 10^{-3}\) and \(10^{-2}\). Besides the binary cross-entropy (BCE), we also test other simple loss functions for model training, namely Mean Squared Error (MSE), Intersection over Union (IoU), and a combination of IoU and BCE.
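Since IoU is not differentiable in its set form, a soft relaxation computed on the sigmoid probabilities is typically used. The sketch below is our reconstruction of such a loss and of an unweighted BCE + IoU combination; the authors' exact formulation and term weighting are not specified here:

```python
import torch
import torch.nn.functional as F

def soft_iou_loss(logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Differentiable IoU (Jaccard) loss on sigmoid probabilities."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = (probs + target - probs * target).sum(dim=(1, 2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

def bce_iou_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Unweighted sum of the two terms (an assumption of this sketch).
    return F.binary_cross_entropy_with_logits(logits, target) \
        + soft_iou_loss(logits, target)

# Example with a batch of single-channel change masks
logits = torch.randn(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
bce_iou_loss(logits, target).backward()
```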

In Fig. 5 we show the various combinations of hyperparameters explored in a batch of 30 experiments, together with the corresponding performance on the LEVIR-CD validation set. Analyzing the results, we note that BCE and MSE, regardless of the other parameters, obtain superior performance compared to the IoU. In addition, the BCE + IoU combination, although better than IoU alone, also scores lower than BCE and MSE. Regarding the other hyperparameters, as can be seen in particular from Fig. 6, our model obtains robust performance across all the tested combinations. Finally, we note that in the conducted experiments BCE has the lowest variance in F1-score with respect to the choices of the other hyperparameters. This is another reason why we chose BCE as the loss function.

Fig. 5 Different combinations of parameters and their impact on the F1-score on the LEVIR-CD dataset

Fig. 6 Behavior of the final F1-score in the different experiments conducted to tune the hyperparameters. The drop in the F1-score is due to the use of IoU as loss function

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Codegoni, A., Lombardi, G. & Ferrari, A. TINYCD: a (not so) deep learning model for change detection. Neural Comput & Applic 35, 8471–8486 (2023). https://doi.org/10.1007/s00521-022-08122-3
