
TINYCD: a (not so) deep learning model for change detection

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

In this paper, we present a lightweight and effective change detection model, called TinyCD. Motivated by industrial needs, this model was designed to be faster and smaller than current state-of-the-art change detection models. Despite being 13 to 140 times smaller than the compared change detection models, and requiring at most a third of their computational complexity, our model outperforms the current state-of-the-art models by at least \(1\%\) in both F1-score and IoU on the LEVIR-CD dataset, and by more than \(8\%\) on the WHU-CD dataset. To reach these results, TinyCD uses a Siamese U-Net architecture that exploits low-level features in a globally temporal and locally spatial way. In addition, it adopts a new strategy for mixing features in the space-time domain, which serves both to merge the embeddings obtained from the Siamese backbones and, coupled with an MLP block, to form a novel space-semantic attention mechanism, the Mix and Attention Mask Block (MAMB).
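As a rough sketch of this mixing strategy (our reconstruction for illustration only: the class name, the 3\(\times\)3 kernel size, and the channel interleaving below are assumptions, not the authors' published code), the two temporal embeddings can be interleaved channel-wise and compared locally by a grouped convolution whose depth-2 kernels each see one (pre, post) channel pair:

```python
import torch
import torch.nn as nn

class MixBlock(nn.Module):
    """Sketch of space-time feature mixing: interleave the two temporal
    embeddings channel-wise, then let a grouped convolution compare each
    (pre, post) channel pair locally."""

    def __init__(self, channels: int):
        super().__init__()
        # One group per channel pair: each kernel spans exactly two input
        # channels (a depth-2 kernel), generalizing per-pixel subtraction.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3,
                             padding=1, groups=channels)

    def forward(self, x_pre: torch.Tensor, x_post: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x_pre.shape
        # Interleave channels: [pre_0, post_0, pre_1, post_1, ...]
        mixed = torch.stack((x_pre, x_post), dim=2).reshape(b, 2 * c, h, w)
        return self.mix(mixed)

# Example: two 64-channel embeddings from the Siamese backbone
x1, x2 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(MixBlock(64)(x1, x2).shape)  # torch.Size([1, 64, 32, 32])
```

Note 6 below observes that a suitable initialization of such depth-2 kernels reduces this operation to a plain per-pixel subtraction.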


Data availability

The datasets analyzed during the current study, and all the source code to reproduce our results, are available in the GitHub repository https://github.com/AndreaCodegoni/Tiny_model_4_CD.

Notes

  1. Notice that the backbone is evaluated twice in Siamese architectures.

  2. Both the adopted datasets have been obtained from https://github.com/wgcban/SemiCD in an already pre-processed version.

  3. The whole dataset depicts the city of Christchurch, in New Zealand. The crop intended for CD tasks is a sub-area acquired at two different times.

  4. To make our results reproducible, we fixed the random seed at the beginning of each experiment.

  5. We employed the sigmoid activation on this output.

  6. In fact, if we initialize each of our depth-2 kernels with the "central" weights set to 1 and \(-1\), and all the remaining weights to 0, we recover the standard subtraction (see the sketch after these notes).

  7. The parentheses highlight the size of each kernel and the number of kernels.
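A minimal sketch of the observation in note 6, reusing the channel interleaving from the mixing sketch after the abstract (again our illustration, not the authors' code): zeroing a grouped depth-2 convolution and setting its central weights to 1 and \(-1\) reproduces the per-pixel subtraction of the two temporal features.

```python
import torch
import torch.nn as nn

c = 4  # channels per temporal embedding (arbitrary for this demo)
# Depth-2 kernels: each group sees one (pre, post) channel pair.
conv = nn.Conv2d(2 * c, c, kernel_size=3, padding=1, groups=c, bias=False)

with torch.no_grad():
    conv.weight.zero_()
    # Weight shape is (c, 2, 3, 3): the central tap of the "pre" channel
    # gets 1, the central tap of the "post" channel gets -1.
    conv.weight[:, 0, 1, 1] = 1.0
    conv.weight[:, 1, 1, 1] = -1.0

x_pre, x_post = torch.randn(1, c, 8, 8), torch.randn(1, c, 8, 8)
b = x_pre.shape[0]
interleaved = torch.stack((x_pre, x_post), dim=2).reshape(b, 2 * c, 8, 8)
print(torch.allclose(conv(interleaved), x_pre - x_post, atol=1e-6))  # True
```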

References

  1. Singh A (1989) Review article digital change detection techniques using remotely-sensed data. Int J Remote Sens 10(6):989–1003


  2. Shafique A, Cao G, Khan Z, Asad M, Aslam M (2022) Deep learning-based change detection in remote sensing images: a review. Remote Sens 14(4):871


  3. Bai T, Wang L, Yin D, Sun K, Chen Y, Li W, Li D (2022) Deep learning for change detection in remote sensing: a review. Geo-spat Inf Sci. https://doi.org/10.1080/10095020.2022.2085633


  4. Chen H, Shi Z (2020) A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens 12(10):1662


  5. De Bem PP, de Carvalho Junior OA, Fontes Guimarães R, Trancoso Gomes RA (2020) Change detection of deforestation in the Brazilian Amazon using Landsat data and convolutional neural networks. Remote Sens 12(6):901


  6. Viña A, Echavarria FR, Rundquist DC (2004) Satellite change detection analysis of deforestation rates and patterns along the Colombia–Ecuador border. AMBIO: J Hum Environ 33(3):118–125


  7. Xu JZ, Lu W, Li Z, Khaitan P, Zaytseva V (2019) Building damage detection in satellite imagery using convolutional neural networks. arXiv preprint arXiv:1910.06444

  8. Zhang C, Yue P, Tapete D, Jiang L, Shangguan B, Huang L, Liu G (2020) A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J Photogramm Remote Sens 166:183–200


  9. Ji S, Wei S, Lu M (2018) Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans Geosci Remote Sens 57(1):574–586


  10. Varghese A, Gubbi J, Ramaswamy A, Balamuralidhar P (2018) Changenet: a deep learning architecture for visual change detection. In: European conference on computer vision, pp. 129–145

  11. Khelifi L, Mignotte M (2020) Deep learning for change detection in remote sensing images: comprehensive review and meta-analysis. IEEE Access 8:126385–126400


  12. Daudt RC, Le Saux B, Boulch A (2018) Fully convolutional siamese networks for change detection. In: IEEE international conference on image processing, pp. 4063–4067

  13. Zhang M, Xu G, Chen K, Yan M, Sun X (2018) Triplet-based semantic relation learning for aerial remote sensing image change detection. IEEE Geosci Remote Sens Lett 16(2):266–270


  14. Liu Y, Pang C, Zhan Z, Zhang X, Yang X (2020) Building change detection for remote sensing images using a dual-task constrained deep siamese convolutional network model. IEEE Geosci Remote Sens Lett 18(5):811–815


  15. Peng D, Zhang Y, Guan H (2019) End-to-end change detection for high resolution satellite images using improved unet++. Remote Sens 11(11):1382


  16. Jiang H, Hu X, Li K, Zhang J, Gong J, Zhang M (2020) Pga-siamnet: pyramid feature-based attention-guided siamese network for remote sensing orthoimagery building change detection. Remote Sens 12(3):484


  17. Chen J, Yuan Z, Peng J, Chen L, Huang H, Zhu J, Liu Y, Li H (2020) Dasnet: Dual attentive fully convolutional siamese networks for change detection in high-resolution satellite images. IEEE J Sel Top Appl Earth Obs Remote Sens 14:1194–1206


  18. Chen H, Qi Z, Shi Z (2021) Remote sensing image change detection with transformers. IEEE Trans Geosci Remote Sens 60:1–14


  19. Bandara WGC, Patel VM (2022) A transformer-based siamese network for change detection. In: IEEE international geoscience and remote sensing symposium, pp. 207–210

  20. Chen S, Yang K, Stiefelhagen R (2021) Dr-tanet: dynamic receptive temporal attention network for street scene change detection. In: IEEE intelligent vehicles symposium, pp. 502–509

  21. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. Proc IEEE Conf Comput Vis Pattern Recognit 1:539–546


  22. Zagoruyko S, Komodakis N (2015) Learning to compare image patches via convolutional neural networks. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4353–4361

  23. Stent S, Gherardi R, Stenger B, Cipolla R (2015) Detecting change for multi-view, long-term surface inspection. In: Proceedings of the British machine vision conference, pp. 127.1–127.12

  24. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440

  25. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, pp. 234–241

  26. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European conference on computer vision, pp. 850–865

  27. Ioannidou A, Chatzilari E, Nikolopoulos S, Kompatsiaris I (2017) Deep learning advances in computer vision with 3d data: a survey. ACM Comput Surv CSUR 50(2):1–38


  28. Chu Y, Cao G, Hayat H (2016) Change detection of remote sensing image based on deep neural networks. In: International conference on artificial intelligence and industrial engineering, pp. 262–267

  29. Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R (1993) Signature verification using a "siamese" time delay neural network. Adv Neural Inform Process Syst 6:737–744


  30. Lebedev M, Vizilter YV, Vygolov O, Knyaz V, Rubis AY (2018) Change detection in remote sensing images using conditional adversarial networks. Int Arch Photogramm Remote Sens Spat Inf Sci 42(2):565–571


  31. Zhao W, Chen X, Ge X, Chen J (2020) Using adversarial network for multiple change detection in bitemporal remote sensing imagery. IEEE Geosci Remote Sens Lett 99:1–5


  32. Peng X, Zhong R, Li Z, Li Q (2020) Optical remote sensing image change detection based on attention mechanism and image difference. IEEE Trans Geosci Remote Sens 59(9):7296–7307


  33. Bao T, Fu C, Fang T, Huo H (2020) Ppcnet: a combined patch-level and pixel-level end-to-end deep network for high-resolution remote sensing image change detection. IEEE Geosci Remote Sens Lett 17(10):1797–1801


  34. Hou B, Liu Q, Wang H, Wang Y (2019) From w-net to cdgan: bitemporal change detection via deep learning techniques. IEEE Trans Geosci Remote Sens 58(3):1790–1802


  35. Zhan Y, Fu K, Yan M, Sun X, Wang H, Qiu X (2017) Change detection based on deep siamese convolutional network for optical aerial images. IEEE Geosci Remote Sens Lett 14(10):1845–1849


  36. Fang B, Pan L, Kou R (2019) Dual learning-based siamese framework for change detection using bi-temporal vhr optical remote sensing images. Remote Sens 11(11):1292


  37. Chen H, Li W, Shi Z (2021) Adversarial instance augmentation for building change detection in remote sensing images. IEEE Trans Geosci Remote Sens 60:1–16


  38. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778

  39. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations, pp. 1–14

  40. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848


  41. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7794–7803

  42. Liu T, Yang L, Lunga D (2021) Change detection using deep learning approach with object-based image analysis. Remote Sens Environ 256:112308


  43. Wu B, Xu C, Dai X, Wan A, Zhang P, Yan Z, Tomizuka M, Gonzalez J, Keutzer K, Vajda P (2020) Visual transformers: token-based image representation and processing for computer vision. arXiv preprint arXiv:2006.03677

  44. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inform Process Syst 30:6000–6010


  45. Chen S, Xie E, Chongjian G, Chen R, Liang D, Luo P (2022) Cyclemlp: a mlp-like architecture for dense prediction. In: International conference on learning representations. Oral Presentation

  46. Lian D, Yu Z, Sun X, Gao S (2022) As-mlp: an axial shifted mlp architecture for vision. In: International conference on learning representations. Poster presentation

  47. Zhang J, Yang K, Ma C, Reiß S, Peng K, Stiefelhagen R (2022) Bending reality: distortion-aware transformers for adapting to panoramic semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16917–16927

  48. Tolstikhin IO, Houlsby N, Kolesnikov A, Beyer L, Zhai X, Unterthiner T, Yung J, Steiner A, Keysers D, Uszkoreit J (2021) Mlp-mixer: an all-mlp architecture for vision. Adv Neural Inform Process Syst 34:24261–24272


  49. Touvron H, Bojanowski P, Caron M, Cord M, El-Nouby A, Grave E, Izacard G, Joulin A, Synnaeve G, Verbeek J et al (2022) Resmlp: feedforward networks for image classification with data-efficient training. In: IEEE transactions on pattern analysis and machine intelligence, early access, pp 1–9

  50. Liu H, Dai Z, So D, Le QV (2021) Pay attention to mlps. Adv Neural Inform Process Syst 34:9204–9215


  51. Yu T, Li X, Cai Y, Sun M, Li P (2022) S2-mlp: spatial-shift mlp architecture for vision. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp. 297–306

  52. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 1026–1034

  53. Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022

  54. Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J (2001) Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: A field guide to dynamical recurrent neural networks, pp 237–244

  55. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural comput 9(8):1735–1780


  56. Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning, pp. 6105–6114

  57. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 248–255

  58. Liu N, Han J, Yang M-H (2018) Picanet: Learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3089–3098

  59. Lin M, Chen Q, Yan S (2013) Network in network. In: International conference on learning representations. arXiv preprint arXiv:1312.4400

  60. Sifre L, Mallat S (2014) Rigid-motion scattering for texture classification. Comput Sci 3559:501–515


  61. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258

  62. Bandara WGC, Patel VM (2022) Revisiting consistency regularization for semi-supervised change detection in remote sensing images. arXiv preprint arXiv:2204.08454

  63. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: an imperative style, high-performance deep learning library. Adv Neural Inform Process Syst 32:8026–8037


  64. Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

  65. Microsoft (2021) Neural network intelligence (NNI). https://github.com/microsoft/nni

  66. Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: International conference on learning representations. Poster presentation

  67. Buslaev A, Iglovikov VI, Khvedchenya E, Parinov A, Druzhinin M, Kalinin AA (2020) Albumentations: fast and flexible image augmentations. Information 11(2):125


  68. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828


  69. Sankararaman KA, De S, Xu Z, Huang WR, Goldstein T (2020) The impact of neural network overparameterization on gradient confusion and stochastic gradient descent. In: International conference on machine learning, pp. 8469–8479

  70. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022

  71. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations. Oral presentation


Acknowledgements

The authors want to thank the whole ARGO Vision team, Professor Stefano Gualandi, Gabriele Loli, and Gennaro Auricchio for the useful discussions and comments. We also want to thank all those who have provided their code in an accessible and reproducible way, as well as the anonymous reviewers for the comments and suggestions that helped us improve our work. The Ph.D. scholarship of Andrea Codegoni is funded by SEA Vision.

Author information


Corresponding author

Correspondence to Andrea Codegoni.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1 Backbone comparison

In this Appendix we report the results obtained by varying the backbone adopted in the model. From each backbone we select only a few initial blocks, in order to work with features that are not very complex and not excessively aggregated from a spatial point of view. Specifically, we select all the initial blocks up to the first one with spatial resolution 32\(\times\)32. Due to the different compositions of the considered networks, the final size of the models varies in the range from a minimum of 32k parameters up to 1.3M.
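For illustration, a minimal sketch of this truncation (the timm library, the efficientnet_b4 backbone, and the choice out_indices=(0, 1, 2) are our assumptions; the index at which a given backbone reaches 32\(\times\)32 depends on its stage layout and on the input size):

```python
import timm
import torch

# Keep only the early stages of a pretrained backbone: with a 256x256
# input, the third returned stage of efficientnet_b4 has stride 8,
# i.e. 32x32 spatial resolution.
backbone = timm.create_model(
    "efficientnet_b4",
    pretrained=True,
    features_only=True,     # return intermediate feature maps
    out_indices=(0, 1, 2),  # truncate after the first three stages
)

x = torch.randn(1, 3, 256, 256)
for feat in backbone(x):
    print(feat.shape)  # e.g. (1, 24, 128, 128), (1, 32, 64, 64), (1, 56, 32, 32)
```

In a Siamese setting the same truncated backbone is applied to both images with shared weights, which is why, as note 1 points out, it is evaluated twice.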

As we can see from Table 9, the results are stable in terms of performance. The backbones of the EfficientNet family appear to be, in accordance with the experiments carried out on our proprietary dataset, those that achieve the best performance. However, the other backbone types also produce comparable results, making our approach:

  • robust with respect to the backbone used;

  • flexible with respect to the required size and computational complexity.

In this comparison we have not considered Transformer-type backbones such as [70, 71]. The reason for this choice is that Transformers follow a global philosophy, as opposed to the blocks we propose, which are instead local. As mentioned in Sect. 7, an integration of these two philosophies will be the subject of future work.

Table 9 Comparison of different backbones on LEVIR-CD dataset

Appendix 2 Hyperparameter tuning

In this Appendix we report the details of the hyperparameter tuning experiments. One of the advantages of using models with limited computational complexity is the ability to tune hyperparameters using relatively few computational resources and in a reasonable time from an industrial point of view. In our experiments we tune the learning rate, the weight decay, and whether to use the AMSGrad variant of the optimizer. The framework used to run the experiments and optimize the hyperparameters is NNI [65].
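For illustration, a minimal trial-script sketch (the search-space dictionary follows NNI's standard format with the ranges reported below; AdamW follows [64], while the placeholder model and the metric bookkeeping are ours):

```python
import nni
import torch
import torch.nn as nn

# Placeholder network standing in for TinyCD in this sketch.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Search space in NNI's standard format, to be registered in the
# experiment configuration; bounds follow the ranges reported here.
search_space = {
    "lr": {"_type": "loguniform", "_value": [1e-3, 4e-3]},
    "weight_decay": {"_type": "loguniform", "_value": [8e-3, 1e-2]},
    "amsgrad": {"_type": "choice", "_value": [True, False]},
}

# Inside an NNI experiment, each trial receives one sampled combination.
params = nni.get_next_parameter()
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=params["lr"],
    weight_decay=params["weight_decay"],
    amsgrad=params["amsgrad"],
)

# ... train for 100 epochs here, tracking the best validation F1-score ...
best_val_f1 = 0.0  # placeholder
nni.report_final_result(best_val_f1)  # the metric NNI optimizes
```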

Since we execute only 100 epochs per run, we chose a relatively high learning rate range \((10^{-3}, 4\cdot 10^{-3})\), in order to explore whether a higher-than-standard learning rate leads to faster model convergence. As for the weight decay, we made a conservative choice by setting the range between \(8\cdot 10^{-3}\) and \(10^{-2}\). Besides the binary cross-entropy (BCE), we also test other simple loss functions for model training, namely Mean Squared Error (MSE), Intersection over Union (IoU), and a combination of IoU and BCE.
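Since IoU is not differentiable in its set form, a soft relaxation computed on the sigmoid probabilities is typically used. The sketch below is our reconstruction of such a loss and of an unweighted BCE + IoU combination; the authors' exact formulation and term weighting are not specified here:

```python
import torch
import torch.nn.functional as F

def soft_iou_loss(logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Differentiable IoU (Jaccard) loss on sigmoid probabilities."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    union = (probs + target - probs * target).sum(dim=(1, 2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

def bce_iou_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Unweighted sum of the two terms (an assumption of this sketch).
    return F.binary_cross_entropy_with_logits(logits, target) \
        + soft_iou_loss(logits, target)

# Example with a batch of single-channel change masks
logits = torch.randn(2, 1, 64, 64, requires_grad=True)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
bce_iou_loss(logits, target).backward()
```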

In Fig. 5 we show the various combinations of hyperparameters explored in a batch of 30 experiments, together with the corresponding performance on the LEVIR-CD validation set. Analyzing the results, we note that BCE and MSE, regardless of the other parameters, obtain superior performance compared to the IoU. In addition, the BCE + IoU combination, although better than IoU alone, also scores lower than BCE and MSE. Regarding the other hyperparameters, as can be seen in particular from Fig. 6, our model obtains robust performance across all the tested combinations. Finally, we note that in the conducted experiments BCE has the lowest variance in F1-score with respect to the choices of the other hyperparameters. This is another reason why we chose BCE as the loss function.

Fig. 5 Different combinations of parameters and their impact on the F1-score on the LEVIR-CD dataset

Fig. 6 Behavior of the final F1-score in the different experiments conducted to tune the hyperparameters. The drop in the F1-score is due to the use of IoU as loss function

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Codegoni, A., Lombardi, G. & Ferrari, A. TINYCD: a (not so) deep learning model for change detection. Neural Comput & Applic 35, 8471–8486 (2023). https://doi.org/10.1007/s00521-022-08122-3
