DeFusion: Aerial Image Matching Based on Fusion of Handcrafted and Deep Features

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1968))

Abstract

Machine vision has become a crucial means for drones to perceive their surroundings, and image matching, a fundamental task in machine vision, has gained widespread attention. However, because aerial images are complex, traditional matching methods based on handcrafted features cannot extract high-level semantics and therefore suffer from low robustness. Although deep learning has the potential to improve matching accuracy, it requires specific samples and substantial computing resources, which makes it infeasible in many scenarios. To leverage the strengths of both approaches, we introduce DeFusion, a novel image matching scheme with a fine-grained decision-level fusion algorithm that effectively combines handcrafted and deep features. We train generic features on public datasets, which enables us to handle unseen scenarios. We use RootSIFT as prior knowledge to guide the extraction of deep features, significantly reducing computational overhead. We also carefully design preprocessing steps that incorporate drone attitude information. Our experimental results show that the proposed scheme achieves an overall 2.5–6× more correct matches, with improved robustness, compared with existing methods.
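
For readers unfamiliar with RootSIFT, the handcrafted feature named in the abstract, the following is a minimal sketch of the standard RootSIFT transform only: each SIFT descriptor is L1-normalized and then square-rooted element-wise. It is not taken from the DeFusion code and does not show how the paper fuses handcrafted and deep features. The function name `root_sift`, the `eps` parameter, and the random stand-in input are assumptions made for illustration.

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Convert SIFT descriptors (one per row) to RootSIFT descriptors."""
    desc = np.asarray(descriptors, dtype=np.float64)
    # L1-normalize each row; eps guards against division by zero for empty descriptors.
    desc = desc / (np.abs(desc).sum(axis=1, keepdims=True) + eps)
    # Element-wise square root. SIFT entries are non-negative, so this is well defined.
    return np.sqrt(desc)

# Hypothetical input: random non-negative 128-D vectors standing in for real SIFT output.
rng = np.random.default_rng(0)
fake_sift = rng.integers(0, 256, size=(2, 128)).astype(np.float64)
rooted = root_sift(fake_sift)
```

After this transform each descriptor has approximately unit L2 norm, so comparing RootSIFT descriptors with Euclidean distance corresponds to comparing the original descriptors with the Hellinger kernel.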

Notes

  1. Our code is publicly available on GitHub: https://github.com/songxf1024/DeFusion.

Acknowledgment

This research was supported in part by the South China University of Technology Research Start-up Fund No. X2WD/K3200890 and in part by the Guangzhou Huangpu District International Research Collaboration Fund No. 2022GH13. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the sponsoring agencies.

Author information

Corresponding author

Correspondence to Yi Zou.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Song, X., Zou, Y., Shi, Z., Yang, Y., Li, D. (2024). DeFusion: Aerial Image Matching Based on Fusion of Handcrafted and Deep Features. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_25

  • DOI: https://doi.org/10.1007/978-981-99-8181-6_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8180-9

  • Online ISBN: 978-981-99-8181-6

  • eBook Packages: Computer Science, Computer Science (R0)
