research-article

Learning Efficient Transformer Representation for Siamese Tracker to UAV

Authors:

Qiang Wang

ICIAI '23: Proceedings of the 2023 7th International Conference on Innovation in Artificial Intelligence

Pages 135 - 140

https://doi.org/10.1145/3594409.3594436

Published: 26 July 2023

Abstract

In the last few years, there has been growing recognition of the vital links between visual tracking and unmanned aerial vehicles (UAVs). Questions have been raised about the feasibility of CNN-based and transformer-based trackers on UAVs, because the complex architectures of deep neural network modules and self-attention modules can adversely affect them when they are deployed on UAVs. In this paper, we propose an efficient transformer tracker (ET2) built on a lightweight network, and we examine the usability of a transformer-based tracker in this setting. Our tracker runs at frame rates far above real time, striking a balance between precision and running speed.
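To make "transformer representation for a Siamese tracker" concrete, the sketch below shows single-head scaled dot-product cross-attention, in which search-region tokens (queries) attend to template tokens (keys and values). This is a common building block in transformer-based trackers, but it is a minimal illustrative sketch and not the ET2 architecture, whose details are not given here. It assumes NumPy, randomly initialized projection matrices (`w_q`, `w_k`, `w_v` are hypothetical), and illustrative feature-map sizes (an 8x8 template and a 16x16 search region with 64 channels).

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the per-row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search, template, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative sketch).

    search:   (N_s, C) flattened search-region features, used as queries
    template: (N_t, C) flattened template features, used as keys and values
    w_q, w_k, w_v: (C, D) projection matrices (hypothetical, random here)
    Returns the (N_s, D) fused features and the (N_s, N_t) attention map.
    """
    q = search @ w_q
    k = template @ w_k
    v = template @ w_v
    # Each search token scores every template token, scaled by sqrt(D).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
C, D = 64, 32
template = rng.standard_normal((8 * 8, C))   # assumed 8x8 template feature map
search = rng.standard_normal((16 * 16, C))   # assumed 16x16 search feature map
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))

fused, attn = cross_attention(search, template, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (256, 32) (256, 64)
```

Because the score matrix has N_s x N_t entries, the cost of this step grows with the product of the two token counts. This is one reason trackers aimed at constrained platforms such as UAVs tend to keep feature maps small and attention layers few.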


Index Terms

General and reference > Document types > General conference proceedings



Information

Published In

ACM Other conferences

ICIAI '23: Proceedings of the 2023 7th International Conference on Innovation in Artificial Intelligence

March 2023

212 pages

ISBN:9781450398398

DOI:10.1145/3594409

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Conference

ICIAI 2023

ICIAI 2023: 2023 the 7th International Conference on Innovation in Artificial Intelligence

March 3 - 5, 2023

Harbin, China


Article Metrics

Total Citations: 0
Total Downloads: 42
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 3

Reflects downloads up to 09 Jan 2025
