DSGA: Distractor-Suppressing Graph Attention for Multi-object Tracking

Authors:

Yuhua Tang

ICRAI '22: Proceedings of the 8th International Conference on Robotics and Artificial Intelligence

Pages 69 - 76

https://doi.org/10.1145/3573910.3573916

Published: 20 January 2023

Abstract

Multiple object tracking (MOT) methods built on single object tracking (SOT) are of great interest because the localization capability of single-target trackers lets them balance efficiency and accuracy. However, most SOT methods only distinguish foreground from background, so they are easily misled by similar-looking distractors during localization. In MOT scenarios, distractors are more numerous and their influence is more severe. We therefore propose Distractor-Suppressing Graph Attention (DSGA), which learns more discriminative attention by reducing the influence of distractors on the learning of attention weight features. We embed DSGA into the baseline MOT framework SiamMOT, yielding DSGA-SiamMOT, and apply it to multiple object tracking to verify its effectiveness. On the MOT Challenge benchmark under the "public detection" protocol, DSGA-SiamMOT obtains 66.65% MOTA and 62.2% IDF1 on the MOT17 dataset at 14 fps.
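
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of the commonly used definitions of MOTA (from the CLEAR MOT metrics) and IDF1 (identity F1). The function names and the counts in the usage lines are hypothetical, chosen only for illustration; they are not taken from the paper, and this is not the benchmark's official evaluation code.

```python
def mota(num_gt: int, false_negatives: int, false_positives: int, id_switches: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), computed from identity-level matches."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Illustrative (made-up) counts, not results from the paper.
example_mota = mota(num_gt=1000, false_negatives=200, false_positives=100, id_switches=50)
example_idf1 = idf1(idtp=600, idfp=200, idfn=200)
print(f"MOTA = {example_mota:.2%}, IDF1 = {example_idf1:.2%}")
```

MOTA penalizes missed targets, false alarms, and identity switches in a single score and can be negative, whereas IDF1 focuses on how consistently identities are preserved over time. This is why the two are usually reported together.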

Cited By

Zhang, Y., Liang, Y., Elazab, A., Wang, Z., and Wang, C. (2023). Graph Attention Networks and Track Management for Multiple Object Tracking. Electronics 12:19 (4079). Online publication date: 28-Sep-2023.
https://doi.org/10.3390/electronics12194079

Index Terms

DSGA: Distractor-Suppressing Graph Attention for Multi-object Tracking
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Tracking
      2. Computer vision tasks
        Scene understanding

Index terms have been assigned to the content through auto-classification.

Published In

ICRAI '22: Proceedings of the 8th International Conference on Robotics and Artificial Intelligence

November 2022

89 pages

ISBN:9781450397544

DOI:10.1145/3573910

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National University of Defense Technology Foundation
National Natural Science Foundation of China

Conference

ICRAI 2022

ICRAI 2022: 2022 8th International Conference on Robotics and Artificial Intelligence

November 18 - 20, 2022

Singapore, Singapore
