Abstract
In intelligent transportation systems, both detection accuracy and real-time performance are essential. This paper proposes a lightweight real-time detection model, RT-DETRmg, to reduce false and missed detections of small traffic signs while improving the algorithm's real-time performance. RT-DETRmg enhances the multi-scale feature extraction capability of the RT-DETR backbone network by incorporating a Multiple Scale Sequence Fusion module, which integrates global and local semantic information from different image scales. Additionally, a cascaded group attention module is used within the efficient hybrid encoder to reduce computational complexity and thereby improve real-time performance. To further optimize small object detection, a feature layer with a small receptive field is introduced and a feature layer with a large receptive field is removed. Experimental results on the TT100K and GTSDB datasets demonstrate the superiority of RT-DETRmg over existing models. On the TT100K dataset, RT-DETRmg achieves a 2.0% improvement in mean average precision and a 6.6% increase in frames per second compared to the baseline RT-DETR model, while reducing model parameters and computational complexity. On the GTSDB dataset, RT-DETRmg further demonstrates strong generalization ability, achieving a 2.2% improvement in F1 score and a 1.7% increase in mean average precision compared to the baseline network. These findings highlight the effectiveness of RT-DETRmg in improving both the detection accuracy and the real-time performance of small traffic sign detection in diverse scenarios.
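The abstract reports gains in F1 score and frames per second. As a reminder of how these two quantities are conventionally defined, the following is a minimal sketch. The function names and sample values are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the authors' evaluation pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Both are zero: define F1 as 0 to avoid division by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frames_per_second(num_images: int, elapsed_seconds: float) -> float:
    """Throughput as images processed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_images / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(f1_score(0.8, 0.6))
    print(frames_per_second(100, 2.0))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it is a common summary when both false and missed detections matter.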
Data availability
Data are provided within the manuscript or supplementary information files.
Funding
This work was supported by the Clinical research project initiated by researchers from the Sichuan Provincial Health Commission (23LCYJ020), the Special Project for City-College Science and Technology Strategic Cooperation of Nanchong City in 2022 (22SXQT0292), and the Sichuan Provincial Key R&D Plan (Major Science and Technology Project) (2022YFS0020).
Author information
Contributions
Y.W. and J.C. contributed to conceptualization and writing—original draft preparation; B.Y. helped with the methodology; Y.W. assisted with software; Y.C. and R.L. carried out validation; Y.W., J.C., and R.L. conducted investigation; Y.S. and Y.C. were involved in writing—review and editing; and Y.S. and Y.C. were responsible for visualization. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethics approval
Not applicable.
About this article
Cite this article
Wang, Y., Chen, J., Yang, B. et al. RT-DETRmg: a lightweight real-time detection model for small traffic signs. J Supercomput 81, 307 (2025). https://doi.org/10.1007/s11227-024-06800-8
DOI: https://doi.org/10.1007/s11227-024-06800-8