Foreground Gating and Background Refining Network for Surveillance Object Detection

Published: 01 December 2019

Abstract

Detecting objects in surveillance videos is an important problem due to its wide applications in traffic control and public security. Existing methods tend to suffer performance degradation from false positives or misaligned detections. We propose a novel framework, namely, the Foreground Gating and Background Refining Network (FG-BR Net), for surveillance object detection (SOD). To reduce false positives in background regions, a critical problem in SOD, we introduce a new module that first subtracts the background of a video sequence and then generates high-quality region proposals. Unlike previous background subtraction methods, which may wrongly remove static foreground objects from a frame, our model adds a feedback connection from the detection results to the background subtraction process so that both static and moving objects in surveillance videos are distilled. Furthermore, we introduce another module, the background refining stage, to refine the detection results with more accurate localizations. Pairwise non-local operations are adopted to cope with the misalignments between the features of the original and background frames. Extensive experiments on real-world traffic surveillance benchmarks demonstrate the competitive performance of the proposed FG-BR Net. In particular, FG-BR Net ranks first among all methods on the hard and sunny subsets of the UA-DETRAC detection dataset, without any bells and whistles.
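The background subtraction stage can be illustrated with a classical baseline. The sketch below is not the paper's implementation (the paper additionally feeds detection results back so static foreground objects are not absorbed into the background model); it simply estimates a static background as the per-pixel temporal median of a frame stack and thresholds the difference. Function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def median_background(frames):
    """Estimate a static background as the per-pixel temporal median of a
    stack of frames. A classical baseline: pixels covered by moving objects
    in only a few frames fall away under the median.

    frames: (T, H, W) array of T grayscale frames.
    Returns an (H, W) background estimate.
    """
    return np.median(frames, axis=0)

def foreground_mask(frame, background, thresh=25):
    """Binary foreground mask: pixels whose absolute difference from the
    background estimate exceeds `thresh` (an assumed, tunable value)."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > thresh
```

In the paper's pipeline the resulting foreground evidence gates region proposals rather than serving as the final detection; a median model alone would, over time, merge a parked vehicle into the background, which is exactly the failure mode the feedback connection is designed to avoid.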
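The pairwise non-local operation adopted in the refining stage follows the general attention form of non-local neural networks (Wang et al., CVPR 2018): every position in one feature map attends to every position in another, so small spatial offsets between the original-frame and background-frame features do not have to be aligned explicitly. A minimal NumPy sketch, assuming flattened (N, C) feature maps, an embedded-Gaussian (softmax) pairing, and no learned projections; names and simplifications are my own, not the paper's exact operator.

```python
import numpy as np

def pairwise_nonlocal(x, y):
    """Pairwise non-local operation between two feature maps: each of the
    Nx positions in `x` aggregates features from all Ny positions in `y`,
    weighted by softmax similarity, with a residual connection back to `x`.

    x: (Nx, C) flattened features of the original frame.
    y: (Ny, C) flattened features of the background frame.
    Returns an (Nx, C) array of refined features for `x`.
    """
    sim = x @ y.T                            # (Nx, Ny) pairwise dot products
    sim -= sim.max(axis=1, keepdims=True)    # numerical stability for exp
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over positions of y
    return x + attn @ y                      # residual aggregation
```

Because the similarity is computed over all position pairs, a foreground feature in `x` can pull in the matching background context even when the two frames are misaligned by several pixels; the learned 1x1 projections of the full non-local block are omitted here for brevity.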




Published In

IEEE Transactions on Image Processing, Volume 28, Issue 12, Dec. 2019, 202 pages

Publisher

IEEE Press


Cited By

• (2025) "ESOD: Efficient Small Object Detection on High-Resolution Images," IEEE Trans. Image Process., vol. 34, pp. 183–195, doi: 10.1109/TIP.2024.3501853. Online publication date: 1 Jan. 2025.
• (2024) "Context Enhanced Transformer for Single Image Object Detection in Video Data," in Proc. 38th AAAI Conf. Artif. Intell., pp. 682–690, doi: 10.1609/aaai.v38i2.27825. Online publication date: 20 Feb. 2024.
• (2024) "Learning Temporal Distribution and Spatial Correlation Toward Universal Moving Object Segmentation," IEEE Trans. Image Process., vol. 33, pp. 2447–2461, doi: 10.1109/TIP.2024.3378473. Online publication date: 22 Mar. 2024.
• (2023) "Scale Invariant Low Frame Rate Tracking," Expert Syst. Appl., vol. 215, no. C, doi: 10.1016/j.eswa.2022.119366. Online publication date: 15 Feb. 2023.
• (2022) "Harmonious Multi-Branch Network for Person Re-identification with Harder Triplet Loss," ACM Trans. Multimedia Comput. Commun. Appl., vol. 18, no. 4, pp. 1–21, doi: 10.1145/3501405. Online publication date: 4 Mar. 2022.
• (2022) "An Empirical Review of Deep Learning Frameworks for Change Detection: Model Design, Experimental Frameworks, Challenges and Research Needs," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6101–6122, doi: 10.1109/TITS.2021.3077883. Online publication date: 1 Jul. 2022.
• (2022) "Improving Surveillance Object Detection with Adaptive Omni-Attention over Both Inter-frame and Intra-frame Context," in Computer Vision – ACCV 2022, pp. 222–237, doi: 10.1007/978-3-031-26284-5_14. Online publication date: 4 Dec. 2022.
• (2022) "Target Detection Algorithm Based on Feature Optimization and Sample Equalization," in Wireless Algorithms, Systems, and Applications, pp. 343–355, doi: 10.1007/978-3-031-19214-2_28. Online publication date: 24 Nov. 2022.
• (2021) "ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection," in Pattern Recognition and Computer Vision, pp. 379–391, doi: 10.1007/978-3-030-88004-0_31. Online publication date: 29 Oct. 2021.
• (2021) "DAFV: A Unified and Real-Time Framework of Joint Detection and Attributes Recognition for Fast Vehicles," in Wireless Algorithms, Systems, and Applications, pp. 353–365, doi: 10.1007/978-3-030-86130-8_28. Online publication date: 25 Jun. 2021.
