Foreground Gating and Background Refining Network for Surveillance Object Detection

Published: 01 December 2019

Abstract

Detecting objects in surveillance videos is an important problem due to its wide applications in traffic control and public security. Existing methods tend to suffer performance degradation from false positives or misaligned detections. We propose a novel framework, namely, the Foreground Gating and Background Refining Network (FG-BR Net), for surveillance object detection (SOD). To reduce false positives in background regions, a critical problem in SOD, we introduce a new module that first subtracts the background of a video sequence and then generates high-quality region proposals. Unlike previous background subtraction methods, which may wrongly remove static foreground objects from a frame, our model adds a feedback connection from the detection results to the background subtraction process so that both static and moving objects in surveillance videos are distilled. Furthermore, we introduce another module, the background refining stage, to refine the detection results with more accurate localizations. Pairwise non-local operations are adopted to cope with the misalignments between the features of the original and background frames. Extensive experiments on real-world traffic surveillance benchmarks demonstrate the competitive performance of the proposed FG-BR Net. In particular, FG-BR Net ranks first among all methods on the hard and sunny subsets of the UA-DETRAC detection dataset, without any bells and whistles.
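The background subtraction stage can be illustrated with a classical baseline. The sketch below is not the paper's implementation (the paper additionally feeds detection results back so static foreground objects are not absorbed into the background model); it simply estimates a static background as the per-pixel temporal median of a frame stack and thresholds the difference. Function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def median_background(frames):
    """Estimate a static background as the per-pixel temporal median of a
    stack of frames. A classical baseline: pixels covered by moving objects
    in only a few frames fall away under the median.

    frames: (T, H, W) array of T grayscale frames.
    Returns an (H, W) background estimate.
    """
    return np.median(frames, axis=0)

def foreground_mask(frame, background, thresh=25):
    """Binary foreground mask: pixels whose absolute difference from the
    background estimate exceeds `thresh` (an assumed, tunable value)."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > thresh
```

In the paper's pipeline the resulting foreground evidence gates region proposals rather than serving as the final detection; a median model alone would, over time, merge a parked vehicle into the background, which is exactly the failure mode the feedback connection is designed to avoid.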
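The pairwise non-local operation adopted in the refining stage follows the general attention form of non-local neural networks (Wang et al., CVPR 2018): every position in one feature map attends to every position in another, so small spatial offsets between the original-frame and background-frame features do not have to be aligned explicitly. A minimal NumPy sketch, assuming flattened (N, C) feature maps, an embedded-Gaussian (softmax) pairing, and no learned projections; names and simplifications are my own, not the paper's exact operator.

```python
import numpy as np

def pairwise_nonlocal(x, y):
    """Pairwise non-local operation between two feature maps: each of the
    Nx positions in `x` aggregates features from all Ny positions in `y`,
    weighted by softmax similarity, with a residual connection back to `x`.

    x: (Nx, C) flattened features of the original frame.
    y: (Ny, C) flattened features of the background frame.
    Returns an (Nx, C) array of refined features for `x`.
    """
    sim = x @ y.T                            # (Nx, Ny) pairwise dot products
    sim -= sim.max(axis=1, keepdims=True)    # numerical stability for exp
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over positions of y
    return x + attn @ y                      # residual aggregation
```

Because the similarity is computed over all position pairs, a foreground feature in `x` can pull in the matching background context even when the two frames are misaligned by several pixels; the learned 1x1 projections of the full non-local block are omitted here for brevity.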




Published In

IEEE Transactions on Image Processing, Volume 28, Issue 12, Dec. 2019, 202 pages

Publisher

IEEE Press


Cited By

• (2025) "ESOD: Efficient Small Object Detection on High-Resolution Images," IEEE Trans. Image Process., vol. 34, pp. 183–195, doi: 10.1109/TIP.2024.3501853. Online publication date: 1 Jan. 2025.
• (2024) "Context Enhanced Transformer for Single Image Object Detection in Video Data," in Proc. 38th AAAI Conf. Artif. Intell., pp. 682–690, doi: 10.1609/aaai.v38i2.27825. Online publication date: 20 Feb. 2024.
• (2024) "Learning Temporal Distribution and Spatial Correlation Toward Universal Moving Object Segmentation," IEEE Trans. Image Process., vol. 33, pp. 2447–2461, doi: 10.1109/TIP.2024.3378473. Online publication date: 22 Mar. 2024.
• (2023) "Scale Invariant Low Frame Rate Tracking," Expert Syst. Appl., vol. 215, no. C, doi: 10.1016/j.eswa.2022.119366. Online publication date: 15 Feb. 2023.
• (2022) "Harmonious Multi-Branch Network for Person Re-identification with Harder Triplet Loss," ACM Trans. Multimedia Comput. Commun. Appl., vol. 18, no. 4, pp. 1–21, doi: 10.1145/3501405. Online publication date: 4 Mar. 2022.
• (2022) "An Empirical Review of Deep Learning Frameworks for Change Detection: Model Design, Experimental Frameworks, Challenges and Research Needs," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6101–6122, doi: 10.1109/TITS.2021.3077883. Online publication date: 1 Jul. 2022.
• (2022) "Improving Surveillance Object Detection with Adaptive Omni-Attention over Both Inter-frame and Intra-frame Context," in Computer Vision – ACCV 2022, pp. 222–237, doi: 10.1007/978-3-031-26284-5_14. Online publication date: 4 Dec. 2022.
• (2022) "Target Detection Algorithm Based on Feature Optimization and Sample Equalization," in Wireless Algorithms, Systems, and Applications, pp. 343–355, doi: 10.1007/978-3-031-19214-2_28. Online publication date: 24 Nov. 2022.
• (2021) "ACFIM: Adaptively Cyclic Feature Information-Interaction Model for Object Detection," in Pattern Recognition and Computer Vision, pp. 379–391, doi: 10.1007/978-3-030-88004-0_31. Online publication date: 29 Oct. 2021.
• (2021) "DAFV: A Unified and Real-Time Framework of Joint Detection and Attributes Recognition for Fast Vehicles," in Wireless Algorithms, Systems, and Applications, pp. 353–365, doi: 10.1007/978-3-030-86130-8_28. Online publication date: 25 Jun. 2021.
