Abstract
Object detection is a key process in any video surveillance application. In remote surveillance, the target must be detected accurately and the detection data transmitted rapidly to the main station so that further action can be taken. This paper presents a framework that uses a deep neural network and the Internet of Things (IoT) to detect targets and transfer the detected information to the cloud at low transmission rates. The detection framework combines YOLO-Lite, a simpler version of the You Only Look Once (YOLO) detector, with spatial pyramid pooling (SPP). When trained on the COCO dataset, the YOLO-Lite + SPP model runs at 40 fps with an mAP of 35.7% on a non-GPU platform. Its performance has been analyzed on the PASCAL VOC, COCO, TB-50 and TB-100 datasets. On a GPU-based platform, precision and recall values of 89.79% and 91.67%, respectively, have been achieved at a processing speed of 218 fps. The ThingSpeak platform is used for data reception on the cloud. Real-time results are also demonstrated; they show the efficiency of the proposed framework and confirm its suitability for remote video surveillance.
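To make the SPP component concrete, the following is a minimal sketch of spatial pyramid pooling on a single feature-map channel. It assumes the original fixed-length formulation (max pooling over n × n grids at several pyramid levels, then concatenation); how SPP is actually wired into YOLO-Lite in this work is not stated in the abstract and may differ. The function name `spp_max_pool` and the default `levels` are illustrative choices, not taken from the paper.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Pool one 2-D channel (list of rows) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid and the
    maximum of each cell is kept. The output length is sum(n * n for n in
    levels), independent of the input height and width.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of the i-th bin: floor for the start, ceil for the end.
            r0 = (i * h) // n
            r1 = max(r0 + 1, ((i + 1) * h + n - 1) // n)
            for j in range(n):
                # Column range of the j-th bin, computed the same way.
                c0 = (j * w) // n
                c1 = max(c0 + 1, ((j + 1) * w + n - 1) // n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


# Illustrative input: a 4 x 4 channel holding the values 0..15 row by row.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(spp_max_pool(grid, levels=(1, 2)))  # [15, 5, 7, 13, 15]
```

Because the output length depends only on `levels`, a 3 × 5 input pooled with the same levels also yields five values, which is the property that lets a detector accept feature maps of varying size.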
Code availability
The authors declare that no exact code has been copied to carry out the research.
Funding
The authors declare that no funding was received for this research work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Availability of data and materials
The authors declare that no data or material was taken illegally. Publicly available benchmark datasets were used for the implementation.
About this article
Cite this article
Gautam, A., Singh, S. Deep Learning Based Object Detection Combined with Internet of Things for Remote Surveillance. Wireless Pers Commun 118, 2121–2140 (2021). https://doi.org/10.1007/s11277-021-08071-5
DOI: https://doi.org/10.1007/s11277-021-08071-5