DOI: 10.1145/3625468.3647618
research-article

Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems

Published: 17 April 2024

Abstract

The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADΔER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than process every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
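The event-driven idea in the abstract, re-testing only the pixels that receive new events instead of scanning every pixel of a frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the `(x, y, intensity)` event tuple, the function names, and the default threshold are assumptions made for illustration, and ADΔER events carry different fields. The sketch also ignores that a changed pixel lies on the test circle of its neighbours, which a complete detector may need to re-test.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3 used by FAST.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, n=9):
    """Segment test: n contiguous circle pixels all brighter or all darker."""
    height, width = len(img), len(img[0])
    if not (3 <= x < width - 3 and 3 <= y < height - 3):
        return False  # the circle would leave the image
    center = img[y][x]
    # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dx, dy in CIRCLE:
        value = img[y + dy][x + dx]
        if value > center + threshold:
            labels.append(1)
        elif value < center - threshold:
            labels.append(-1)
        else:
            labels.append(0)
    # Walk the circle twice so that runs wrapping past index 15 are counted.
    run_label, run_length = 0, 0
    for label in labels + labels:
        if label != 0 and label == run_label:
            run_length += 1
        else:
            run_label, run_length = label, (1 if label != 0 else 0)
        if run_length >= n:
            return True
    return False


def detect_on_events(img, features, events, threshold=20, n=9):
    """Apply events to img and re-test only the pixels that received them.

    Each event is a hypothetical (x, y, intensity) tuple. Pixels without
    events are never visited, so the work scales with the event count.
    """
    for x, y, intensity in events:
        img[y][x] = intensity
        if is_fast_corner(img, x, y, threshold, n):
            features.add((x, y))
        else:
            features.discard((x, y))
    return features


if __name__ == "__main__":
    image = [[100] * 9 for _ in range(9)]  # flat 9x9 grey image
    found = detect_on_events(image, set(), [(4, 4, 200)])
    print(found)
```

In a static surveillance scene most pixels receive no events, so a loop of this shape visits only a small fraction of the image on each update, whereas a frame-based detector runs the same segment test at every pixel.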

Published In

MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference
April 2024
557 pages
ISBN:9798400704123
DOI:10.1145/3625468
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2024
Accepted: 15 March 2024
Revised: 08 February 2024
Received: 29 November 2023

Author Tags

  1. event representation
  2. event video
  3. event vision
  4. video processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MMSys '24

Acceptance Rates

Overall Acceptance Rate 176 of 530 submissions, 33%

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 111
  • Downloads (Last 12 months): 111
  • Downloads (Last 6 weeks): 10

Reflects downloads up to 13 Jan 2025
