Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems

Authors:

Andrew C. Freeman,

Ketan Mayer-Patel,

Montek Singh

MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference

Pages 132 - 143

https://doi.org/10.1145/3625468.3647618

Published: 17 April 2024

Abstract

The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADΔER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than process every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
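
To illustrate the idea of testing only pixels that receive new events, here is a minimal sketch in Python. It is not the authors' ADΔER implementation. The function names, the `(x, y, intensity)` event tuple, the threshold of 20, and the 9-of-16 arc length are all assumptions chosen for illustration. The sketch re-tests only the pixel that changed; whether neighboring pixels are also re-tested is a design choice the abstract does not state.

```python
# Offsets (dx, dy) of the 16-pixel radius-3 circle used by the FAST segment test.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, arc_len=9):
    """Segment test: True if `arc_len` contiguous circle pixels are all brighter
    than center + threshold, or all darker than center - threshold."""
    h, w = len(img), len(img[0])
    if not (3 <= x < w - 3 and 3 <= y < h - 3):
        return False  # the circle would fall outside the image
    center = img[y][x]
    # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        if v > center + threshold:
            labels.append(1)
        elif v < center - threshold:
            labels.append(-1)
        else:
            labels.append(0)
    # Walk the ring twice so that runs wrapping past the last offset are counted.
    for sign in (1, -1):
        run = 0
        for label in labels + labels:
            run = run + 1 if label == sign else 0
            if run >= arc_len:
                return True
    return False


def detect_frame(img, **kw):
    """Frame-based baseline: run the segment test at every pixel."""
    return {(x, y)
            for y in range(len(img))
            for x in range(len(img[0]))
            if is_fast_corner(img, x, y, **kw)}


def detect_events(img, events, **kw):
    """Event-driven variant (hypothetical): apply each (x, y, intensity) event
    to the running image, then run the segment test at that pixel only."""
    corners = set()
    for x, y, intensity in events:
        img[y][x] = intensity
        if is_fast_corner(img, x, y, **kw):
            corners.add((x, y))
    return corners
```

In this sketch the per-pixel test is identical in both paths. The event-driven path does work proportional to the number of events rather than the number of pixels, so a mostly static scene triggers far fewer segment tests.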

Index Terms

Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems
1. Applied computing
  1. Computer forensics
    1. Surveillance mechanisms
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Scene anomaly detection
  2. Computer graphics
    1. Image compression
    2. Image manipulation
      1. Image processing

Published In

MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference

April 2024

557 pages

ISBN:9798400704123

DOI:10.1145/3625468

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 April 2024

Accepted: 15 March 2024

Revised: 08 February 2024

Received: 29 November 2023

Qualifiers

Research-article
Research
Refereed limited

Conference

MMSys '24

Sponsor:

SIGMM

MMSys '24: ACM Multimedia Systems Conference 2024

April 15 - 18, 2024

Bari, Italy

Acceptance Rates

Overall Acceptance Rate 176 of 530 submissions, 33%
