research-article

Open access

Towards Real-time Video Compressive Sensing on Mobile Devices

Authors:

Xin Yuan

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 11080 - 11088

https://doi.org/10.1145/3664647.3680561

Published: 28 October 2024

Abstract

Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, followed by a reconstruction algorithm that retrieves the high-speed video frames. Rapidly evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet it is still challenging to deploy previous reconstruction algorithms on mobile devices because of their complex inference process, let alone to achieve real-time mobile reconstruction. To the best of our knowledge, no video SCI reconstruction model has been designed to run on mobile devices. To this end, we present an effective approach for video SCI reconstruction, dubbed MobileSCI, which runs at real-time speed on mobile devices for the first time. Specifically, we first build a U-shaped architecture based on 2D convolutions, which is much more efficient and mobile-friendly than previous state-of-the-art reconstruction methods. Besides, an efficient feature mixing block, based on channel splitting and shuffling mechanisms, is introduced as a novel bottleneck block of MobileSCI to alleviate the computational burden. Finally, a customized knowledge distillation strategy is used to further improve reconstruction quality. Extensive results on both simulated and real data show that MobileSCI achieves superior reconstruction quality with high efficiency on mobile devices. In particular, we can reconstruct a 256×256×8 snapshot compressed measurement in real time (about 35 FPS) on an iPhone 15. Code is available at https://github.com/mcao92/MobileSCI.
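The abstract names two mechanisms without spelling them out, so the following NumPy sketch illustrates each under stated assumptions; it is not the MobileSCI implementation. The first function is the standard video SCI forward model from the SCI literature: B high-speed frames X_b are modulated element-wise by B masks C_b and summed into one 2D measurement, Y = Σ_b C_b ⊙ X_b (noise omitted), so a 256×256×8 measurement encodes eight 256×256 frames. The second part is a channel split-and-shuffle step in the style of ShuffleNet v2 (Ma et al., 2018): half of the channels pass through unchanged, the other half go through a branch, and a channel shuffle interleaves the two halves. The function names, the (B, H, W) and (C, H, W) array layouts, the random binary masks, and the placeholder ReLU branch are all hypothetical choices for illustration; MobileSCI's actual feature mixing block may differ.

```python
import numpy as np


def sci_forward(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Standard video SCI measurement: Y = sum_b C_b * X_b (element-wise, noise-free).

    frames, masks: arrays of shape (B, H, W); returns a single (H, W) snapshot.
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share the shape (B, H, W)")
    # Modulate each high-speed frame by its mask, then integrate over time.
    return np.sum(masks * frames, axis=0)


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across `groups` (ShuffleNet-style); x has shape (C, H, W)."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    # (groups, C/groups, H, W) -> swap the first two axes -> flatten back to C channels.
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def split_mix_shuffle(x: np.ndarray, branch) -> np.ndarray:
    """Hypothetical split-and-shuffle step: identity on one half, `branch` on the other."""
    c = x.shape[0]
    if c % 2 != 0:
        raise ValueError("channel count must be even")
    x1, x2 = x[: c // 2], x[c // 2 :]
    # Only half of the channels enter the branch, which reduces the branch's cost.
    y = np.concatenate([x1, branch(x2)], axis=0)
    # Shuffle so untouched and processed channels are mixed for the next block.
    return channel_shuffle(y, groups=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, H, W = 8, 256, 256  # 8 frames per snapshot, matching the 256x256x8 setting
    frames = rng.random((B, H, W))
    masks = (rng.random((B, H, W)) > 0.5).astype(frames.dtype)  # assumed random binary masks
    y = sci_forward(frames, masks)
    print("measurement shape:", y.shape)

    feats = rng.standard_normal((16, 32, 32))
    # Placeholder branch (ReLU) standing in for learned convolutions.
    out = split_mix_shuffle(feats, branch=lambda t: np.maximum(t, 0.0))
    print("mixed feature shape:", out.shape)
```

The shuffle after concatenation matters: without it, the identity half would be carried forward unchanged through every stacked block and never interact with the processed half.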

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Reconstruction

Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN: 9798400706868

DOI: 10.1145/3664647

General Chairs: Jianfei Cai (Monash University, Australia), Mohan Kankanhalli (NUS, Singapore), Balakrishnan Prabhakaran (UT Dallas, USA), Susanne Boll (University of Oldenburg, Germany)

Program Chairs: Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia), Liang Zheng (Australian National University, Australia), Vivek K. Singh (Rutgers University, USA), Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands), Lexing Xie (Australian National University, Australia), Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 October 2024

Qualifiers

Research-article

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 0
Total Downloads: 145

Downloads (last 12 months): 145
Downloads (last 6 weeks): 58

Reflects downloads up to 11 Feb 2025
