DOI: 10.1145/3664647.3680561
research-article
Open access

Towards Real-time Video Compressive Sensing on Mobile Devices

Published: 28 October 2024

Abstract

Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, followed by a reconstruction algorithm to retrieve the high-speed video frames. Fast-evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet it remains challenging to deploy previous reconstruction algorithms on mobile devices because of their complex inference process, let alone to reconstruct in real time. To the best of our knowledge, no video SCI reconstruction model has been designed to run on mobile devices. To this end, we present an effective approach for video SCI reconstruction, dubbed MobileSCI, which for the first time runs at real-time speed on mobile devices. Specifically, we first build a U-shaped 2D convolution-based architecture, which is much more efficient and mobile-friendly than previous state-of-the-art reconstruction methods. In addition, an efficient feature mixing block, based on channel splitting and shuffling mechanisms, is introduced as a novel bottleneck block of MobileSCI to alleviate the computational burden. Finally, a customized knowledge distillation strategy is used to further improve the reconstruction quality. Extensive results on both simulated and real data show that MobileSCI achieves superior reconstruction quality with high efficiency on mobile devices. In particular, it reconstructs a 256×256×8 snapshot compressed measurement in real time (about 35 FPS) on an iPhone 15. Code is available at https://github.com/mcao92/MobileSCI.
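
The abstract does not spell out how a snapshot compressed measurement is formed. As a rough illustration only, the sketch below uses the formulation common in the SCI literature, in which each high-speed frame is multiplied element-wise by its own modulation mask and the modulated frames are summed into a single 2D snapshot. The function names, the random binary masks, and the toy 4×4×8 size are assumptions made for this example and are not taken from the MobileSCI code.

```python
import random


def sci_measure(frames, masks):
    """Collapse T frames into one 2D snapshot: Y = sum_t masks[t] * frames[t].

    frames and masks are lists of T two-dimensional lists of equal size.
    This is an assumed, textbook-style forward model, not the paper's code.
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Element-wise modulation, then accumulation over time.
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical helper: one random 0/1 mask per frame."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


# Toy stand-in for a 256x256x8 measurement: 8 frames of 4x4 pixels,
# where frame t has the constant value t + 1.
T, H, W = 8, 4, 4
frames = [[[float(t + 1)] * W for _ in range(H)] for t in range(T)]
masks = random_binary_masks(T, H, W)
snapshot = sci_measure(frames, masks)
```

Under this model, the reconstruction task described in the abstract is the inverse problem: recovering the T frames from `snapshot` and `masks`.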


    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. computational imaging
    2. mobile network
    3. mobile system
    4. real-time reconstruction
    5. snapshot compressive imaging

    Qualifiers

    • Research-article

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
