research-article

Continuous Space-Time Video Super-Resolution with Multi-Stage Motion Information Reorganization

Published: 23 September 2024

Abstract

Space-time video super-resolution (ST-VSR) aims to simultaneously increase the frame rate and the spatial resolution of a given source video. However, most existing schemes either support only fixed intermediate time steps and scale factors, or fail to exploit long-range temporal information because of their model design or inefficient motion estimation and compensation. To address these problems, we propose a continuous ST-VSR method with Multi-stage Motion information reorganization (MsMr) that converts a given video to an arbitrary frame rate and spatial resolution. To achieve time-arbitrary interpolation, we propose a forward-warping-guided frame synthesis module and an optical-flow-guided context consistency loss, which better approximate extreme motion and preserve similar structures between the input and predicted frames. To realize continuous spatial upsampling, we design a memory-friendly cascading depth-to-space module. Meanwhile, through its reorganization of optical flow, MsMr performs motion estimation and motion compensation more efficiently, which makes it possible to propagate information from long-range neighboring frames and improves reconstruction quality. Extensive experiments on various datasets show that the proposed algorithm is flexible and outperforms state-of-the-art methods. The code will be available at https://github.com/hahazh/LD-STVSR.
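Forward-warping-based interpolation at an arbitrary time rests on a standard idea: under a linear-motion assumption, a pixel at position p in frame 0 reaches p + t·F_{0→1}(p) at time t ∈ [0, 1], so forward-splatting frame 0 with the flow scaled by t gives a rough estimate of the intermediate frame. The NumPy sketch below illustrates only that generic step, using average splatting (one of the splatting variants discussed by Niklaus and Liu, 2020). It is not the MsMr synthesis module; the function name `forward_warp_average` and the `(dx, dy)` flow layout are hypothetical choices for illustration.

```python
import numpy as np


def forward_warp_average(frame, flow, t):
    """Average-splat `frame` (H, W, C) from time 0 toward time t in [0, 1].

    Assumption (hypothetical layout): `flow` has shape (H, W, 2) and holds the
    (dx, dy) displacement of each pixel from frame 0 to frame 1. Under linear
    motion, a pixel at p lands at p + t * flow[p]. Each source pixel is spread
    over its four neighbouring target pixels with bilinear weights, and all
    contributions reaching one target pixel are averaged.
    """
    h, w, c = frame.shape
    out = np.zeros((h, w, c), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)

    # Sub-pixel target coordinates at time t.
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    tx = xs + t * flow[..., 0]
    ty = ys + t * flow[..., 1]
    x0 = np.floor(tx).astype(int)
    y0 = np.floor(ty).astype(int)

    # Scatter into the four integer neighbours with bilinear weights.
    for dy in (0, 1):
        for dx in (0, 1):
            xi = x0 + dx
            yi = y0 + dy
            wgt = (1.0 - np.abs(tx - xi)) * (1.0 - np.abs(ty - yi))
            valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h) & (wgt > 0)
            # np.add.at accumulates correctly when several sources hit one pixel.
            np.add.at(out, (yi[valid], xi[valid]),
                      frame[valid] * wgt[valid][:, None])
            np.add.at(weight, (yi[valid], xi[valid]), wgt[valid])

    # Normalize where something landed; untouched pixels stay 0 (holes).
    filled = weight > 0
    out[filled] /= weight[filled][:, None]
    return out, filled


if __name__ == "__main__":
    # A 1x4 single-channel frame moving 2 px to the right between frames 0 and 1.
    frame = np.arange(4, dtype=np.float64).reshape(1, 4, 1)
    flow = np.zeros((1, 4, 2))
    flow[..., 0] = 2.0
    warped, filled = forward_warp_average(frame, flow, t=0.5)
    print(warped[0, :, 0].tolist())  # [0.0, 0.0, 1.0, 2.0]
    print(filled[0].tolist())        # [False, True, True, True]
```

At t = 0.5 the content shifts by one pixel, leaving a hole at the left border where no source pixel lands. Holes and overlaps of this kind are why splatted estimates are typically refined by a learned synthesis stage rather than used directly.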

Cited By

• (2024) Cross-Lingual Cross-Modal Retrieval With Noise-Robust Fine-Tuning. IEEE Transactions on Knowledge and Data Engineering 36, 11 (2024), 5860–5873. DOI: 10.1109/TKDE.2024.3400060. Online publication date: 1 Nov 2024.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 9
September 2024
780 pages
EISSN: 1551-6865
DOI: 10.1145/3613681
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 September 2024
Online AM: 21 May 2024
Accepted: 14 May 2024
Revised: 29 March 2024
Received: 04 November 2023
Published in TOMM Volume 20, Issue 9

Author Tags

1. Video super-resolution
2. video frame interpolation
3. deep learning

Article Metrics

• Downloads (last 12 months): 246
• Downloads (last 6 weeks): 43
Reflects downloads up to 10 Nov 2024
