research-article

Continuous Space-Time Video Super-Resolution with Multi-Stage Motion Information Reorganization

Authors: Yuantong Zhang, Zhenzhong Chen, Wenpeng Ding

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 9

Article No.: 273, Pages 1 - 23

https://doi.org/10.1145/3665646

Published: 23 September 2024

Abstract

Space-time video super-resolution (ST-VSR) aims to simultaneously expand a given source video to a higher frame rate and resolution. However, most existing schemes either consider fixed intermediate time and scale or fail to exploit long-range temporal information due to model design or inefficient motion estimation and compensation. To address these problems, we propose a continuous ST-VSR method to convert the given video to any frame rate and spatial resolution with Multi-stage Motion information reorganization (MsMr). To achieve time-arbitrary interpolation, we propose a forward warping guided frame synthesis module and an optical flow-guided context consistency loss to better approximate extreme motion and preserve similar structures among input and prediction frames. To realize continuous spatial upsampling, we design a memory-friendly cascading depth-to-space module. Meanwhile, with the sophisticated reorganization of optical flow, MsMr realizes more efficient motion estimation and motion compensation, making it possible to propagate information from long-range neighboring frames and achieve better reconstruction quality. Extensive experiments show that the proposed algorithm is flexible and performs better on various datasets than the state-of-the-art methods. The code will be available at https://github.com/hahazh/LD-STVSR.
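The depth-to-space operation named in the abstract can be illustrated with a minimal NumPy sketch. It shows only the standard rearrangement of channel blocks into spatial positions, applied in a cascade of small integer factors. The function names `depth_to_space` and `cascaded_depth_to_space`, the channel ordering, and the choice of factors are assumptions made for illustration; the abstract does not describe the layers of the paper's module or how it reaches non-integer scales.

```python
import numpy as np


def depth_to_space(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Illustrative helper (not from the paper): output pixel
    (c, h*r + i, w*r + j) is taken from input channel c*r*r + i*r + j
    at position (h, w).
    """
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    # Split the channel axis into (C, r, r), then interleave the two
    # r-sized axes with the height and width axes.
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)


def cascaded_depth_to_space(x: np.ndarray, factors) -> np.ndarray:
    """Apply depth_to_space once per factor (hypothetical cascade).

    A learned module would typically place convolutions between the
    stages; they are omitted here to keep the sketch self-contained.
    """
    for r in factors:
        x = depth_to_space(x, r)
    return x


if __name__ == "__main__":
    features = np.zeros((16, 3, 5), dtype=np.float32)
    upsampled = cascaded_depth_to_space(features, [2, 2])
    print(upsampled.shape)
```

Each stage trades a factor of r*r in channels for a factor of r in height and width, so a cascade of small factors works with smaller per-stage channel expansions than a single large one. This sketch does not measure memory use and makes no claim about the paper's actual design.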

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 9

September 2024

780 pages

EISSN: 1551-6865

DOI: 10.1145/3613681

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 September 2024

Online AM: 21 May 2024

Accepted: 14 May 2024

Revised: 29 March 2024

Received: 04 November 2023

Published in TOMM Volume 20, Issue 9

Qualifiers

Research-article
