Abstract
Downsampling is one of the most basic image processing operations. Improper spatio-temporal downsampling applied to videos can cause aliasing issues such as moiré patterns in space and the wagon-wheel effect in time. Consequently, the inverse task of upscaling a low-resolution, low frame-rate video in space and time becomes a challenging ill-posed problem due to information loss and aliasing artifacts. In this paper, we aim to solve the space-time aliasing problem by learning a spatio-temporal downsampler. Towards this goal, we propose a neural network framework that jointly learns spatio-temporal downsampling and upsampling. It enables the downsampler to retain the key patterns of the original video and maximizes the reconstruction performance of the upsampler. To make the downsampling results compatible with popular image and video storage formats, they are encoded to uint8 with a differentiable quantization layer. To fully utilize the space-time correspondences, we propose two novel modules for explicit temporal propagation and space-time feature rearrangement. Experimental results show that our proposed method significantly boosts the space-time reconstruction quality by preserving spatial textures and motion patterns in both downsampling and upscaling. Moreover, our framework enables a variety of applications, including arbitrary video resampling, blurry frame reconstruction, and efficient video storage.
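The differentiable quantization layer mentioned above must map floats to uint8 codes while still passing gradients back to the downsampler, since rounding has zero gradient almost everywhere. A common way to achieve this is the straight-through estimator (the paper does not specify its exact formulation, so the sketch below is an assumption based on standard practice; all function names are illustrative, not from the paper):

```python
import numpy as np

def quantize_uint8(x):
    """Forward pass: map floats in [0, 1] to the 256 uint8 levels.

    The rounding step is piecewise constant, so on its own it would block
    all gradient flow to a jointly trained downsampler.
    """
    return np.clip(np.round(x * 255.0), 0, 255).astype(np.uint8)

def dequantize(q):
    """Map uint8 codes back to floats in [0, 1] for the upsampler."""
    return q.astype(np.float32) / 255.0

def quantize_ste(x):
    """Straight-through estimator: the forward output is the quantized
    value, but the backward pass treats the layer as identity.

    In an autograd framework this is typically written as
        x + stop_gradient(dequantize(quantize_uint8(x)) - x)
    so that d(output)/d(x) = 1 while the emitted value is exactly the
    round-tripped uint8 code. Plain NumPy has no autograd, so here we
    just return the round-tripped value.
    """
    return dequantize(quantize_uint8(x))
```

With this layer in the middle, the downsampler's output can be stored as an ordinary uint8 image or video frame, while end-to-end training of the downsampler-upsampler pair remains possible.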
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xiang, X., Tian, Y., Rengarajan, V., Young, L.D., Zhu, B., Ranjan, R. (2022). Learning Spatio-Temporal Downsampling for Effective Video Upscaling. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19796-3
Online ISBN: 978-3-031-19797-0
eBook Packages: Computer Science (R0)