Learning temporal coherence via self-supervision for GAN-based video generation

Published: 12 August 2020

Abstract

Our work explores temporal self-supervision for GAN-based video generation tasks. While adversarial training successfully yields generative models for a variety of areas, temporal relationships in the generated data are much less explored. Natural temporal changes are crucial for sequential generation tasks, e.g., video super-resolution and unpaired video translation. For the former, state-of-the-art methods often favor simpler norm losses such as L2 over adversarial training. However, their averaging nature easily leads to temporally smooth results with an undesirable lack of spatial detail. For unpaired video translation, existing approaches modify the generator networks to form spatio-temporal cycle consistencies. In contrast, we focus on improving learning objectives and propose a temporally self-supervised algorithm. For both tasks, we show that temporal adversarial learning is key to achieving temporally coherent solutions without sacrificing spatial detail. We also propose a novel Ping-Pong loss to improve long-term temporal consistency. It effectively prevents recurrent networks from accumulating artifacts temporally without suppressing detailed features. Additionally, we propose a first set of metrics to quantitatively evaluate the accuracy as well as the perceptual quality of the temporal evolution. A series of user studies confirm the rankings computed with these metrics. Code, data, models, and results are provided at https://github.com/thunil/TecoGAN.
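The Ping-Pong loss mentioned above can be illustrated with a small sketch. The idea is to extend an input sequence in ping-pong order (forward, then reversed), run the recurrent generator over the extended sequence, and penalize differences between the output produced for a frame on the forward pass and the output produced for the same frame on the backward pass. The NumPy sketch below is a simplified illustration of that idea only, not the paper's actual implementation (which operates on a GAN generator with flow-based warping); the `generate` callable and all names here are hypothetical.

```python
import numpy as np

def ping_pong_loss(generate, frames):
    """Illustrative sketch of the Ping-Pong loss idea.

    `generate(frame, prev_output)` stands in for a recurrent generator
    conditioned on its previous output. The sequence is extended in
    ping-pong order, f_1 .. f_n .. f_1, and the loss penalizes any
    mismatch between the forward-pass and backward-pass outputs for
    the same source frame, discouraging temporal drift.
    """
    # Ping-pong ordering: forward pass, then the reverse (last frame used once).
    pp = frames + frames[-2::-1]

    prev = np.zeros_like(frames[0])  # recurrent state starts empty
    outputs = []
    for f in pp:
        prev = generate(f, prev)
        outputs.append(prev)

    n = len(frames)
    loss = 0.0
    # Compare the forward output for frame t with the backward output
    # for the same frame (mirrored position in the extended sequence).
    for t in range(n - 1):
        fwd = outputs[t]
        bwd = outputs[len(pp) - 1 - t]
        loss += np.mean((fwd - bwd) ** 2)
    return loss / (n - 1)
```

If the generator is free of drift (its output for a frame does not depend on the pass direction), the loss is zero; any state-dependent artifact accumulation makes the forward and backward outputs diverge and is penalized.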

Supplemental Material

• MP4 File: presentation video (with transcript)
• ZIP File: supplemental files



      Published In

      cover image ACM Transactions on Graphics
      ACM Transactions on Graphics  Volume 39, Issue 4
      August 2020
      1732 pages
      ISSN:0730-0301
      EISSN:1557-7368
      DOI:10.1145/3386569

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. generative adversarial network
      2. self-supervision
      3. temporal cycle-consistency
      4. unpaired video translation
      5. video super-resolution

      Qualifiers

      • Research-article


Cited By

• BasicVSR Model Filtering Study Using Squeeze and Excitation Block. Journal of Broadcast Engineering 29:1 (105-108). DOI: 10.5909/JBE.2024.29.1.105. 31 Jan 2024.
• Physics-Informed Computer Vision: A Review and Perspectives. ACM Computing Surveys. DOI: 10.1145/3689037. 20 Aug 2024.
• Environmental Condition Aware Super-Resolution Acceleration Framework in Server-Client Hierarchies. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3678008. 12 Jul 2024.
• Learning Images Across Scales Using Adversarial Training. ACM Transactions on Graphics 43:4 (1-13). DOI: 10.1145/3658190. 19 Jul 2024.
• Deepfakes, Phrenology, Surveillance, and More! A Taxonomy of AI Privacy Risks. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-19). DOI: 10.1145/3613904.3642116. 11 May 2024.
• 3D-Aware Talking-Head Video Motion Transfer. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (4942-4952). DOI: 10.1109/WACV57701.2024.00488. 3 Jan 2024.
• Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (1506-1516). DOI: 10.1109/WACV57701.2024.00154. 3 Jan 2024.
• Coarse- and Fine-Grained Fusion Hierarchical Network for Hole Filling in View Synthesis. IEEE Transactions on Image Processing 33 (322-337). DOI: 10.1109/TIP.2023.3341303. 1 Jan 2024.
• 3DAttGAN: A 3D Attention-Based Generative Adversarial Network for Joint Space-Time Video Super-Resolution. IEEE Transactions on Emerging Topics in Computational Intelligence 8:4 (3117-3128). DOI: 10.1109/TETCI.2024.3369994. Aug 2024.
• Innovative Workflow for AI-Generated Video: Addressing Limitations, Impact and Implications. 2024 IEEE Symposium on Industrial Electronics & Applications (ISIEA) (1-7). DOI: 10.1109/ISIEA61920.2024.10607369. 6 Jul 2024.
