DOI: 10.1145/3394171.3413788

Temporal Denoising Mask Synthesis Network for Learning Blind Video Temporal Consistency

Published: 12 October 2020

Abstract

Temporally consistent video processing has drawn increasing attention recently because existing image-based processing algorithms (e.g., filtering, enhancement, colorization) extend poorly to video: applying them independently to each frame typically produces temporal flickering, since these algorithms are not globally stable across frames. In this paper, we cast enforcing temporal consistency in a video as a temporal denoising problem: removing the flickering effect from unstable, per-frame processed results. Specifically, we propose a novel model, termed the Temporal Denoising Mask Synthesis Network (TDMS-Net), that jointly predicts a motion mask, soft optical flow, and a refining mask to synthesize temporally consistent frames. Temporal consistency is learned from the original video, and the learned temporal features are used to reprocess the output frames in a manner that is agnostic (blind) to the specific image-based processing algorithm. Experimental results on two datasets across 16 different applications demonstrate that the proposed TDMS-Net significantly outperforms two state-of-the-art blind temporal consistency approaches.
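The abstract names three predicted components (a motion mask, soft optical flow, and a refining mask) that are combined to synthesize each output frame. The sketch below illustrates one plausible way such components could be combined, in PyTorch; the function names, mask semantics, and the exact blending rule are assumptions for illustration, not the paper's actual TDMS-Net definition.

```python
# A minimal, hypothetical sketch of mask-based frame synthesis for blind
# temporal consistency. All names and the blending rule are assumptions;
# the paper defines the real TDMS-Net heads, masks, and training losses.
import torch
import torch.nn.functional as F


def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (N, C, H, W) by a dense flow field (N, 2, H, W)."""
    _, _, h, w = frame.shape
    # Build a pixel-coordinate grid, shift it by the flow, then normalize to
    # [-1, 1] as required by F.grid_sample.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (N, 2, H, W)
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)


def synthesize_frame(prev_output, processed_frame, flow, motion_mask, refine_mask):
    """Blend the flow-warped previous output with the current per-frame
    processed result, gated by a motion mask, then add a refining correction."""
    warped_prev = warp(prev_output, flow)
    # Where motion_mask -> 1, trust the temporally propagated pixels; where
    # -> 0 (e.g., occlusions or new content), fall back to the processed frame.
    blended = motion_mask * warped_prev + (1.0 - motion_mask) * processed_frame
    # Interpret the refining mask as a residual correction on the blend.
    return blended + refine_mask
```

Propagating the previous output frame (rather than the raw previous input frame) is what lets flow and masks estimated on the original, stable video suppress flicker introduced by an arbitrary per-frame algorithm.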

Supplementary Material

MP4 File (3394171.3413788.mp4)
Presentation Video.

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020, 4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. blind video processing
  2. optical flow
  3. temporal consistency

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Fundamental Research Funds for the Central Universities
  • Sichuan Science and Technology Program

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By
  • (2024) A RNN for Temporal Consistency in Low-Light Videos Enhanced by Single-Frame Methods. IEEE Signal Processing Letters, 31, 2795-2799. DOI: 10.1109/LSP.2024.3475969
  • (2024) Kernel adaptive memory network for blind video super-resolution. Expert Systems with Applications, 238, 122252. DOI: 10.1016/j.eswa.2023.122252
  • (2024) Enforcing Temporal Consistency for Color Constancy in Video Sequences. Computational Color Imaging, 274-288. DOI: 10.1007/978-3-031-72845-7_20
  • (2024) BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering. Computer Vision – ECCV 2024, 37-53. DOI: 10.1007/978-3-031-72643-9_3
  • (2023) SVCNet: Scribble-Based Video Colorization Network With Temporal Aggregation. IEEE Transactions on Image Processing, 32, 4443-4458. DOI: 10.1109/TIP.2023.3298537
  • (2023) A Temporal Consistency Enhancement Algorithm Based on Pixel Flicker Correction. Neural Information Processing, 65-78. DOI: 10.1007/978-981-99-1639-9_6
  • (2022) Robust Low-Rank Convolution Network for Image Denoising. Proceedings of the 30th ACM International Conference on Multimedia, 6211-6219. DOI: 10.1145/3503161.3547954
  • (2022) Multi-Level Query Interaction for Temporal Language Grounding. IEEE Transactions on Intelligent Transportation Systems, 23(12), 25479-25488. DOI: 10.1109/TITS.2021.3110713
  • (2022) Correlation-based and content-enhanced network for video style transfer. Pattern Analysis and Applications, 26(1), 343-355. DOI: 10.1007/s10044-022-01106-y
