DOI: 10.1145/3394171.3413788

Temporal Denoising Mask Synthesis Network for Learning Blind Video Temporal Consistency

Published: 12 October 2020

Abstract

Temporally consistent video processing has drawn increasing attention recently because existing image-based processing algorithms (e.g., filtering, enhancement, colorization) extend poorly to video: applying them independently to each frame typically produces temporal flickering, since these algorithms are not globally stable across frames. In this paper, we cast enforcing temporal consistency in a video as a temporal denoising problem: removing the flickering effect from unstable, per-frame processed results. Specifically, we propose a novel model, termed the Temporal Denoising Mask Synthesis Network (TDMS-Net), that jointly predicts a motion mask, soft optical flow, and a refining mask to synthesize temporally consistent frames. Temporal consistency is learned from the original video, and the learned temporal features are used to reprocess the output frames in a manner that is agnostic (blind) to the specific image-based processing algorithm. Experimental results on two datasets across 16 different applications demonstrate that the proposed TDMS-Net significantly outperforms two state-of-the-art blind temporal consistency approaches.
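The abstract names three predicted components (a motion mask, soft optical flow, and a refining mask) that are combined to synthesize each output frame. The sketch below illustrates one plausible way such components could be combined, in PyTorch; the function names, mask semantics, and the exact blending rule are assumptions for illustration, not the paper's actual TDMS-Net definition.

```python
# A minimal, hypothetical sketch of mask-based frame synthesis for blind
# temporal consistency. All names and the blending rule are assumptions;
# the paper defines the real TDMS-Net heads, masks, and training losses.
import torch
import torch.nn.functional as F


def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (N, C, H, W) by a dense flow field (N, 2, H, W)."""
    _, _, h, w = frame.shape
    # Build a pixel-coordinate grid, shift it by the flow, then normalize to
    # [-1, 1] as required by F.grid_sample.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (N, 2, H, W)
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)


def synthesize_frame(prev_output, processed_frame, flow, motion_mask, refine_mask):
    """Blend the flow-warped previous output with the current per-frame
    processed result, gated by a motion mask, then add a refining correction."""
    warped_prev = warp(prev_output, flow)
    # Where motion_mask -> 1, trust the temporally propagated pixels; where
    # -> 0 (e.g., occlusions or new content), fall back to the processed frame.
    blended = motion_mask * warped_prev + (1.0 - motion_mask) * processed_frame
    # Interpret the refining mask as a residual correction on the blend.
    return blended + refine_mask
```

Propagating the previous output frame (rather than the raw previous input frame) is what lets flow and masks estimated on the original, stable video suppress flicker introduced by an arbitrary per-frame algorithm.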

Supplementary Material

MP4 File (3394171.3413788.mp4)
Presentation Video.

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020, 4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. blind video processing
  2. optical flow
  3. temporal consistency

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Fundamental Research Funds for the Central Universities
  • Sichuan Science and Technology Program

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By
  • (2024) A RNN for Temporal Consistency in Low-Light Videos Enhanced by Single-Frame Methods. IEEE Signal Processing Letters, 31, 2795-2799. DOI: 10.1109/LSP.2024.3475969
  • (2024) Kernel adaptive memory network for blind video super-resolution. Expert Systems with Applications, 238, 122252. DOI: 10.1016/j.eswa.2023.122252
  • (2024) Enforcing Temporal Consistency for Color Constancy in Video Sequences. Computational Color Imaging, 274-288. DOI: 10.1007/978-3-031-72845-7_20
  • (2024) BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering. Computer Vision – ECCV 2024, 37-53. DOI: 10.1007/978-3-031-72643-9_3
  • (2023) SVCNet: Scribble-Based Video Colorization Network With Temporal Aggregation. IEEE Transactions on Image Processing, 32, 4443-4458. DOI: 10.1109/TIP.2023.3298537
  • (2023) A Temporal Consistency Enhancement Algorithm Based on Pixel Flicker Correction. Neural Information Processing, 65-78. DOI: 10.1007/978-981-99-1639-9_6
  • (2022) Robust Low-Rank Convolution Network for Image Denoising. Proceedings of the 30th ACM International Conference on Multimedia, 6211-6219. DOI: 10.1145/3503161.3547954
  • (2022) Multi-Level Query Interaction for Temporal Language Grounding. IEEE Transactions on Intelligent Transportation Systems, 23(12), 25479-25488. DOI: 10.1109/TITS.2021.3110713
  • (2022) Correlation-based and content-enhanced network for video style transfer. Pattern Analysis and Applications, 26(1), 343-355. DOI: 10.1007/s10044-022-01106-y
