
Video Decolorization Based on the CNN and LSTM Neural Network

Published: 22 July 2021

Abstract

Video decolorization converts a three-channel color video into a single-channel grayscale video; in essence, it is a decolorization operation applied to video frames. Most existing video decolorization algorithms simply apply image decolorization methods to individual frames. However, considering only the single-frame decolorization result inevitably causes temporal inconsistency and flicker: the same local content in consecutive frames may be mapped to different gray values. In addition, consecutive frames often share similar local content features, which constitutes redundant information. To address these problems, this article proposes a novel video decolorization algorithm based on a convolutional neural network (CNN) and a long short-term memory (LSTM) neural network. First, we design a local semantic content encoder that learns and extracts the shared local content of consecutive video frames, which better preserves the contrast of video frames. Second, a temporal feature controller based on bidirectional recurrent neural networks with LSTM units refines the local semantic features, which largely maintains the temporal consistency of the video sequence and eliminates the flicker phenomenon. Finally, we use deconvolution to decode the features and produce the grayscale video sequence. Experiments indicate that our method preserves the local contrast of video frames and the temporal consistency better than the state of the art.
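The abstract's pipeline (per-frame encoding, bidirectional temporal refinement, decoding) is a full neural network. As a toy illustration of why bidirectional temporal smoothing suppresses flicker, the sketch below decolorizes identical frames with flickering per-frame channel weights, then smooths the weights forward and backward in time, standing in for the paper's BiLSTM temporal feature controller. The function names, the exponential-smoothing scheme, and all parameters are illustrative assumptions, not the authors' model.

```python
import numpy as np

def decolorize_frame(frame, w):
    # frame: (H, W, 3) array in [0, 1]; w: channel weights summing to 1.
    # A simple weighted channel combination stands in for the decoder.
    return frame @ w

def bidirectional_smooth(weights, alpha=0.5):
    # Forward and backward exponential smoothing of per-frame channel
    # weights: a toy stand-in for the bidirectional LSTM temporal
    # feature controller (illustrative assumption, not the paper's net).
    fwd = np.copy(weights)
    for t in range(1, len(fwd)):
        fwd[t] = alpha * fwd[t] + (1 - alpha) * fwd[t - 1]
    bwd = np.copy(weights)
    for t in range(len(bwd) - 2, -1, -1):
        bwd[t] = alpha * bwd[t] + (1 - alpha) * bwd[t + 1]
    return (fwd + bwd) / 2.0

# Toy video: 4 identical frames, so any frame-to-frame gray-value
# difference is pure flicker introduced by the per-frame weights.
rng = np.random.default_rng(0)
video = np.repeat(rng.random((1, 8, 8, 3)), 4, axis=0)
raw_w = np.array([[0.9, 0.05, 0.05],   # weights flicker between frames
                  [0.1, 0.80, 0.10],
                  [0.9, 0.05, 0.05],
                  [0.1, 0.80, 0.10]])
smooth_w = bidirectional_smooth(raw_w)

def flicker(ws):
    # Mean absolute gray-value change between consecutive frames.
    grays = np.stack([decolorize_frame(f, w) for f, w in zip(video, ws)])
    return float(np.abs(np.diff(grays, axis=0)).mean())

assert flicker(smooth_w) < flicker(raw_w)  # smoothing reduces flicker
```

Because each smoothed weight vector is a convex combination of the raw per-frame weights, identical content in consecutive frames maps to nearly identical gray values, which is the temporal-consistency property the paper's controller is designed to enforce.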




      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 3
      August 2021
      443 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3476118

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 July 2021
      Accepted: 01 December 2020
      Revised: 01 November 2020
      Received: 01 May 2020
      Published in TOMM Volume 17, Issue 3

      Author Tags

      1. Video decolorization
2. convolutional neural network
      3. RNN
      4. LSTM
      5. temporal consistency

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • Natural Science Foundation of China


      Cited By

• (2024) Towards Long Form Audio-visual Video Understanding. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3672079. Online publication date: 7-Jun-2024.
• (2024) Backdoor Two-Stream Video Models on Federated Learning. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3651307. Online publication date: 7-Mar-2024.
• (2024) Learning Nighttime Semantic Segmentation the Hard Way. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1–23. DOI: 10.1145/3650032. Online publication date: 16-May-2024.
• (2024) Multimodal Visual-Semantic Representations Learning for Scene Text Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1–18. DOI: 10.1145/3646551. Online publication date: 27-Mar-2024.
• (2024) SWRM: Similarity Window Reweighting and Margin for Long-Tailed Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 6, 1–18. DOI: 10.1145/3643816. Online publication date: 8-Mar-2024.
• (2024) Nonlocal Hybrid Network for Long-tailed Image Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 4, 1–22. DOI: 10.1145/3630256. Online publication date: 11-Jan-2024.
• (2024) Ultrasound Brain Tomography: Comparison of Deep Learning and Deterministic Methods. IEEE Transactions on Instrumentation and Measurement 73, 1–12. DOI: 10.1109/TIM.2023.3330229. Online publication date: 2024.
• (2023) An Image Classification Method Based on Adaptive Attention Mechanism and Feature Extraction Network. Computational Intelligence and Neuroscience 2023. DOI: 10.1155/2023/4305594. Online publication date: 1-Jan-2023.
• (2023) Practical Charger Placement Scheme for Wireless Rechargeable Sensor Networks with Obstacles. ACM Transactions on Sensor Networks 20, 1, 1–23. DOI: 10.1145/3614431. Online publication date: 20-Oct-2023.
• (2023) Double-Layer Search and Adaptive Pooling Fusion for Reference-Based Image Super-Resolution. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 1, 1–23. DOI: 10.1145/3604937. Online publication date: 25-Aug-2023.
