
Video Decolorization Based on the CNN and LSTM Neural Network

Published: 22 July 2021

Abstract

Video decolorization converts a three-channel color video into a single-channel grayscale video; in essence, it is a decolorization operation applied to video frames. Most existing video decolorization algorithms simply apply image decolorization methods to individual frames. However, considering only the single-frame decolorization result inevitably causes temporal inconsistency and flicker: the same local content in consecutive frames may be mapped to different gray values. In addition, consecutive frames often share similar local content features, which constitutes redundant information. To address these problems, this article proposes a novel video decolorization algorithm based on a convolutional neural network (CNN) and a long short-term memory (LSTM) neural network. First, we design a local semantic content encoder that learns and extracts the shared local content of consecutive video frames, which better preserves the contrast of video frames. Second, a temporal feature controller based on bidirectional recurrent neural networks with LSTM units refines the local semantic features, which largely maintains the temporal consistency of the video sequence and eliminates the flicker phenomenon. Finally, we use deconvolution to decode the features and produce the grayscale video sequence. Experiments indicate that our method preserves the local contrast of video frames and the temporal consistency better than the state of the art.
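The abstract's pipeline (per-frame encoding, bidirectional temporal refinement, decoding) is a full neural network. As a toy illustration of why bidirectional temporal smoothing suppresses flicker, the sketch below decolorizes identical frames with flickering per-frame channel weights, then smooths the weights forward and backward in time, standing in for the paper's BiLSTM temporal feature controller. The function names, the exponential-smoothing scheme, and all parameters are illustrative assumptions, not the authors' model.

```python
import numpy as np

def decolorize_frame(frame, w):
    # frame: (H, W, 3) array in [0, 1]; w: channel weights summing to 1.
    # A simple weighted channel combination stands in for the decoder.
    return frame @ w

def bidirectional_smooth(weights, alpha=0.5):
    # Forward and backward exponential smoothing of per-frame channel
    # weights: a toy stand-in for the bidirectional LSTM temporal
    # feature controller (illustrative assumption, not the paper's net).
    fwd = np.copy(weights)
    for t in range(1, len(fwd)):
        fwd[t] = alpha * fwd[t] + (1 - alpha) * fwd[t - 1]
    bwd = np.copy(weights)
    for t in range(len(bwd) - 2, -1, -1):
        bwd[t] = alpha * bwd[t] + (1 - alpha) * bwd[t + 1]
    return (fwd + bwd) / 2.0

# Toy video: 4 identical frames, so any frame-to-frame gray-value
# difference is pure flicker introduced by the per-frame weights.
rng = np.random.default_rng(0)
video = np.repeat(rng.random((1, 8, 8, 3)), 4, axis=0)
raw_w = np.array([[0.9, 0.05, 0.05],   # weights flicker between frames
                  [0.1, 0.80, 0.10],
                  [0.9, 0.05, 0.05],
                  [0.1, 0.80, 0.10]])
smooth_w = bidirectional_smooth(raw_w)

def flicker(ws):
    # Mean absolute gray-value change between consecutive frames.
    grays = np.stack([decolorize_frame(f, w) for f, w in zip(video, ws)])
    return float(np.abs(np.diff(grays, axis=0)).mean())

assert flicker(smooth_w) < flicker(raw_w)  # smoothing reduces flicker
```

Because each smoothed weight vector is a convex combination of the raw per-frame weights, identical content in consecutive frames maps to nearly identical gray values, which is the temporal-consistency property the paper's controller is designed to enforce.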




      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 3
      August 2021
      443 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3476118

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 July 2021
      Accepted: 01 December 2020
      Revised: 01 November 2020
      Received: 01 May 2020
      Published in TOMM Volume 17, Issue 3

      Author Tags

      1. Video decolorization
2. convolutional neural network
      3. RNN
      4. LSTM
      5. temporal consistency

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • Natural Science Foundation of China


      Cited By

• (2024) Towards Long Form Audio-visual Video Understanding. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3672079. Online publication date: 7-Jun-2024.
• (2024) Backdoor Two-Stream Video Models on Federated Learning. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3651307. Online publication date: 7-Mar-2024.
• (2024) Learning Nighttime Semantic Segmentation the Hard Way. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1–23. DOI: 10.1145/3650032. Online publication date: 16-May-2024.
• (2024) Multimodal Visual-Semantic Representations Learning for Scene Text Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1–18. DOI: 10.1145/3646551. Online publication date: 27-Mar-2024.
• (2024) SWRM: Similarity Window Reweighting and Margin for Long-Tailed Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 6, 1–18. DOI: 10.1145/3643816. Online publication date: 8-Mar-2024.
• (2024) Nonlocal Hybrid Network for Long-tailed Image Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 4, 1–22. DOI: 10.1145/3630256. Online publication date: 11-Jan-2024.
• (2024) Ultrasound Brain Tomography: Comparison of Deep Learning and Deterministic Methods. IEEE Transactions on Instrumentation and Measurement 73, 1–12. DOI: 10.1109/TIM.2023.3330229. Online publication date: 2024.
• (2023) An Image Classification Method Based on Adaptive Attention Mechanism and Feature Extraction Network. Computational Intelligence and Neuroscience 2023. DOI: 10.1155/2023/4305594. Online publication date: 1-Jan-2023.
• (2023) Practical Charger Placement Scheme for Wireless Rechargeable Sensor Networks with Obstacles. ACM Transactions on Sensor Networks 20, 1, 1–23. DOI: 10.1145/3614431. Online publication date: 20-Oct-2023.
• (2023) Double-Layer Search and Adaptive Pooling Fusion for Reference-Based Image Super-Resolution. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 1, 1–23. DOI: 10.1145/3604937. Online publication date: 25-Aug-2023.
