DOI: 10.1145/3474085.3475445

I2V-GAN: Unpaired Infrared-to-Visible Video Translation

Published: 17 October 2021

Abstract

Human vision is often adversely affected by complex environmental factors, especially in night-vision scenarios. Infrared cameras are therefore often used to enhance visibility by detecting infrared radiation in the surrounding environment, but the resulting infrared videos are undesirable because they lack detailed semantic information. In such cases, an effective video-to-video translation method from the infrared domain to its visible-light counterpart is strongly needed, one that overcomes the intrinsically large gap between the infrared and visible fields. To address this challenging problem, we propose I2V-GAN, an infrared-to-visible (I2V) video translation method that generates fine-grained, spatially and temporally consistent visible-light videos from unpaired infrared videos. Technically, our model capitalizes on three types of constraints: 1) an adversarial constraint to generate synthetic frames that resemble real ones; 2) cyclic consistency, with an introduced perceptual loss, for effective content conversion as well as style preservation; and 3) similarity constraints across and within domains to enhance content and motion consistency in both spatial and temporal spaces at a fine-grained level. Furthermore, the currently available public infrared and visible-light datasets are mainly intended for object detection or tracking, and some consist of discontinuous images that are unsuitable for video tasks. We therefore provide a new dataset for infrared-to-visible video translation, named IRVI. It contains 12 consecutive video clips of vehicle and monitoring scenes, and both the infrared and visible-light videos can be split into 24,352 frames. Comprehensive experiments on IRVI validate that I2V-GAN is superior to the compared state-of-the-art methods, translating infrared videos to visible light with higher fluency and finer semantic details. Moreover, additional experimental results on the flower-to-flower dataset indicate that I2V-GAN is also applicable to other video translation tasks. The code and the IRVI dataset are available at https://github.com/BIT-DA/I2V-GAN.
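
To make the three constraint families concrete, below is a minimal PyTorch-style sketch of how they might compose into a single generator objective. The generators G_iv/G_vi, the discriminator D_v, the perceptual and similarity callables, and all loss weights are hypothetical placeholders for illustration, not the authors' implementation; the actual code is in the repository linked above.

    import torch
    import torch.nn.functional as F

    def adversarial_loss(d_fake):
        # Least-squares GAN generator objective (one common choice).
        return F.mse_loss(d_fake, torch.ones_like(d_fake))

    def generator_loss(real_ir, G_iv, G_vi, D_v, perceptual, similarity,
                       lam_cyc=10.0, lam_per=1.0, lam_sim=1.0):
        # All arguments other than real_ir are hypothetical modules/callables.
        fake_vis = G_iv(real_ir)   # infrared -> visible translation
        rec_ir = G_vi(fake_vis)    # visible -> infrared (cycle back)

        # 1) Adversarial constraint: synthetic frames should look real.
        l_adv = adversarial_loss(D_v(fake_vis))

        # 2) Cyclic consistency plus a perceptual term: convert content
        #    effectively while preserving style.
        l_cyc = F.l1_loss(rec_ir, real_ir)
        l_per = perceptual(rec_ir, real_ir)

        # 3) Similarity constraints across and within domains, applied at a
        #    fine-grained (e.g. patch) level for content/motion consistency.
        l_sim = similarity(real_ir, fake_vis)

        return l_adv + lam_cyc * l_cyc + lam_per * l_per + lam_sim * l_sim

In a full training loop this term would be optimized alongside the discriminator losses and the symmetric visible-to-infrared direction; the relative weights are tuning choices, not values taken from the paper.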

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021


Author Tags

  1. GANs
  2. infrared-to-visible
  3. video-to-video translation

Qualifiers

  • Research-article


Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (Last 12 months): 122
  • Downloads (Last 6 weeks): 19
Reflects downloads up to 11 Jan 2025.


Cited By

  • (2025) Neural-Network-Enhanced Metalens Camera for High-Definition, Dynamic Imaging in the Long-Wave Infrared Spectrum. ACS Photonics. DOI: 10.1021/acsphotonics.4c01321. Online publication date: 2-Jan-2025.
  • (2025) CycleRegNet: A scale-aware and geometry-consistent cycle adversarial model for infrared and visible image registration. Measurement, 242 (116063). DOI: 10.1016/j.measurement.2024.116063. Online publication date: Jan-2025.
  • (2025) LG-Diff: Learning to follow local class-regional guidance for nearshore image cross-modality high-quality translation. Information Fusion, 117 (102870). DOI: 10.1016/j.inffus.2024.102870. Online publication date: May-2025.
  • (2024) Algorithm for Eliminating Mismatched Feature Points in Heterologous Images under Spatial Constraints. Acta Optica Sinica, 44:20 (2015002). DOI: 10.3788/AOS240908. Online publication date: 2024.
  • (2024) DBSF-Net: Infrared Image Colorization Based on the Generative Adversarial Model with Dual-Branch Feature Extraction and Spatial-Frequency-Domain Discrimination. Remote Sensing, 16:20 (3766). DOI: 10.3390/rs16203766. Online publication date: 10-Oct-2024.
  • (2024) Nighttime Thermal Infrared Image Translation Integrating Visible Images. Remote Sensing, 16:4 (666). DOI: 10.3390/rs16040666. Online publication date: 13-Feb-2024.
  • (2024) MappingFormer: Learning Cross-modal Feature Mapping for Visible-to-infrared Image Translation. Proceedings of the 32nd ACM International Conference on Multimedia, 10745-10754. DOI: 10.1145/3664647.3681375. Online publication date: 28-Oct-2024.
  • (2024) Exploring Spatiotemporal Consistency of Features for Video Translation in Consumer Internet of Things. IEEE Transactions on Consumer Electronics, 70:1 (3077-3087). DOI: 10.1109/TCE.2023.3331009. Online publication date: Feb-2024.
  • (2024) Multimodal Fusion Induced Attention Network for Industrial VOCs Detection. IEEE Transactions on Artificial Intelligence, 5:12 (6385-6398). DOI: 10.1109/TAI.2024.3436037. Online publication date: Dec-2024.
  • (2024) Improving Resolution of Translated Infrared Images. 2024 IEEE SENSORS, 1-4. DOI: 10.1109/SENSORS60989.2024.10785009. Online publication date: 20-Oct-2024.