
Few-shot face sketch-to-photo synthesis via global-local asymmetric image-to-image translation


Abstract

Face sketch-to-photo synthesis is widely used in law enforcement and digital entertainment and can be achieved by image-to-image (I2I) translation. Traditional I2I translation algorithms usually treat the bidirectional translation between two image domains as two symmetric processes, so the two translation networks adopt the same structure. However, because face sketches are scarce while face photos are abundant, the sketch-to-photo and photo-to-sketch processes are asymmetric. To address this issue, we propose a few-shot face sketch-to-photo synthesis model based on asymmetric I2I translation, in which the sketch-to-photo process uses a feature-embedded generating network and the photo-to-sketch process uses a style transfer network. On this basis, a three-stage asymmetric training strategy with style transfer as the trigger is proposed to optimize the model, exploiting the fact that the style transfer network requires only a few face sketches for training. Additionally, we find that stylistic differences between the global and local sketch faces lead to inconsistencies between the global and local sketch-to-photo processes. A dual branch for the global face and the local face is therefore adopted in the sketch-to-photo synthesis model to learn the specific transformation processes for global structure and local details, and a high-quality synthetic face photo is finally generated through the global-local face fusion sub-network. Extensive experimental results demonstrate that, compared with SOTA methods, the proposed Global-Local ASymmetric image-to-image translation algorithm (GLAS) improves FSIM by at least 0.0126 and reduces LPIPS (alex), LPIPS (squeeze), and LPIPS (vgg) by at least 0.0610, 0.0883, and 0.0719, respectively.
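For readers unfamiliar with the dual-branch design described above, the following PyTorch-style sketch illustrates the general idea of feeding a global face image and a local facial-region crop through separate branches and merging them in a fusion sub-network. It is a minimal illustration under assumed input shapes and with hypothetical module names (`GlobalBranch`, `LocalBranch`, `FusionNet`); it is not the authors' GLAS implementation.

```python
# Illustrative sketch only: a dual-branch generator that processes a global face
# sketch and a local facial crop separately, then fuses their feature maps.
# Module names, shapes, and layer choices are hypothetical, not the GLAS network.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class GlobalBranch(nn.Module):
    """Encodes the whole sketch face to capture global structure."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
    def forward(self, x):
        return self.net(x)

class LocalBranch(nn.Module):
    """Encodes a local facial-region crop (e.g. eyes or mouth) for fine detail."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
    def forward(self, x):
        return self.net(x)

class FusionNet(nn.Module):
    """Fuses global and local features and decodes a synthetic face photo."""
    def __init__(self):
        super().__init__()
        self.fuse = conv_block(128, 64)
        self.to_rgb = nn.Conv2d(64, 3, kernel_size=3, padding=1)
    def forward(self, g_feat, l_feat):
        # Upsample the local feature map to the global resolution before fusing.
        l_feat = nn.functional.interpolate(
            l_feat, size=g_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        fused = self.fuse(torch.cat([g_feat, l_feat], dim=1))
        return torch.tanh(self.to_rgb(fused))

if __name__ == "__main__":
    global_sketch = torch.randn(1, 3, 256, 256)   # whole-face sketch
    local_sketch = torch.randn(1, 3, 64, 64)      # cropped facial region
    photo = FusionNet()(GlobalBranch()(global_sketch), LocalBranch()(local_sketch))
    print(photo.shape)  # torch.Size([1, 3, 256, 256])
```

The LPIPS scores reported in the abstract can be computed with the publicly available `lpips` package (e.g., `lpips.LPIPS(net='alex')` applied to image tensors scaled to [-1, 1]), where lower values indicate closer perceptual similarity to the ground-truth photo; how the authors configure the metric is not specified here.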


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications (Just Accepted)
    EISSN: 1551-6865
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 20 July 2024
    Accepted: 20 May 2024
    Revised: 13 May 2024
    Received: 03 January 2024

    Author Tags

    1. Face sketch-to-photo synthesis
    2. image-to-image translation
    3. global-local face fusion

    Qualifiers

    • Research-article
