
Few-Shot Face Sketch-to-Photo Synthesis via Global-Local Asymmetric Image-to-Image Translation

Published: 29 October 2024

Abstract

Face sketch-to-photo synthesis, which can be achieved by Image-to-Image (I2I) translation, is widely used in law enforcement and digital entertainment. Traditional I2I translation algorithms usually treat the bidirectional translation between two image domains as two symmetric processes, so both translation networks adopt the same structure. However, because face sketches are scarce while face photos are abundant, the sketch-to-photo and photo-to-sketch processes are asymmetric. To address this issue, we propose a few-shot face sketch-to-photo synthesis model based on asymmetric I2I translation, in which the sketch-to-photo process uses a feature-embedded generating network while the photo-to-sketch process uses a style transfer network. On this basis, we propose a three-stage asymmetric training strategy, triggered by style transfer, that exploits the fact that the style transfer network needs only a few face sketches for training. Additionally, we find that stylistic differences between global and local sketch faces lead to inconsistencies between the global and local sketch-to-photo processes. We therefore adopt a dual branch for the global face and the local face in the sketch-to-photo synthesis model to learn the specific transformations of global structure and local details. Finally, a high-quality synthetic face photo is generated by the global-local face fusion sub-network. Extensive experiments demonstrate that, compared with state-of-the-art methods, the proposed Global-Local Asymmetric (GLAS) I2I translation algorithm improves FSIM by at least 0.0126 and reduces LPIPS (alex), LPIPS (squeeze), and LPIPS (vgg) by at least 0.0610, 0.0883, and 0.0719, respectively.
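
For context, LPIPS scores such as those reported above can be computed with the open-source lpips package of Zhang et al. (2018), instantiating one model per backbone (alex, squeeze, vgg). The sketch below is illustrative only: the file names, image resolution, and preprocessing are assumptions rather than the paper's evaluation protocol, and the FSIM comparison would require a separate implementation.

```python
# Minimal sketch: perceptual evaluation with the `lpips` package.
# Paths and the 256x256 resolution are placeholders, not the paper's setup.
import lpips
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),                        # assumed resolution
    transforms.ToTensor(),                                # [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # -> [-1, 1], as LPIPS expects
])

def load(path: str) -> torch.Tensor:
    """Load an RGB image as a (1, 3, H, W) tensor in [-1, 1]."""
    return to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)

# One LPIPS model per backbone reported in the abstract.
metrics = {net: lpips.LPIPS(net=net) for net in ("alex", "squeeze", "vgg")}

synthetic = load("synthetic_photo.png")      # hypothetical model output
reference = load("ground_truth_photo.png")   # hypothetical paired ground-truth photo

with torch.no_grad():
    for net, fn in metrics.items():
        print(f"LPIPS ({net}): {fn(synthetic, reference).item():.4f}")  # lower is better
```

Scores of this kind are typically averaged over a paired test split; lower LPIPS and higher FSIM indicate closer agreement with the ground-truth photos.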




    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 10 (October 2024), 729 pages
    EISSN: 1551-6865
    DOI: 10.1145/3613707
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 October 2024
    Online AM: 20 July 2024
    Accepted: 20 May 2024
    Revised: 13 May 2024
    Received: 03 January 2024
    Published in TOMM Volume 20, Issue 10


    Author Tags

    1. Face sketch-to-photo synthesis
    2. image-to-image translation
    3. global-local face fusion

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
