
Few-shot face sketch-to-photo synthesis via global-local asymmetric image-to-image translation


Abstract

Face sketch-to-photo synthesis is widely used in law enforcement and digital entertainment and can be achieved by image-to-image (I2I) translation. Traditional I2I translation algorithms usually treat the bidirectional translation between two image domains as two symmetric processes, so the two translation networks adopt the same structure. However, because face sketches are scarce while face photos are abundant, the sketch-to-photo and photo-to-sketch processes are asymmetric. To address this issue, we propose a few-shot face sketch-to-photo synthesis model based on asymmetric I2I translation, in which the sketch-to-photo process uses a feature-embedded generating network and the photo-to-sketch process uses a style transfer network. On this basis, a three-stage asymmetric training strategy with style transfer as the trigger is proposed to optimize the model, exploiting the fact that the style transfer network requires only a few face sketches for training. Additionally, we find that stylistic differences between the global and local sketch faces lead to inconsistencies between the global and local sketch-to-photo processes. A dual branch for the global face and the local face is therefore adopted in the sketch-to-photo synthesis model to learn the specific transformation processes for global structure and local details, and a high-quality synthetic face photo is finally generated through the global-local face fusion sub-network. Extensive experimental results demonstrate that, compared with SOTA methods, the proposed Global-Local ASymmetric image-to-image translation algorithm (GLAS) improves FSIM by at least 0.0126 and reduces LPIPS (alex), LPIPS (squeeze), and LPIPS (vgg) by at least 0.0610, 0.0883, and 0.0719, respectively.
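For readers unfamiliar with the dual-branch design described above, the following PyTorch-style sketch illustrates the general idea of feeding a global face image and a local facial-region crop through separate branches and merging them in a fusion sub-network. It is a minimal illustration under assumed input shapes and with hypothetical module names (`GlobalBranch`, `LocalBranch`, `FusionNet`); it is not the authors' GLAS implementation.

```python
# Illustrative sketch only: a dual-branch generator that processes a global face
# sketch and a local facial crop separately, then fuses their feature maps.
# Module names, shapes, and layer choices are hypothetical, not the GLAS network.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class GlobalBranch(nn.Module):
    """Encodes the whole sketch face to capture global structure."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
    def forward(self, x):
        return self.net(x)

class LocalBranch(nn.Module):
    """Encodes a local facial-region crop (e.g. eyes or mouth) for fine detail."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
    def forward(self, x):
        return self.net(x)

class FusionNet(nn.Module):
    """Fuses global and local features and decodes a synthetic face photo."""
    def __init__(self):
        super().__init__()
        self.fuse = conv_block(128, 64)
        self.to_rgb = nn.Conv2d(64, 3, kernel_size=3, padding=1)
    def forward(self, g_feat, l_feat):
        # Upsample the local feature map to the global resolution before fusing.
        l_feat = nn.functional.interpolate(
            l_feat, size=g_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        fused = self.fuse(torch.cat([g_feat, l_feat], dim=1))
        return torch.tanh(self.to_rgb(fused))

if __name__ == "__main__":
    global_sketch = torch.randn(1, 3, 256, 256)   # whole-face sketch
    local_sketch = torch.randn(1, 3, 64, 64)      # cropped facial region
    photo = FusionNet()(GlobalBranch()(global_sketch), LocalBranch()(local_sketch))
    print(photo.shape)  # torch.Size([1, 3, 256, 256])
```

The LPIPS scores reported in the abstract can be computed with the publicly available `lpips` package (e.g., `lpips.LPIPS(net='alex')` applied to image tensors scaled to [-1, 1]), where lower values indicate closer perceptual similarity to the ground-truth photo; how the authors configure the metric is not specified here.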


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications (Just Accepted)
    EISSN: 1551-6865
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 20 July 2024
    Accepted: 20 May 2024
    Revised: 13 May 2024
    Received: 03 January 2024

    Author Tags

    1. Face sketch-to-photo synthesis
    2. image-to-image translation
    3. global-local face fusion

    Qualifiers

    • Research-article
