
GENERATING ANIME FACES FROM HUMAN FACES WITH ADVERSARIAL NETWORKS

1 Yu-Jing Lin (林裕景), 1 Chiou-Shann Fuh (傅楸善)
1 Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

ABSTRACT

The generative adversarial network has achieved huge success among generative algorithms. Besides generating handwritten digits, human faces, indoor designs, and many other images from noise, more and more researchers have applied adversarial techniques to the style transfer task, also regarded as a domain transfer task, over the past three years. In this work, we aim at generating anime-style faces from human faces in the real world. We construct a Face2Anime Dataset, perform generative adversarial learning on it, and evaluate the results at the end.

Index Terms— Anime Face Generation, Style Transfer, Generative Adversarial Network, IPPR, CVGIP 2018.

1. INTRODUCTION

In 2014, Goodfellow, I. et al. [1] introduced the generative adversarial network (GAN), a brilliant deep learning method that generates synthetic data following a given distribution. Despite the great difficulty of generation, people realized that neural networks are able to create meaningful things. Radford, A. et al. [2] proposed an improved architecture called the deep convolutional generative adversarial network (DCGAN), which generates much better images. Later, Arjovsky, M. et al. [3] stabilized the training procedure of DCGAN by utilizing several training tricks and named the new architecture Wasserstein GAN (WGAN). The blooming development of GANs began once these promising techniques were introduced.

When it comes to style transfer via deep learning, Gatys, L. et al. [4] brought a deep learning-based algorithm into the world in 2015. By minimizing the content loss and the style loss, computed on the activations of inner layers, between a content image and a style image, we can transfer the style of the style image onto the content image. Not only did the authors reveal the power of deep nonlinear networks, but this method was also the first big success of style transfer in the field of deep learning. Two years later, Luan, F. et al. [5] refined the method into the delicate algorithm known as deep photo style transfer, which achieves much better stylization results. Moreover, Li, Y. et al. [6] enhanced this kind of photorealistic image style transfer to run inference in a shorter processing time.

However, the methods above require parameter tuning to find the best result. In 2017, Isola, P. et al. introduced image-to-image translation [7], also called pix2pix. By utilizing a U-Net on paired data from two different domains, pix2pix transfers an image from one domain to the other, and vice versa, in a robust way. The U-Net performs well on paired image transfer tasks.

In the real world, however, it is not practical to collect a large amount of paired data for a single task. The usual situation is that we have data in domain X and in domain Y separately. In the same year, 2017, CycleGAN [8], DiscoGAN [9], and DualGAN [10] all revealed domain-to-domain transfer via deep learning at roughly the same time, although Taigman, Y. et al. [11] had proposed an unsupervised adversarial domain transfer network the previous year. The ideas of Zhu, J. et al. [8], Kim, T. et al. [9], and Yi, Z. et al. [10] are basically the same and simple: generate images from images with a GAN and retain consistency. With a pair of GANs, one converting images from domain X to domain Y and the other converting those from domain Y back to domain X, these methods successfully constructed a deep learning-based style transfer system. The most famous example is zebra-to-horse. There are other examples of bidirectional style transfer depicted by the authors of CycleGAN [8], such as Monet-to-photo, summer-to-winter, apple-to-orange, etc.

For anime image generation, there are also several brilliant methods based on generative adversarial networks. Jin, Y. et al. [12] proposed a conditional anime character GAN based on DRAGAN [13], which is inspired by ACGAN [14]. Liu, Y. et al. used a conditional GAN to generate painted colorful images from hand-drawn sketches. Zhang, L., Ji, Y., and Lin, X. also integrated a residual U-Net with ACGAN [14] to paint gray-scale sketches. All these works show the power of generative adversarial networks on anime images.

In our Face2Anime, we introduce a way to generate anime-style faces from real human faces. We take advantage of the generalization ability of GANs for this kind of style transfer on unpaired images.
We first gathered numerous human faces from public face datasets and anime faces from the Internet. Then we applied CycleGAN to these data to train a pair of generators; the one from real to anime is our target generator. We will show faces generated from faces in the datasets, as well as from unseen faces, in the experiments section.

2. RELATED WORKS

Face2Anime is related to the generative adversarial network, especially CycleGAN. We go through the architectures and the objective functions they try to minimize.

2.1. Generative Adversarial Network

The generative adversarial network (GAN) is comprised of a pair of networks: a generator (G) and a discriminator (D). As depicted in Figure 1, the generator outputs images from random noise while the discriminator tries to determine whether an input image is real or fake (i.e., generated by G) by giving the image a score. The score from D ranges from 0 (fake) to 1 (real). The generator and the discriminator compete against each other and improve iteratively. In the end, the generator is able to generate images similar to those in the training dataset, and the discriminator cannot tell them apart from the real ones.

Fig. 1: Generative adversarial network.

The following equations show the objective function of GAN. The discriminator aims to maximize both the discriminating loss L_D (Equation 1) and the generating loss L_G (Equation 2), while the generator tries to minimize L_G:

L_D = E_{x∼p_data(x)}[log D(x)]    (1)

L_G = E_{z∼p_z(z)}[log(1 − D(G(z)))]    (2)

The total objective function is the sum of the discriminating loss and the generating loss:

min_G max_D V(D, G) = L_D + L_G    (3)

In the training of GAN, D iteratively takes real images from the dataset and fake images from G, and we update the network according to the above equation by backpropagating the gradients through the trainable parameters of the whole network.
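As a concrete illustration (our own sketch, not code from the paper), the min-max objective in Equations 1-3 can be written as two alternating loss computations. The PyTorch snippet below assumes a generator `G` mapping noise to images and a discriminator `D` whose output is a probability in (0, 1); both names are placeholders.

```python
import torch

def gan_losses(D, G, real_images, z):
    """Discriminator and generator losses for Equations 1-3 (illustrative sketch)."""
    fake_images = G(z)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))),
    # i.e., minimize the negative of Equations 1 + 2.
    d_loss = -(torch.log(D(real_images) + 1e-8).mean()
               + torch.log(1 - D(fake_images.detach()) + 1e-8).mean())

    # Generator: minimize log(1 - D(G(z))), exactly the saturating form of Equation 2.
    g_loss = torch.log(1 - D(fake_images) + 1e-8).mean()
    return d_loss, g_loss
```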
2.2. CycleGAN

In CycleGAN, there is a pair of GANs, (G_XY, D_Y) and (G_YX, D_X), where X and Y are two different domains. CycleGAN works in the way Figure 2 shows: G_XY generates fake images in domain Y from images in domain X, and D_Y evaluates images in domain Y; G_YX and D_X work in the inverse direction. CycleGAN therefore takes more than one objective function into consideration.

Fig. 2: CycleGAN.

2.2.1. Adversarial Loss

Firstly, since CycleGAN is a kind of generative adversarial network, we have the typical GAN loss, called the adversarial loss:

L_GAN(G_XY, D_Y, X, Y) = E_{y∼p_data(y)}[log D_Y(y)] + E_{x∼p_data(x)}[log(1 − D_Y(G_XY(x)))]    (4)

L_GAN is actually the same as the summation of Equation 1 and Equation 2 described in Section 2.1.

2.2.2. Cycle Consistency Loss

The critical part of unpaired domain transfer is to use a pair of GANs. For a generated fake image in domain Y, G_YX is supposed to be able to convert it back, which makes sure that G_XY and G_YX map an image to domain Y and back to domain X. The cycle consistency loss is introduced as Figure 3 shows and as the following equation:

L_cyc(G_XY, G_YX) = E_{x∼p_data(x)}[||G_YX(G_XY(x)) − x||_1] + E_{y∼p_data(y)}[||G_XY(G_YX(y)) − y||_1]    (5)
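To make Equation 5 concrete, here is a minimal PyTorch sketch of the cycle consistency term (our illustration, not the authors' code), assuming generators `G_XY` and `G_YX` that map image batches between the two domains.

```python
def cycle_consistency_loss(G_XY, G_YX, real_x, real_y):
    """L1 cycle consistency loss of Equation 5 (illustrative sketch)."""
    # x -> Y -> back to X should reconstruct the original x.
    rec_x = G_YX(G_XY(real_x))
    # y -> X -> back to Y should reconstruct the original y.
    rec_y = G_XY(G_YX(real_y))
    return (rec_x - real_x).abs().mean() + (rec_y - real_y).abs().mean()
```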


Fig. 3: Cycle-consistency loss.

2.2.3. Full Objective Function

The full objective function of the whole network is, therefore, a summation of these two kinds of loss functions, where the parameter λ controls the influence of cycle consistency in training:

L(G_XY, G_YX, D_X, D_Y) = L_GAN(G_XY, D_Y, X, Y) + L_GAN(G_YX, D_X, Y, X) + λ L_cyc(G_XY, G_YX)    (6)

In fact, λ is the critical parameter in CycleGAN training. A network with too small a λ can hardly generate samples consistent with the given data; a network with too large a λ, however, has difficulty imposing changes on the data.

D_X and D_Y attempt to maximize the total loss while G_XY and G_YX aim to minimize it. The parameters of the whole network are then updated according to the following objective:

G*_XY, G*_YX = arg min_{G_XY, G_YX} max_{D_X, D_Y} L(G_XY, G_YX, D_X, D_Y)    (7)

After numerous iterations, G*_XY and G*_YX are the final powerful domain transfer generators.
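The alternating optimization of Equations 6 and 7 can be sketched as follows. This is a simplified illustration under our own assumptions; in particular, the weight `lam=10.0` follows the original CycleGAN paper [8] and is not a value reported in this work, and `opt_G`/`opt_D` are assumed to hold the generator and discriminator parameters respectively.

```python
import torch

def train_step(G_XY, G_YX, D_X, D_Y, real_x, real_y, opt_G, opt_D, lam=10.0):
    """One alternating update for Equations 6-7 (illustrative sketch)."""
    # --- Generator update: adversarial terms plus lambda-weighted cycle consistency. ---
    fake_y, fake_x = G_XY(real_x), G_YX(real_y)
    loss_G = (torch.log(1 - D_Y(fake_y) + 1e-8).mean()
              + torch.log(1 - D_X(fake_x) + 1e-8).mean()
              + lam * ((G_YX(fake_y) - real_x).abs().mean()
                       + (G_XY(fake_x) - real_y).abs().mean()))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    # --- Discriminator update: maximize the adversarial terms (minimize their negative). ---
    loss_D = -(torch.log(D_Y(real_y) + 1e-8).mean()
               + torch.log(1 - D_Y(fake_y.detach()) + 1e-8).mean()
               + torch.log(D_X(real_x) + 1e-8).mean()
               + torch.log(1 - D_X(fake_x.detach()) + 1e-8).mean())
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    return loss_G.item(), loss_D.item()
```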

3. FACE2ANIME

We applied the well-designed CycleGAN to generating anime faces from real faces. Although CycleGAN works for transferring images in both directions, it is still difficult to generate real faces from anime ones. Therefore, we focus only on the direction from real to anime.

Fig. 4: High-level procedure of Face2Anime.

3.1. Face2Anime Dataset

The data are the most important part of this task. Without proper data, there is no way to learn a set of parameters for our CycleGAN. As a result, we first constructed our Face2Anime Dataset [15], consisting of an anime face dataset, a cropped CelebA dataset, and the SCUT-FBP5500 dataset. Then we trained the CycleGAN as Figure 5 shows.

The anime faces took us a lot of work to prepare because there are few suitable anime face datasets. We turned to collecting a bunch of images from anime image sites, such as Danbooru, with Fährmann, M.'s tool gallery-dl [16]. Then we detected the anime faces in each image and cropped them to a proper size to form an anime face dataset. The last step was to clean the dataset, because there were some misdetected faces that were only part of a face or not a face at all. In the end, we built a dataset [15] with 5,025 anime faces of size 64x64.

For human faces, we utilize existing public face datasets: the CelebFaces Attributes Dataset [17] and the SCUT-FBP5500 Dataset [18]. The CelebFaces Attributes Dataset, or CelebA, is a large-scale dataset of face images with annotated attributes. The size of images in CelebA is 178x218, so we cropped only the faces and resized them to 64x64. Moreover, we took only 40,000 of the 202,599 images as our human face training data. Apart from CelebA, we also use the SCUT-FBP5500 Dataset (FBP5500), a dataset for facial beauty prediction collected by South China University of Technology. FBP5500 images are 64x64 and the faces are located in a proper location (center) in the 5,000 images. In our experiments, we are going to evaluate not only the performance of CycleGAN on style transfer but also the difference between the two human face datasets.

Fig. 5: Face2Anime dataset.
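As an illustration of the detect-and-crop step described above, the sketch below uses OpenCV with nagadomi's lbpcascade_animeface.xml cascade to detect anime faces and save 64x64 crops. The choice of detector, the file names, and the folder layout are our assumptions, since the paper does not specify which detector was used.

```python
import glob
import os

import cv2  # pip install opencv-python

# Assumption: cascade file from https://github.com/nagadomi/lbpcascade_animeface
cascade = cv2.CascadeClassifier("lbpcascade_animeface.xml")

os.makedirs("anime_faces_64", exist_ok=True)
for idx, path in enumerate(glob.glob("downloads/*.jpg")):
    image = cv2.imread(path)
    if image is None:
        continue
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(48, 48))
    for face_idx, (x, y, w, h) in enumerate(faces):
        crop = cv2.resize(image[y:y + h, x:x + w], (64, 64))
        cv2.imwrite(f"anime_faces_64/{idx}_{face_idx}.png", crop)
```

Misdetected crops still have to be removed by hand afterwards, as described above.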
Fig. 6: Residual block.

3.2. Model Architecture

For tasks such as image generation or style transfer via generative adversarial networks, the architecture of the generator and the discriminator also matters. A good model comprehends the data and is thus able to learn to generate plausible data. There are also various hyperparameters we can set when training a CycleGAN: the learning rate, the cycle consistency factor λ, the hidden size of each layer, etc. In this work, however, we are not going to discuss many of them; we focus only on the difference between data sources and between model architectures, which are two of the most important factors in deep learning.

3.2.1. Residual Block Generator

We first chose the residual block generator, which consists of 9 residual blocks (Figure 6), proposed by Zhu, J. et al. [8] in the original paper.

The residual block generator is basically a sequence of residual blocks. A residual block, shown in Figure 6, is inspired by the residual network proposed by He, K. et al. [19]. It takes a nonlinear transformation F of the input x and then sums the output F(x) and the original x, known as the short-cut path, together. The residual architecture keeps the information of the inputs and prevents hidden units from dying (always outputting zeros). In our Face2Anime, we apply instance normalization after each convolutional layer.
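A residual block of the kind described above could look like the following PyTorch sketch (our illustration; the channel count and kernel sizes are assumptions rather than values reported in the paper), with instance normalization after each convolution and the short-cut addition at the end.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): two conv layers with instance normalization (illustrative sketch)."""

    def __init__(self, channels=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # short-cut path keeps the input information

# A residual block generator stacks 9 such blocks between down- and up-sampling layers.
```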
3.2.2. U-Net Generator

Fig. 7: U-Net.

In the U-Net, depicted in Figure 7, the output features of each encoder layer are passed to and concatenated with the input of the corresponding decoder layer. Although the U-Net generator can result in mode collapse, as stated by Jin, X. et al. [20] according to their experiment, there are still some drastically successful cases trained with a U-Net generator in other style transfer works, e.g. pix2pix [7], so we also utilized a U-Net with 256 hidden units as our alternative generative model.
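The skip connections can be illustrated with the tiny encoder-decoder below. This is a sketch under our own assumptions about layer sizes, not the exact 256-hidden-unit network used in this work: encoder features are concatenated with the decoder features at the matching resolution before the next up-sampling step.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style generator showing encoder-decoder skip connections (illustrative)."""

    def __init__(self, in_ch=3, out_ch=3, hidden=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, hidden, 4, stride=2, padding=1),
                                  nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(nn.Conv2d(hidden, hidden * 2, 4, stride=2, padding=1),
                                  nn.InstanceNorm2d(hidden * 2), nn.LeakyReLU(0.2))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(hidden * 2, hidden, 4, stride=2, padding=1),
                                  nn.InstanceNorm2d(hidden), nn.ReLU())
        # The decoder input is the concatenation of the up-sampled features and the skip from enc1.
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(hidden * 2, out_ch, 4, stride=2, padding=1),
                                  nn.Tanh())

    def forward(self, x):
        e1 = self.enc1(x)   # 64x64 -> 32x32
        e2 = self.enc2(e1)  # 32x32 -> 16x16
        d2 = self.dec2(e2)  # 16x16 -> 32x32
        return self.dec1(torch.cat([d2, e1], dim=1))  # skip connection, then 32x32 -> 64x64
```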
3.2.3. Discriminator

We construct a basic 3-layer convolutional neural network as our discriminator. The output of the discriminator is a 4x4x1 score map which represents the realism scores of a given image. By calculating the binary cross entropy loss between each score and the ground truth label (1 for a real image; 0 for a fake image), the discriminator learns how to distinguish images.
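A discriminator of the kind described above might look like the following sketch; the channel widths and strides are our assumptions, chosen only so that a 64x64 input yields a 4x4x1 score map trained with binary cross entropy.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """3-layer convolutional discriminator producing a 4x4x1 score map (illustrative sketch)."""

    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),     # 32x32 -> 16x16
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=4),                 # 16x16 -> 4x4 score map
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Binary cross entropy against an all-ones (real) or all-zeros (fake) 4x4 target:
bce = nn.BCELoss()
scores = PatchDiscriminator()(torch.randn(1, 3, 64, 64))
loss_real_example = bce(scores, torch.ones_like(scores))
```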
3.3. Training Details

In order to obtain high-quality results, we studied several techniques to improve GAN training [21] and applied them to our model, such as the dropout layer [22] and the normalization layer. The dropout layers in the generator prevent it from overfitting the training data, so that it creates varied fake images. While Kim, T. et al. [9] of DiscoGAN and Salimans, T. et al. [21] of the techniques report suggest using batch normalization [23], we adopted the other normalization method, instance normalization [24], since Zhu, J. et al. [8] reached awesome results with it in CycleGAN. Besides, we randomly flipped the images during training for data augmentation.
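For the data-side details (64x64 inputs, random horizontal flips), a torchvision pipeline such as the one below would do the job; the exact transform parameters, batch size, and folder layout are our assumptions, not values reported in the paper.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Random horizontal flips for augmentation, then scale images to [-1, 1] for a tanh generator.
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Assumed layout: data/human/*.jpg and data/anime/*.png, treated as two classes by ImageFolder.
faces = datasets.ImageFolder("data", transform=transform)
loader = DataLoader(faces, batch_size=16, shuffle=True, num_workers=2)
```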
4. EXPERIMENTS

We conducted four experiments with the following settings:

1. Residual block generator on cropped CelebA

2. Residual block generator on SCUT-FBP5500

3. U-Net generator on cropped CelebA

4. U-Net generator on SCUT-FBP5500

The following figures illustrate the result of each setting. For each set of images, the upper row contains source images and the lower row contains generated images.
Fig. 8: Faces generated by the residual block generator trained on cropped CelebA. The residual block generator turns human faces into anime faces. The results (in the second row) look like artistic-style paintings.

Fig. 9: Faces generated by the residual block generator trained on SCUT-FBP5500. The residual block generator works as well on SCUT-FBP5500 as on CelebA, implying that adversarial data generation is feasible on both human face datasets.

Fig. 10: Faces generated by the U-Net generator trained on cropped CelebA. The U-Net generator also generates artistic-style images. Moreover, the shapes of the output images look more similar to the original ones, showing that the U-Net imposes constraints on the shape of the content and changes only the texture of an image. To our surprise, the results look like statues made up of polyhedra.

Fig. 11: More generated samples on SCUT-FBP5500. Some of them look crazy but cool. The faces on the left are typical images from FBP5500. The images on the right are framed by round borders, which probably come from profile pictures. Although some of the training images are not square, the Face2Anime CycleGAN still works well.
Fig. 12: Novel faces which were unseen during training. The generators are able to generate anime-style faces from unseen human faces, which shows the generalization ability of CycleGAN. However, some faces are only slightly changed in our experiments.

Fig. 13: Some samples of anime-to-human faces. Although it is quite difficult for a generator to generate real human faces from anime faces, there are still some successful samples among the generated testing images; Figure 13 demonstrates some of them. The generator from anime to real does learn some human face textures, such as smoother skin, smaller eyes, lower-contrast hair color, a straight nose, etc.

5. CONCLUSION

We demonstrate a human-to-anime style transfer on faces via CycleGAN, called Face2Anime. We successfully create a bunch of anime-style faces from human faces with Face2Anime. There is not much difference between training on the CelebA dataset and training on the SCUT-FBP5500 dataset (each paired with the anime face dataset), even though some of the images in FBP5500 are round-framed. The images created by the residual block generator diverge from those generated by the U-Net generator: the former generates artistic-style paintings while the latter only changes the textures and preserves the edges of the original content.

The quality of the generated images, however, is not ideal. The results shown in Section 4 are only a small portion of all generated images, and most of the others actually do not look like a human face or an anime face. Also, the stability of training Face2Anime is poor, since it is common for the CycleGAN to generate only "abstract paintings" at the end. Therefore, we will keep improving the quality and stability of Face2Anime in the future.

On the other hand, in this work we achieved success in style transfer between realistic faces and anime faces. The next step is to generate genuine anime faces, which involves more than just changing textures.

REFERENCES

[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.

[2] Alec Radford, Luke Metz, and Soumith Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.

[3] Martin Arjovsky, Soumith Chintala, and Léon Bottou, "Wasserstein GAN," arXiv preprint arXiv:1701.07875, 2017.

[4] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, "A neural algorithm of artistic style," arXiv preprint arXiv:1508.06576, 2015.

[5] Fujun Luan, Sylvain Paris, Eli Shechtman, and Kavita Bala, "Deep photo style transfer," CoRR, vol. abs/1703.07511, 2017.

[6] Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, and Jan Kautz, "A closed-form solution to photorealistic image stylization," arXiv preprint arXiv:1802.06474, 2018.

[7] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros, "Image-to-image translation with conditional adversarial networks," arXiv preprint, 2017.

[8] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," arXiv preprint arXiv:1703.10593, 2017.

[9] Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jungkwon Lee, and Jiwon Kim, "Learning to discover cross-domain relations with generative adversarial networks," arXiv preprint arXiv:1703.05192, 2017.

[10] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong, "DualGAN: Unsupervised dual learning for image-to-image translation," arXiv preprint, 2017.

[11] Yaniv Taigman, Adam Polyak, and Lior Wolf, "Unsupervised cross-domain image generation," arXiv preprint arXiv:1611.02200, 2016.

[12] Yanghua Jin, Jiakai Zhang, Minjun Li, Yingtao Tian, Huachun Zhu, and Zhihao Fang, "Towards the automatic anime characters creation with generative adversarial networks," arXiv preprint arXiv:1708.05509, 2017.

[13] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira, "On convergence and stability of GANs," arXiv preprint arXiv:1705.07215, 2017.

[14] Augustus Odena, Christopher Olah, and Jonathon Shlens, "Conditional image synthesis with auxiliary classifier GANs," arXiv preprint arXiv:1610.09585, 2016.

[15] Yu-Jing Lin, "Face2Anime dataset," https://drive.google.com/open?id=1X3QUrTI6629vSOJepbE3-8M2dKAkIbjp, 2018.

[16] Mike Fährmann, "gallery-dl: Command-line program to download image galleries and collections from pixiv, exhentai, danbooru and more," https://github.com/mikf/gallery-dl, 2014.

[17] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang, "Deep learning face attributes in the wild," in Proceedings of the International Conference on Computer Vision (ICCV), 2015.

[18] Lingyu Liang, Luojun Lin, Lianwen Jin, Duorui Xie, and Mengru Li, "SCUT-FBP5500: A diverse benchmark dataset for multi-paradigm facial beauty prediction," 2018.

[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[20] Xiaohan Jin, Ye Qi, and Shangxuan Wu, "CycleGAN face-off," arXiv preprint arXiv:1712.03451, 2017.

[21] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen, "Improved techniques for training GANs," in Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.

[22] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.

[23] Sergey Ioffe and Christian Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.

[24] Dmitry Ulyanov, Andrea Vedaldi, and Victor S. Lempitsky, "Instance normalization: The missing ingredient for fast stylization," CoRR, vol. abs/1607.08022, 2016.
