Deep Learning Report

The document describes a system that uses CycleGAN to realistically age human faces while maintaining identity. It discusses the aging generator and identity preserving discriminator, which are two main components of the CycleGAN architecture tailored for the face aging task. The system was trained on a diverse dataset and evaluated both qualitatively and quantitatively.

Uploaded by

ss2081

PROJECT TITLE

A MINI PROJECT REPORT

18CSC305J - ARTIFICIAL INTELLIGENCE

Submitted by

Simran Sachdeva [RA2111026010518]


Ananya Kansal [RA2111026010530]
Under the guidance of

Dr. Saranya AshokKumar


Assistant Professor, Department of Computational Intelligence

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE & ENGINEERING


of

FACULTY OF ENGINEERING AND TECHNOLOGY

S.R.M. Nagar, Kattankulathur, Chengalpattu District

MAY 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)

BONAFIDE CERTIFICATE

Certified that this mini project report titled “FaceAging by CycleGAN” is the bonafide work of
Simran Sachdeva (RA2111026010518) and Ananya Kansal (RA2111026010530), who
carried out the minor project under my supervision. Certified further that, to the best of my
knowledge, the work reported herein does not form part of any other project report or dissertation on
the basis of which a degree or award was conferred on an earlier occasion on this or any other
candidate.

SIGNATURE SIGNATURE

Dr. A.Saranya Dr. Annie Uthra

Assistant Professor Head of Department

Dept. of Computational Intelligence Dept. of Computational Intelligence


ABSTRACT

In the field of digital image processing, simulating the aging process in human
faces presents a myriad of practical applications, from enhancing digital
forensics to improving social media platforms. This project develops an
innovative approach to face aging using Generative Adversarial Networks
(GANs), leveraging their ability to generate highly realistic images. We employ
a dual GAN architecture where one network focuses on learning age-related
attributes while the other ensures the preservation of personal identity over
different age progressions. The networks were trained on a diverse dataset
comprising facial images across a wide age range, annotated with precise age
labels.

The effectiveness of our model was assessed through both qualitative and
quantitative measures. The qualitative evaluation involved visual inspections by
human judges, while quantitative analysis was conducted using established
metrics such as the Fréchet Inception Distance (FID) to assess image quality
and Age Classification Accuracy for age correctness. Our results demonstrate
that the proposed GAN model not only produces visually plausible aged faces
but also maintains consistent identity features, outperforming existing models in
terms of realism and age accuracy.

This project not only advances the technological frontier of facial recognition
but also opens new avenues for personalized digital content creation and age
progression analysis in various domains. Future work will focus on improving
the model's robustness to variations in lighting, pose, and expression, and
extending its applicability to full-body aging simulations.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE SURVEY
3 SYSTEM ARCHITECTURE AND DESIGN
3.1 Overview of CycleGAN for Face Aging
3.2 CycleGAN Architecture and Components
4 METHODOLOGY
4.1 Training the CycleGAN Model
5 CODING AND TESTING
6 SCREENSHOTS AND RESULTS
6.1 Training Loss and Accuracy Trends
6.2 Ultrasonic and Scarecrow
6.3 Sample Aged Faces (20s, 30s, 40s, etc.)
6.4 Comparison with Baseline Models
6.5 Tests Output Result
6.6 Thingspeak Server
7 CONCLUSION AND FUTURE ENHANCEMENT
7.1 Conclusion
7.2 Future Enhancement
REFERENCES
ABBREVIATIONS

GAN - Generative Adversarial Network
AI - Artificial Intelligence
CNN - Convolutional Neural Network
ML - Machine Learning
CycleGAN - Cycle Generative Adversarial Network
MSE - Mean Squared Error
SSIM - Structural Similarity Index
RMSE - Root Mean Squared Error
GPU - Graphics Processing Unit
API - Application Programming Interface
TPU - Tensor Processing Unit
CV - Computer Vision
IoU - Intersection over Union

CHAPTER 1

INTRODUCTION

The clock never stops and never waits. As we grow older, it is fun to take out
the album (or one of the many photo apps these days) and show friends, family
and co-workers what we looked like when we were younger. Wouldn’t it be
awesome if we could do the opposite? Age progression, the process of
aesthetically rendering a facial image with the simulated effect of growing old, has
attracted much attention from the Deep Learning and Computer Vision
community due to its wide range of applications in the entertainment
industry and in forensic science, e.g., generating contemporary portraits of
individuals who went missing when they were young. However, it remains a
challenging task because the aging patterns we want to capture are easily
affected by the conditions of the input image, such as facial expressions or
photographic settings, not to mention the unpredictable effect on people’s
appearance of the physical environment that they grow up in.

Further, the scarcity of paired data (two images of the same person taken
20 or more years apart) has prevented existing solutions from achieving
good performance. In this project, we propose a simple yet intuitive deep
learning model based on CycleGAN [1] that can generate aging effects on
people portrayed in images without the need for paired data.
CHAPTER 2

LITERATURE SURVEY

Earlier attempts at age progression mostly employed bottom-up approaches,
in which researchers tried to understand the effect of aging by studying one
specific facial feature, such as the shape of the head or wrinkles [2][3]. Recent
works have taken more holistic approaches. Kingma and Dhariwal [4] used a
flow-based generative model trained on high-resolution faces to synthesize
realistic images, though the model itself is still complex and requires careful
design. Wang et al. [5] proposed a recurrent face aging (RFA) framework based
on a recurrent neural network (RNN). Their model can generate fine-grained
in-between faces across the aging process, but one drawback is that it needs
many short-term faces of the same person for training. Many other researchers
have used Generative Adversarial Nets (GANs) [6] with various customizations,
such as Conditional GANs [7], which are better at preserving the identity of the
original person, Contextual GANs [8], which are better at capturing the gradual
changes in a face’s shape and texture across adjacent age groups, and GANs
with a pyramid architecture [9], which estimate high-level age-specific features
at multiple scales. GAN approaches are significantly better than the earlier
studies due to their simplicity in design and implementation and their lower
requirements on training sets.
CHAPTER 3

SYSTEM ARCHITECTURE AND DESIGN

Our system leverages a specialized architecture of Generative Adversarial
Networks (GANs) tailored to the task of aging human faces realistically while
maintaining the identity of the individuals across different age transformations.
The architecture is composed of two main components: the Aging Generator (AG)
and the Identity Preserving Discriminator (IPD).

Aging Generator (AG)

The AG is responsible for generating age-progressed facial images from input
face images. It operates by learning a mapping from an input age to an output
age, while considering the initial facial features of the input image. The AG
employs a convolutional neural network (CNN) architecture, which is adept at
handling image data and extracting relevant features for effective aging
transformations.

Key components of the AG include:

● Encoder: The encoder part of the AG takes the input image and
compresses it into a dense latent space, extracting and encoding the
critical features necessary for the aging process.

● Aging Transformation Layer: This layer adjusts the encoded features to
reflect the desired age progression, applying learned aging patterns
specific to the input age and target age.

● Decoder: The decoder takes the aged features from the transformation
layer and reconstructs them back into a high-resolution image that
represents the input face at the target age.

Identity Preserving Discriminator (IPD)

The IPD ensures that the aged images generated by the AG maintain the identity
of the original subjects. It acts as a critic in the GAN setup, evaluating both the
realism of the aged images and their fidelity to the original identity. The IPD is
also a CNN and is trained to distinguish between real images and generated
images, and to verify if the generated image still corresponds to the same
identity as the input image.

Key functionalities of the IPD include:

● Realism Check: Assessing whether the generated images are
indistinguishable from real human faces in terms of textural and
anatomical accuracy.

● Identity Verification: Ensuring that the identity of the generated aged
image matches the identity of the original image, using features that are
invariant to the aging process.

Implementation Details

● Frameworks and Libraries: The system is implemented using TensorFlow
and Keras for their robust support in building and training deep learning
models, particularly GANs.

● Data Handling: Pre-processing steps include face detection, alignment,
and normalization to ensure the model trains on uniform data.

● Model Optimization: Techniques such as batch normalization, dropout,
and learning rate schedulers are used to stabilize training and improve
convergence.
CHAPTER 4

METHODOLOGY

3.1 Data Processing

For the IMDB-WIKI dataset [10], we only used the 62k images from Wikipedia,
which already provide enough images. Using the metadata of the
dataset, we could calculate the age of the person in the image when the photo
was taken. The dataset provides a face-cropped version of all images and a
detector score entry called face_score in the metadata. We enforced a minimum
face_score for both groups to ensure the quality of the images (excluding images
with no faces or blurry faces). We also removed nearly all the grayscale
images in the dataset, as CycleGAN was designed to take in RGB images. We
ended up with 5,003 images in the young people group and 2,779 images in the
old people group. Finally, we set aside 5% of these images as a test set,
while keeping the rest as a training set.
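
The filtering and splitting steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields birth_year, photo_year and is_grayscale, the score threshold, and the age bounds of the two groups are all hypothetical, since the report does not state them; only face_score and the 5% test fraction come from the text.

```python
import random

# Assumed values: the report states neither the score threshold nor the age bounds.
MIN_FACE_SCORE = 1.0
YOUNG_AGES = range(20, 31)   # hypothetical "young" group: ages 20-30
OLD_AGES = range(50, 61)     # hypothetical "old" group: ages 50-60


def age_at_photo(record):
    """Age in years when the photo was taken, from two metadata years."""
    return record["photo_year"] - record["birth_year"]


def assign_group(record):
    """Return 'young', 'old', or None if the image should be dropped."""
    if record["face_score"] < MIN_FACE_SCORE or record["is_grayscale"]:
        return None
    age = age_at_photo(record)
    if age in YOUNG_AGES:
        return "young"
    if age in OLD_AGES:
        return "old"
    return None


def train_test_split(items, test_fraction=0.05, seed=0):
    """Shuffle deterministically and set aside a fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records standing in for the dataset metadata.
records = [
    {"path": "a.jpg", "birth_year": 1980, "photo_year": 2005,
     "face_score": 3.2, "is_grayscale": False},
    {"path": "b.jpg", "birth_year": 1950, "photo_year": 2005,
     "face_score": 2.1, "is_grayscale": False},
    {"path": "c.jpg", "birth_year": 1950, "photo_year": 2005,
     "face_score": float("-inf"), "is_grayscale": False},  # no face detected
    {"path": "d.jpg", "birth_year": 1980, "photo_year": 2005,
     "face_score": 4.0, "is_grayscale": True},
]
groups = [assign_group(r) for r in records]
train, test = train_test_split(range(100))
print(groups, len(train), len(test))
```

With 100 items, the split keeps 95 for training and 5 for testing, matching the 5% fraction used in the report.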

For the CACD dataset [11], we could not extract as much useful information from its
metadata, so we did little processing. The dataset does not seem to
contain any grayscale images, and most of the images were already face-cropped.
We randomly selected 2,200 images for each group and set aside 5% of
those as a test set, leaving the rest as a training set.

4. Methods and Implementations

Figure 1. CycleGAN Model

CycleGAN [1] contains two generators, $G: X \to Y$ and $F: Y \to X$, and two
discriminators, $D_X$ and $D_Y$. Half of the model ($G$ and $D_Y$) is trained
with inputs from domain $X$, and the other half of the model ($F$ and $D_X$) is
trained with inputs from domain $Y$. In the “cycle” part, the newly generated
image $G(x)$ (now in domain $Y$) is fed into the generator $F$ and converted
back into an image of domain $X$, and the same process also happens for the
image $F(y)$. Ensuring that the generated “cyclic” image is close enough to the
original input image constrains the model to learn a meaningful mapping,
without the need for a paired dataset.
4.1 Implementation

Figure 2. Generator Network, ResNet Block and Discriminator Network

We used the implementation provided by the authors of the original CycleGAN
paper [12]. The generator network has an Encoder (several Conv layers), followed
by a Transformer (9 ResNet Blocks) and finally a Decoder (several Conv layers).
The discriminator is simply a convolutional network that contains 5 downsampling
layers.
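
To make the Encoder, Transformer and Decoder stages more concrete, the sketch below traces how the spatial size of an image changes through such a generator. The kernel sizes, strides, channel widths (64, 128, 256) and the 256x256 input are assumptions based on the commonly published CycleGAN ResNet generator layout; the report itself does not list them.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a standard convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding, output_padding):
    """Output size of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_generator(size=256, n_blocks=9):
    """Return (stage name, channels, spatial size) for each generator stage."""
    trace = [("input", 3, size)]
    # Encoder: one 7x7 convolution, then two stride-2 convolutions.
    size = conv_out(size, kernel=7, stride=1, padding=3)
    trace.append(("encoder conv 7x7", 64, size))
    for channels in (128, 256):
        size = conv_out(size, kernel=3, stride=2, padding=1)
        trace.append(("encoder conv 3x3 /2", channels, size))
    # Transformer: padded 3x3 convolutions keep the spatial size unchanged.
    for i in range(n_blocks):
        size = conv_out(size, kernel=3, stride=1, padding=1)
        trace.append((f"resnet block {i + 1}", 256, size))
    # Decoder: two stride-2 transposed convolutions, then one 7x7 convolution.
    for channels in (128, 64):
        size = deconv_out(size, kernel=3, stride=2, padding=1, output_padding=1)
        trace.append(("decoder deconv 3x3 x2", channels, size))
    size = conv_out(size, kernel=7, stride=1, padding=3)
    trace.append(("decoder conv 7x7", 3, size))
    return trace


trace = trace_generator()
for name, channels, size in trace:
    print(f"{name:24s} {channels:4d} channels  {size}x{size}")
```

Under these assumptions the encoder reduces a 256x256 image to 64x64 feature maps, the ResNet blocks operate at that resolution, and the decoder restores the original size.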
4.2 Formulation

Mappings $G: X \to Y$ and $F: Y \to X$, where $X$ represents the domain of
images of young people and $Y$ represents the domain of images of old people.

Adversarial discriminators $D_X$ and $D_Y$, where $D_Y$ distinguishes between
real images of old people $\{y\}$ and generated images of old people $\{G(x)\}$,
and $D_X$ distinguishes between real images of young people $\{x\}$ and
generated images of young people $\{F(y)\}$.

Adversarial Losses to match the distribution of generated images to the data distribution.
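
As a rough illustration of the adversarial loss, the sketch below evaluates the log-likelihood form given in the CycleGAN paper [1], $E_y[\log D_Y(y)] + E_x[\log(1 - D_Y(G(x)))]$, on hypothetical discriminator outputs. The numbers are made up, and the public implementation [12] is understood to use a least-squares variant by default, so this should be read as a conceptual sketch and not as the exact loss used in training.

```python
import math


def mean(values):
    return sum(values) / len(values)


def adversarial_loss(d_real, d_fake):
    """E[log D(y)] + E[log(1 - D(G(x)))] for discriminator outputs in (0, 1).

    d_real: discriminator scores on real images of the target domain.
    d_fake: discriminator scores on generated images.
    The discriminator tries to maximize this value; the generator tries to
    minimize it.
    """
    real_term = mean([math.log(p) for p in d_real])
    fake_term = mean([math.log(1.0 - p) for p in d_fake])
    return real_term + fake_term


# Hypothetical scores: a discriminator that separates real from fake well...
confident = adversarial_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# ...and one that cannot tell them apart (outputs 0.5 everywhere).
fooled = adversarial_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident, fooled)
```

The value is close to zero when the discriminator is confident and correct, and falls to 2 log(0.5) when the generator fools it completely.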


CHAPTER 5

CODING AND TESTING

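Cycle Consistency Losses to prevent $G$ and $F$ from contradicting each other, in our
case, to make sure the generated aged images still represent the original
people.

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]$$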
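
The cycle consistency loss can be illustrated with a toy example. In this sketch, images are flat lists of pixel values and the "generators" G, F and F_bad are hypothetical stand-ins that shift every pixel by a constant; real generators are neural networks, so only the structure of the loss carries over.

```python
def l1_norm(a, b):
    """L1 distance between two images given as flat lists of pixel values."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cycle_consistency_loss(G, F, xs, ys):
    """Mean of ||F(G(x)) - x||_1 over xs plus mean of ||G(F(y)) - y||_1 over ys."""
    forward = sum(l1_norm(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_norm(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Toy stand-ins for the generators (hypothetical, for illustration only).
def G(image):        # "young -> old": brighten every pixel
    return [p + 0.5 for p in image]


def F(image):        # "old -> young": exact inverse of G
    return [p - 0.5 for p in image]


def F_bad(image):    # an inverse that only undoes half of G
    return [p - 0.25 for p in image]


young = [[0.0, 0.25], [0.5, 0.75]]
old = [[0.5, 0.75], [1.0, 1.25]]

print(cycle_consistency_loss(G, F, young, old))      # consistent pair
print(cycle_consistency_loss(G, F_bad, young, old))  # inconsistent pair
```

When F exactly undoes G the loss is zero; when the two mappings disagree, the loss grows with the size of the reconstruction error, which is what pushes the generated aged face to stay close to the original person.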

5. Experiments and Results

To improve the performance of the model and to reduce training time, we
experimented with different datasets and data compositions and employed
deep learning techniques such as transfer learning, fine-tuning and
hyperparameter tuning (see Table 1).

#  Source  Mix     Epochs  Preloaded?     Freeze until  G Size    Max   Avg   10+   15+   20+
0  CACD    All     200     N/A            N/A           9 blocks  25.8  6.7   22%   5.5%  1.7%
1  WIKI    All     200     N/A            N/A           9 blocks  31.2  8.8   37%   14%   5.8%
2  WIKI    Female  200     N/A            N/A           9 blocks  19.5  4.6   7.1%  2.5%  0.0%
3  WIKI    Male    200     N/A            N/A           9 blocks  27.3  10.3  50%   19%   5.1%
4  WIKI    Male    200     N/A            N/A           6 blocks  N/A   N/A   N/A   N/A   N/A
5  WIKI    All     200     horse2zebra    8th block     9 blocks  27.4  11.0  55%   20%   6.3%
6  WIKI    All     200     summer2winter  8th block     9 blocks  25.0  8.9   36%   10%   1.7%
7  WIKI    All     200     monet2photo    8th block     9 blocks  20.1  6.6   15%   2.5%  0.4%
8  WIKI    Male    100     horse2zebra    N/A           9 blocks  25.8  9.9   46%   12%   1.3%
9  WIKI    Male    100     Model #2       N/A           9 blocks  32.8  10.3  51%   18%   6.0%

Table 1. List of all models we explored in our study and their quantitative
results. The columns are (from left to right): model number, data source, data
composition, number of epochs trained, pre-trained network used for initialization,
the generator layer up to which parameters are frozen, the size of the generator
net, maximum age progression (years), average age progression (years), and the %
of test cases where the age progression is over 10 years, 15 years and 20 years.
Results for model #4 are N/A because the model collapsed.
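
The last five columns of Table 1 can be computed from per-image age estimates along the following lines. This is a sketch under an assumption: age progression is taken to be the estimated age of the generated image minus the estimated age of the input image, which the report does not spell out. The ages below are hypothetical.

```python
def summarize_progression(input_ages, output_ages):
    """Summarize age progression (in years) over a test set.

    input_ages:  estimated ages of the original test images.
    output_ages: estimated ages of the corresponding generated images.
    """
    deltas = [out - inp for inp, out in zip(input_ages, output_ages)]
    n = len(deltas)

    def pct_over(years):
        # Share of test cases whose progression strictly exceeds `years`.
        return 100.0 * sum(d > years for d in deltas) / n

    return {
        "max": max(deltas),
        "avg": sum(deltas) / n,
        "10+": pct_over(10),
        "15+": pct_over(15),
        "20+": pct_over(20),
    }


# Hypothetical estimates for four test images.
summary = summarize_progression(
    input_ages=[25, 30, 22, 28],
    output_ages=[30, 42, 38, 53],
)
print(summary)
```

For these made-up values the progressions are 5, 12, 16 and 25 years, giving a maximum of 25, an average of 14.5, and 75%, 50% and 25% of cases above the three thresholds.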

5.1 Determining the Optimum Data Source

We trained two basic models, one with the CACD dataset (model #0) and one with
the IMDB-WIKI dataset (model #1). We discovered that model #1 outperforms
model #0 significantly (Figure 3). We manually examined some of the input
images of the two datasets, and we think model #0’s poor performance arises
because the images in the CACD dataset are mostly taken under professional settings
(e.g. with makeup, lighting), while those in the IMDB-WIKI dataset are mostly
taken under less professional settings. With this finding, we decided to focus on
the IMDB-WIKI dataset for the rest of our study.
5.2 Determining the Optimum Data Composition

When examining the test results of model #1, we discovered that some males in
the age-progressed images had developed feminine traits, such as redder lips
(Figure 4), and similarly some females had developed masculine traits, such as
mustaches or beards. This prompted us to consider whether we should separate
the male dataset from the female dataset. We separated the training set by
gender and then trained one model on each subset (models #2 and #3).

Indeed, the gender-biased traits disappear after the separation, and at the same
time we see performance improvements, as the aging effect of model #3 is
better than that of model #1 (Figure 5).
5.3 Attempts to Speed Up the Training Process

One major issue with the CycleGAN model is that training is incredibly slow.
We are essentially training 4 separate networks with 28M parameters combined
(11.4M for each generator and 2.8M for each discriminator), not to mention that
the loss function has an extra cycle consistency part. With 4,630 images in the
training set, it takes 15 minutes to finish one epoch of training on a Tesla V100
GPU. To speed up training, we applied a few deep learning techniques and
evaluated their performance.

Figure 3. Model #0 and #1 results

Figure 4. Model #1’s gender bias

Figure 5. Model #1 and #3 results

We tried tuning a hyperparameter to reduce the network size. The original
CycleGAN implementation [12] has 9 ResNet blocks for each of the generator
networks. We reduced the number of ResNet blocks to 6, which in turn reduced
the number of parameters per generator from 11.4M to 7.8M. With smaller
networks, we can increase the mini-batch size to speed up training. The modified
model, however, performed poorly. Reducing the number of ResNet blocks
caused the network to collapse, and it failed to yield any reasonable result. We
discussed increasing the number of epochs for better results but did not do so
because it would go against our original goal of speeding up training.
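
The parameter counts quoted in this section can be roughly reproduced with simple arithmetic. The sketch below assumes the layer layout of the commonly published CycleGAN networks (a generator with 7x7 and 3x3 convolutions at 64, 128 and 256 channels, a discriminator with five 4x4 convolutions, and normalization layers without learnable parameters); these shapes are assumptions and are not listed in the report.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of one (transposed) convolution layer."""
    return c_in * c_out * kernel * kernel + c_out


def generator_params(n_blocks):
    """Parameter count of the assumed ResNet generator with n_blocks blocks."""
    encoder = (conv_params(3, 64, 7)
               + conv_params(64, 128, 3)
               + conv_params(128, 256, 3))
    resnet_block = 2 * conv_params(256, 256, 3)   # two 3x3 convs per block
    decoder = (conv_params(256, 128, 3)
               + conv_params(128, 64, 3)
               + conv_params(64, 3, 7))
    return encoder + n_blocks * resnet_block + decoder


def discriminator_params():
    """Parameter count of the assumed five-layer 4x4 convolutional discriminator."""
    channels = [3, 64, 128, 256, 512, 1]
    return sum(conv_params(c_in, c_out, 4)
               for c_in, c_out in zip(channels, channels[1:]))


g9 = generator_params(9)
g6 = generator_params(6)
d = discriminator_params()
total = 2 * g9 + 2 * d

print(f"generator, 9 blocks: {g9 / 1e6:.1f}M")
print(f"generator, 6 blocks: {g6 / 1e6:.1f}M")
print(f"discriminator:       {d / 1e6:.1f}M")
print(f"all four networks:   {total / 1e6:.1f}M")
```

Under these assumptions the counts come out at about 11.4M and 7.8M for the two generator sizes, 2.8M per discriminator, and roughly 28M in total, in line with the figures above.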
CHAPTER 6

SCREENSHOTS AND RESULTS

Figure 6. Comparison among results generated by our transfer learning
and fine tuning models. Last 2 columns are age progression images
generated by models of other academic works.

We tried Transfer Learning to reduce the number of trainable parameters. For
Transfer Learning, we first initialize our AgingGAN model with the parameters of a
pre-trained model, then we freeze all parameters before the 8th ResNet block of
the generator networks so that the number of trainable parameters is reduced
from 11.4M to 2.7M. When choosing the pre-trained model, we experimented with
three models from the original CycleGAN paper: horse2zebra (model #5),
summer2winter (model #6) and monet2photo (model #7). As expected, the models
perform differently. Model #5 has the best aging effect, followed by model
#6, and then model #7 (Figure 6). We think this is because the horse2zebra
and the summer2winter models capture the texture and shape differences
between the image groups, while the monet2photo model fails to do so. Since
predicting skin texture and wrinkles is a crucial part of the aging process, it
is expected that model #5 has the best performance.
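
The drop from 11.4M to 2.7M trainable parameters can be checked with a short calculation. The sketch assumes the commonly published CycleGAN ResNet generator layout (not listed in the report) and assumes that "before the 8th ResNet block" means the encoder and blocks 1 to 7 are frozen, leaving blocks 8 and 9 and the decoder trainable.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of one (transposed) convolution layer."""
    return c_in * c_out * kernel * kernel + c_out


# Stages of the assumed 9-block generator, in forward order.
stages = [("encoder", conv_params(3, 64, 7)
           + conv_params(64, 128, 3)
           + conv_params(128, 256, 3))]
stages += [(f"resnet_block_{i}", 2 * conv_params(256, 256, 3))
           for i in range(1, 10)]
stages += [("decoder", conv_params(256, 128, 3)
            + conv_params(128, 64, 3)
            + conv_params(64, 3, 7))]

FIRST_TRAINABLE = "resnet_block_8"   # everything before this stage is frozen

names = [name for name, _ in stages]
cut = names.index(FIRST_TRAINABLE)
frozen = sum(count for _, count in stages[:cut])
trainable = sum(count for _, count in stages[cut:])

print(f"frozen:    {frozen / 1e6:.1f}M")
print(f"trainable: {trainable / 1e6:.1f}M")
```

With these assumptions about 8.6M parameters are frozen and about 2.7M remain trainable per generator, which is consistent with the figure reported above.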
We also tried Fine Tuning to reduce the number of epochs. The original
CycleGAN implementation [12] requires 200 epochs of learning. We explored
reducing this number to 100 while preserving the same level of
aging effect by initializing the AgingGAN model with a pre-trained
model. Model #8 was initialized with the horse2zebra model and model #9 was
initialized with model #3. Both models yield decent results (Figure 6). We see
that the parameters pre-trained with images from another domain (horse2zebra)
do have a positive effect on our AgingGAN model, and we can see clear aging
effects with only 100 epochs of further training. Model #9 also generated good
results, especially for the aging effect on females, meaning the parameters of model #3
(pre-trained with male images) are helping the age progression of females in
model #9.
5.4 Analysis of Loss Curve

Looking at the loss curve of model #1 (Figure 7), the generator loss and
discriminator loss flatten quickly, which is expected as a GAN attempts to strike a
balance between the two losses. However, we can see that the cycle consistency
loss keeps decreasing as the epochs increase. In the first 100 epochs, we can
see a steady drop in the cycle consistency loss. In the latter 100 epochs, the
decrease slows down significantly, but the trend line is still downwards.
Combined with the fact that aging quality improves as epochs increase, we
suspect that the aging effect can be predicted from the cycle consistency loss,
and that the model will cease improving when this loss flattens.
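
One simple way to act on this observation would be to monitor the cycle consistency loss and flag when it stops improving. The sketch below is a hypothetical heuristic, not something used in the project: the window size, the tolerance and the loss values are all made up for illustration.

```python
def has_flattened(losses, window=3, tolerance=0.01):
    """Return True if the mean loss of the last `window` epochs improved by
    less than `tolerance` (relative) over the preceding `window` epochs."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous == 0:
        return True
    return (previous - recent) / previous < tolerance


# Hypothetical per-epoch cycle consistency losses.
early_training = [10.0, 8.0, 6.0, 5.0, 4.0, 3.0]
late_training = [2.0, 2.0, 2.0, 2.0, 2.0, 1.99]

print(has_flattened(early_training))  # still dropping steadily
print(has_flattened(late_training))   # improvement below the tolerance
```

Such a check could be used to stop training early once further epochs are unlikely to improve the aging effect, if the suspected link between the two holds.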

Figure 7. Loss Curve of Model #3

5.5 Results

We evaluated our models both quantitatively and qualitatively. Quantitatively,
we estimated the age of all images in the test set (last 5 columns of
Table 1) using a Keras implementation [13] of the DEX network by Rothe et
al. [14]. Models #3, #5 and #9 all have 50% or more of the people in the test
cases “growing” more than 10 years and roughly 20% (18% to 20%) “growing”
more than 15 years. Moreover, the maximum aging effect produced by model #9 is
32.8 years. Qualitatively, we compared our results with two existing baseline
models: Face Transformer by the University of St. Andrews [15] and Glow [4] (last
two columns of Figure 6). For some of the test images, our model achieves a
better aging effect than the baseline models, as the images generated by our model
are more realistic.
CHAPTER 7

CONCLUSION AND FUTURE ENHANCEMENTS


We learned a few things in our age progression project with CycleGAN. The
first and foremost finding is that CycleGAN can generate quality age
progression images after a decent amount of training. The choice of dataset can
affect the performance of the model (IMDB-WIKI vs. CACD), as can the
composition of the dataset (male vs. female vs. mixed). As the number of
training epochs increases, the aging effect increases and the cycle consistency
loss drops, but the improvement becomes less and less apparent and the loss
flattens in the end. Finally, Transfer Learning and Fine Tuning with pre-trained
models (e.g. horse2zebra) can accelerate the training process, though they come
with a slight compromise on the aging effect. Trying to speed up training by
reducing the size of the generator network did not work in our experiments, as
the reduced model collapsed.
8. Repository

https://github.com/dotslashsimran/faceAging-Cyclegan

10. Reference

[1] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image
Translation using Cycle-Consistent Adversarial Networks. In ICCV, 2017.

[2] J. T. Todd, L. S. Mark, R. E. Shaw, and J. B. Pittenger. The perception of human
growth. Scientific American, 242(2):132, 1980.

[3] Y. Wu, N. M. Thalmann, and D. Thalmann. A plastic-visco-elastic model for
wrinkles in facial animation and skin aging. In PG, pages 201–214, 1994.

[4] D. P. Kingma and P. Dhariwal. Glow: Generative Flow with Invertible 1×1
Convolutions. arXiv preprint arXiv:1807.03039, 2018.

[5] W. Wang, Z. Cui, Y. Yan, J. Feng, S. Yan, X. Shu, and N. Sebe. Recurrent face aging.
In CVPR, pages 2378–2386, Jun. 2016.

[6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A.
Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, Dec. 2014.

[7] G. Antipov, M. Baccouche, and J. L. Dugelay. Face aging with conditional
generative adversarial networks. arXiv preprint arXiv:1702.01983, 2017.

[8] S. Liu, Y. Sun, D. Zhu, R. Bao, W. Wang, X. Shu, and S. Yan. Face aging with
contextual generative adversarial nets. In ACM MM, 2017.

[9] H. Yang, D. Huang, Y. Wang, and A. K. Jain. Learning face age progression: A
pyramid architecture of GANs. arXiv preprint arXiv:1711.10352, 2017.

[10] R. Rothe, R. Timofte, and L. V. Gool. IMDB-WIKI – 500k+ face images with age
and gender labels. https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki.

[11] B.-C. Chen, C.-S. Chen, and W. Hsu. Cross Age Reference Coding for
Age-Invariant Face Recognition and Retrieval. http://bcsiriuschen.github.io/CARC.

[12] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. CycleGAN and pix2pix in PyTorch.
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.

[13] Keras implementation of a CNN network for age and gender estimation.
https://github.com/yu4u/age-gender-estimation.

[14] R. Rothe, R. Timofte, and L. V. Gool. DEX: Deep expectation of apparent age from a
single image. In IEEE International Conference on Computer Vision Workshops (ICCVW),
December 2015.

[15] University of St. Andrews: Face Transformer.
http://cherry.dcs.aber.ac.uk/Transformer/index.html
