Deep Learning Report
Submitted by
Simran Sachdeva (RA2111026010518) and Ananya Kansal (RA2111026010530)
for the degree of
BACHELOR OF TECHNOLOGY
MAY 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)
BONAFIDE CERTIFICATE
Certified that this mini project report titled "FaceAging by CycleGAN" is the bonafide work of Simran Sachdeva (RA2111026010518) and Ananya Kansal (RA2111026010530), who carried out the minor project under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion to this or any other candidate.
SIGNATURE SIGNATURE

ABSTRACT
In the field of digital image processing, simulating the aging process in human
faces presents a myriad of practical applications, from enhancing digital
forensics to improving social media platforms. This project develops an
innovative approach to face aging using Generative Adversarial Networks
(GANs), leveraging their ability to generate highly realistic images. We employ
a dual GAN architecture where one network focuses on learning age-related
attributes while the other ensures the preservation of personal identity over
different age progressions. The networks were trained on a diverse dataset
comprising facial images across a wide age range, annotated with precise age
labels.
The effectiveness of our model was assessed through both qualitative and
quantitative measures. The qualitative evaluation involved visual inspections by
human judges, while quantitative analysis was conducted using established
metrics such as the Fréchet Inception Distance (FID) to assess image quality
and Age Classification Accuracy for age correctness. Our results demonstrate
that the proposed GAN model not only produces visually plausible aged faces
but also maintains consistent identity features, outperforming existing models in
terms of realism and age accuracy.
This project not only advances the technological frontier of facial recognition
but also opens new avenues for personalized digital content creation and age
progression analysis in various domains. Future work will focus on improving
the model's robustness to variations in lighting, pose, and expression, and
extending its applicability to full-body aging simulations.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
ABBREVIATIONS
1 INTRODUCTION
2 LITERATURE SURVEY
CHAPTER 1
INTRODUCTION
The clock never stops, never waits. As we grow older, it is fun to take out an album (or, these days, a photo app) and show friends, family, and co-workers what we looked like when we were younger. Wouldn't it be awesome if you could do the opposite? Age progression, the process of aesthetically rendering a facial image with the simulated effect of growing old, has attracted much attention from the Deep Learning and Computer Vision communities due to its wide range of applications in the entertainment industry and in forensic science, e.g., generating contemporary portraits of individuals who went missing when they were young. However, it remains a challenging task because the patterns of aging we want to capture can easily be affected by the conditions of the input image, such as facial expressions or photographic settings, not to mention the effects on people's appearance caused by the physical environment they grow up in.
Further, the scarcity of paired data (two images of the same person taken 20+ years apart) has prevented existing solutions from achieving good performance. In this project, we propose a simple yet intuitive deep learning model based on CycleGAN [1] that can generate aging effects on the people portrayed in images, without the need for paired data.
CHAPTER 2
LITERATURE SURVEY
Our model architecture consists of two main components: the Aging Generator (AG) and the Identity Preserving Discriminator (IPD).
● Encoder: The encoder part of the AG takes the input image and
compresses it into a dense latent space, extracting and encoding the
critical features necessary for the aging process.
● Decoder: The decoder takes the aged features from the transformation
layer and reconstructs them back into a high-resolution image that
represents the input face at the target age.
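Below is a minimal PyTorch sketch of such an encoder-transformation-decoder generator, following the standard CycleGAN generator design. The class names (AgingGenerator, ResidualBlock) and layer sizes are illustrative assumptions, not the report's exact configuration:

```python
# Minimal sketch of a CycleGAN-style aging generator: an encoder that
# downsamples, a stack of residual "transformation" blocks, and a
# decoder that upsamples back to image resolution. Sizes are assumed.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual skip connection

class AgingGenerator(nn.Module):
    def __init__(self, n_blocks=9):  # "9 blocks" / "6 blocks" in Table 1
        super().__init__()
        # Encoder: compress the input face into a dense feature map
        self.encoder = nn.Sequential(
            nn.ReflectionPad2d(3),
            nn.Conv2d(3, 64, kernel_size=7),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(256), nn.ReLU(inplace=True),
        )
        # Transformation: residual blocks that apply age-related changes
        self.transform = nn.Sequential(
            *[ResidualBlock(256) for _ in range(n_blocks)]
        )
        # Decoder: reconstruct a full-resolution face at the target age
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(3),
            nn.Conv2d(64, 3, kernel_size=7),
            nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.transform(self.encoder(x)))
```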
The IPD ensures that the aged images generated by the AG maintain the identity
of the original subjects. It acts as a critic in the GAN setup, evaluating both the
realism of the aged images and their fidelity to the original identity. The IPD is
also a CNN and is trained to distinguish between real images and generated
images, and to verify if the generated image still corresponds to the same
identity as the input image.
Its key functionalities are thus twofold: judging the realism of the generated images, and verifying that a generated face still matches the identity of the input face. A minimal sketch of the realism-scoring part is given below.
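The sketch is a PatchGAN-style discriminator as used in standard CycleGAN, which covers only the realism-scoring role; the identity-verification part of the IPD is not shown, and all layer sizes are illustrative assumptions:

```python
# PatchGAN-style discriminator: outputs a grid of per-patch real/fake
# scores rather than a single scalar. Layer sizes are assumptions.
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        def layer(c_in, c_out, norm=True):
            mods = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
            if norm:
                mods.append(nn.InstanceNorm2d(c_out))
            mods.append(nn.LeakyReLU(0.2, inplace=True))
            return mods

        self.model = nn.Sequential(
            *layer(3, 64, norm=False),
            *layer(64, 128),
            *layer(128, 256),
            *layer(256, 512),
            nn.Conv2d(512, 1, kernel_size=4, padding=1),  # patch scores
        )

    def forward(self, x):
        return self.model(x)
```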
METHODOLOGY
Implementation Details
For the IMDB-WIKI dataset, we used only the 62k images from Wikipedia, which already provide ample data. Using the dataset's metadata, we could calculate the age of the person in each image at the time the photo was taken. The dataset provides a face-cropped version of all images and a detector-score entry called face_score in the metadata. We enforced a minimum face_score for both groups to ensure image quality (excluding images with no faces or with blurry faces). In addition, we removed nearly all the grayscale images in the dataset, as CycleGAN was designed to take RGB images. We ended up with 5,003 images in the young group and 2,779 images in the elder group. Finally, we held out 5% of these images as a test set, keeping the rest as the training set.
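The sketch below illustrates this preprocessing pipeline under some assumptions: records are dicts parsed from the IMDB-WIKI metadata with hypothetical field names (path, dob_year, photo_taken, face_score), and the score threshold and age cut-offs for the two groups are placeholders, since the report does not state the exact values:

```python
# Hypothetical preprocessing sketch for the IMDB-WIKI (Wikipedia) split.
import random
from PIL import Image

MIN_FACE_SCORE = 1.0            # assumed quality threshold
YOUNG_MAX, ELDER_MIN = 30, 50   # assumed group boundaries

def is_grayscale(path):
    # A grayscale photo stored as RGB has three identical channels.
    r, g, b = Image.open(path).convert("RGB").split()
    return r.tobytes() == g.tobytes() == b.tobytes()

def build_groups(records):
    young, elder = [], []
    for rec in records:
        if rec["face_score"] < MIN_FACE_SCORE or is_grayscale(rec["path"]):
            continue  # no face, blurry face, or grayscale image
        age = rec["photo_taken"] - rec["dob_year"]  # age when photo taken
        if age <= YOUNG_MAX:
            young.append(rec["path"])
        elif age >= ELDER_MIN:
            elder.append(rec["path"])
    return young, elder

def train_test_split(paths, test_frac=0.05, seed=0):
    # Hold out 5% as a test set, keep the rest for training.
    random.Random(seed).shuffle(paths)
    n_test = int(len(paths) * test_frac)
    return paths[n_test:], paths[:n_test]
```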
For the CACD dataset, we could not extract as much useful information from its metadata, so we did not perform much processing. The dataset does not appear to contain any grayscale images, and most of the images are already face-cropped. We randomly selected 2,200 images for each group and held out 5% of those as a test set, leaving the rest as the training set.
We use cycle consistency losses to prevent the two generators G and F from contradicting each other; in our case, they ensure that the generated "aged" images still represent the original people:
$$\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]$$
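In code, this loss is a direct translation of the formula, sketched here in PyTorch with G mapping young to old and F mapping old to young; the weight lambda_cyc = 10 is the CycleGAN paper's default, not a value stated in this report:

```python
# Cycle-consistency loss: each image should survive a round trip
# through both generators (young -> old -> young, and vice versa).
import torch.nn as nn

l1 = nn.L1Loss()
LAMBDA_CYC = 10.0  # CycleGAN's default weighting, assumed here

def cycle_consistency_loss(G, F, real_young, real_old):
    rec_young = F(G(real_young))   # forward cycle: young -> old -> young
    rec_old = G(F(real_old))       # backward cycle: old -> young -> old
    return LAMBDA_CYC * (l1(rec_young, real_young) + l1(rec_old, real_old))
```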
To improve the model's performance and reduce training time, we experimented with different datasets and data compositions, and employed deep learning techniques such as transfer learning, fine-tuning, and hyperparameter tuning (see Table 1).
| # | Source | Mix | Epochs | Preloaded? | Freeze until | G size | Max (yrs) | Avg (yrs) | 10+ | 15+ | 20+ |
|---|--------|-----|--------|------------|--------------|--------|-----------|-----------|-----|-----|-----|
| 0 | CACD | All | 200 | N/A | N/A | 9 blocks | 25.8 | 6.7 | 22% | 5.5% | 1.7% |
| 1 | WIKI | All | 200 | N/A | N/A | 9 blocks | 31.2 | 8.8 | 37% | 14% | 5.8% |
| 2 | WIKI | Female | 200 | N/A | N/A | 9 blocks | 19.5 | 4.6 | 7.1% | 2.5% | 0.0% |
| 3 | WIKI | Male | 200 | N/A | N/A | 9 blocks | 27.3 | 10.3 | 50% | 19% | 5.1% |
| 4 | WIKI | Male | 200 | N/A | N/A | 6 blocks | N/A | N/A | N/A | N/A | N/A |
| 5 | WIKI | All | 200 | horse2zebra | 8th block | 9 blocks | 27.4 | 11.0 | 55% | 20% | 6.3% |
| 6 | WIKI | All | 200 | summer2winter | 8th block | 9 blocks | 25.0 | 8.9 | 36% | 10% | 1.7% |
| 7 | WIKI | All | 200 | monet2photo | 8th block | 9 blocks | 20.1 | 6.6 | 15% | 2.5% | 0.4% |
| 8 | WIKI | Male | 100 | horse2zebra | N/A | 9 blocks | 25.8 | 9.9 | 46% | 12% | 1.3% |
| 9 | WIKI | Male | 100 | Model #2 | N/A | 9 blocks | 32.8 | 10.3 | 51% | 18% | 6.0% |
Table 1. List of all models explored in our study and their quantitative results. The columns are (from left to right): model number, data source, data composition, number of epochs trained, pre-trained network used for initialization, the generator layer up to which weights were frozen, the size of the generator network, maximum age progression (years), average age progression (years), and the percentage of test cases where age progression exceeds 10, 15, and 20 years. Results for model #4 are N/A due to model collapse.
We trained two basic models, one on the CACD dataset (model #0) and one on the IMDB-WIKI dataset (model #1). We discovered that model #1 significantly outperforms model #0 (Figure 3). We manually examined some input images from the two datasets, and we attribute model #0's poor performance to the images in the CACD dataset being mostly taken in professional settings (e.g., with makeup and studio lighting), while those in the IMDB-WIKI dataset are mostly taken in less professional settings. With this finding, we decided to focus on the IMDB-WIKI dataset for the rest of our study.
5.2 Determining the Optimum Data Composition
When examining the test results of model #1, we discovered that some males in the age-progressed images had developed feminine traits, such as redder lips (Figure 4), and similarly some females had developed masculine traits, such as mustaches or beards. This prompted us to consider whether we should separate the male and female data. We split the training set by gender and trained two models accordingly (models #2 and #3). Indeed, the cross-gender traits disappeared after the separation, and at the same time we saw performance improvements, as the aging effect of model #3 is better than that of model #1 (Figure 5).
5.3 Attempts to Speed Up the Training Process
One major issue with the CycleGAN model is that training is incredibly slow. We are essentially training four separate networks with 28M parameters combined (11.4M for each generator and 2.8M for each discriminator), not to mention that the loss function has an extra cycle consistency term. With 4,630 images in the training set, it takes 15 minutes to finish one epoch of training on a Tesla V100 GPU. To speed up training, we applied a few deep learning techniques and evaluated their performance.
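As a rough sanity check on those sizes, the parameter counts can be verified against the illustrative AgingGenerator and Discriminator sketches given earlier, assuming those sketches mirror the actual 9-block CycleGAN configuration:

```python
# Verify the quoted sizes: two generators plus two discriminators.
def count_params(net):
    return sum(p.numel() for p in net.parameters())

G = AgingGenerator(n_blocks=9)       # young -> old generator
F = AgingGenerator(n_blocks=9)       # old -> young generator
D_young, D_old = Discriminator(), Discriminator()

total = sum(count_params(n) for n in (G, F, D_young, D_old))
print(f"G: {count_params(G)/1e6:.1f}M, D: {count_params(D_young)/1e6:.1f}M, "
      f"total: {total/1e6:.1f}M")   # roughly 2 x 11.4M + 2 x 2.8M ~ 28M
```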
We first tried transfer learning, initializing the generator with pre-trained CycleGAN models (horse2zebra, summer2winter, and monet2photo) and freezing its weights up to the 8th residual block (models #5, #6, and #7). The horse2zebra and summer2winter models capture the texture and shape differences between the image groups, while the monet2photo model failed to do so. Since predicting skin texture and wrinkles is a crucial part of the aging process, it is expected that model #5 has the best performance.
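A hedged sketch of this setup, reusing the AgingGenerator sketch from earlier; the checkpoint path is hypothetical, as is the assumption that it stores a state dict layout-compatible with that class:

```python
# Transfer learning for models #5-#7: load pre-trained CycleGAN weights
# and freeze the generator up to the 8th residual block.
import torch

G = AgingGenerator(n_blocks=9)
G.load_state_dict(torch.load("checkpoints/horse2zebra_G.pth"), strict=False)

# Freeze the encoder and the first 8 residual blocks ("freeze until the
# 8th block"); only the last block and the decoder keep learning
# aging-specific features.
for module in [G.encoder, *list(G.transform)[:8]]:
    for p in module.parameters():
        p.requires_grad = False

# Optimize only the remaining trainable parameters; the learning rate
# and betas are CycleGAN's defaults.
optimizer = torch.optim.Adam(
    (p for p in G.parameters() if p.requires_grad),
    lr=2e-4,
    betas=(0.5, 0.999),
)
```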
We also tried fine-tuning to reduce the number of epochs. The original CycleGAN implementation [12] calls for 200 epochs of training. We explored cutting this number to 100 while preserving the same level of aging effect by initializing the AgingGAN model with a pre-trained model. Model #8 was initialized with the horse2zebra model, and model #9 was initialized with model #3. Both models yield decent results (Figure 6). We see that parameters pre-trained on images from another domain (horse2zebra) do have a positive effect in our AgingGAN model, and clear aging effects appear after only 100 epochs of further training. Model #9 also generated good results, especially in the aging effect on female faces, meaning the parameters of model #3 (pre-trained on male images) help the age progression of females in model #9.
5.4 Analysis of the Loss Curves
Looking at the loss curves of model #1 (Figure 7), the generator loss and the discriminator loss flatten quickly, which is expected as the GAN attempts to strike a balance between the two. However, the cycle consistency loss keeps decreasing as the epochs increase. In the first 100 epochs, we see a steady drop in the cycle consistency loss. In the latter 100 epochs, the decrease slows down significantly, but the trend line still points downward. Combined with the fact that aging quality improves as the epochs increase, we suspect that the aging effect can be predicted from the cycle consistency cost, and that the model will cease improving once that cost flattens.
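If this hunch holds, the cycle consistency cost could drive a simple early-stopping rule. The sketch below is speculative (the report did not implement it), and the window size and improvement threshold are arbitrary assumptions:

```python
# Stop training once the cycle consistency cost flattens.
def has_plateaued(cyc_losses, window=10, min_improvement=1e-3):
    """True when the mean cycle loss over the last `window` epochs
    improved by less than `min_improvement` vs. the window before."""
    if len(cyc_losses) < 2 * window:
        return False
    prev = sum(cyc_losses[-2 * window:-window]) / window
    curr = sum(cyc_losses[-window:]) / window
    return prev - curr < min_improvement
```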
5.5 Results
Our implementation and generated results are available at:
https://github.com/dotslashsimran/faceAging-Cyclegan
REFERENCES
[1] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
[5] W. Wang, Z. Cui, Y. Yan, J. Feng, S. Yan, X. Shu, and N. Sebe. Recurrent face aging.
In CVPR, pages 2378– 2386, Jun. 2016.
[8] S. Liu, Y. Sun, D. Zhu, R. Bao, W. Wang, X. Shu, and S. Yan. Face aging with
contextual generative adversarial nets. In ACM MM, 2017.
[9] H. Yang, D. Huang, Y. Wang, and A. K. Jain. Learning face age progression: A
pyramid architecture of GANs. arXiv preprint arXiv:1711.10352, 2017.
[10] R. Rothe, R. Timofte, L. V. Gool. IMDB-WIKI – 500k+ face images with age
and gender labels. https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki.
[11] B.-C. Chen, C.-S. Chen, and W. Hsu. Cross-age reference coding for age-invariant face recognition and retrieval. http://bcsiriuschen.github.io/CARC.
[12] J.-Y. Zhu et al. Official PyTorch implementation of CycleGAN. https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.
[13] Keras implementation of a CNN network for age and gender estimation. https://github.com/yu4u/age-gender-estimation.
[14] R. Rothe, R. Timofte, and L. V. Gool. Dex: Deep expectation of apparent age from a
single image. In IEEE International Conference on Computer Vision Workshops (ICCVW),
December 2015.