Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia

Procedia Computer Science 235 (2024) 948–960

International Conference on Machine Learning and Data Engineering (ICMLDE 2023)

GAN-CNN Ensemble: A Robust Deepfake Detection Model of Social Media Images Using Minimized
Catastrophic Forgetting and Generative Replay Technique
Preeti Sharma a,*, Manoj Kumar b,c,*, Hitesh Kumar Sharma a,b,c
a Research Scholar, School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun, 248007 India.
b Associate Professor, Faculty of Engineering and Information Sciences, University of Wollongong in Dubai, UAE.
c MEU Research Unit, Faculty of Information Technology, Middle East University, Amman 11831, Jordan.
a,b,c Associate Professor, School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun, 248007 India.

Abstract

Deepfake photographs are difficult to distinguish from real ones, especially when they circulate on social media platforms. Anyone
can deliberately create disinformation about public personalities, politicians, and celebrities using such photographs, so an
effective detection model is an important need of society. Deepfake detection models commonly use CNN-based detectors, whose
performance drops when they are used in transfer learning or continual learning settings. A significant limitation in this process
is the CNN's catastrophic forgetting. To address this problem, a generative replay technique in the form of a GAN-CNN model is
implemented that minimizes catastrophic forgetting and thereby improves detection. It involves generating and storing samples from
previous tasks and then replaying them during the training of new tasks, which makes the CNN more robust in identifying deepfakes.
The GAN model used in this work is a traditional DCGAN improved with the adjustments necessary to achieve training stability. The
model attained an accuracy of 98.67% (training) and 70.08% (testing) and a minimum loss of 0.0337 over 100 epochs. It also achieved
precision values of 68% and 72%, recall values of 74% and 66%, and F1 scores of 71% and 69% for classes 0 and 1, respectively.
The model outcome is found to be stable and reliable in deepfake detection under dynamic training conditions. The values of the
evaluation parameters indicate the model's capacity to learn new tasks while preserving the knowledge learned on existing tasks.
© 2024 The Authors. Published by ELSEVIER B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the International Conference on Machine Learning and Data
Engineering
Keywords: Deep Learning; CNN; GAN; Catastrophic forgetting; CNN continual learning; Lifelong learning; GAN-CNN deepfake detector.

1. Introduction

Generative Adversarial Networks (GANs) are used to create realistic visual output from a probability distribution.
* Corresponding authors. E-mail addresses: a preetiii.kashyup@gmail.com, b wssmanojkumar@gmail.com

1877-0509 © 2024 The Authors. Published by ELSEVIER B.V.


10.1016/j.procs.2024.04.090

The machine learning research community's interest in GANs is expanding [1]. GANs are neural network-based models that frame an
unsupervised generative problem as a supervised one, training on real and synthetic data without predetermined results [2]. They
address the shortcomings of other generative models by introducing adversarial learning. GANs aim to model the data distribution
implicitly instead of explicitly maximizing likelihood [3]. In a GAN, the generator and the discriminator compete, which drives the
generator to create realistic samples. GANs are extensively used in deepfake techniques, which can have negative effects if used
unethically. Deepfake images, which are difficult to distinguish from real ones, can be used to create false information about
public figures such as politicians and celebrities, for example fabricated evidence of bribery, prejudice, or participation in
anti-national radical gatherings [4]. Digital forensic experts are developing deepfake image detection methods to confirm the
reliability of digital images, using CNN models and dynamic learning environments to identify and separate manipulated images [5].

Catastrophic forgetting is a common issue in CNN models: neural networks trained with traditional backpropagation fail to retain
previously learned content when the data distribution changes, which hinders their ability to adapt to Continual Learning (CL) or
Lifelong Learning situations. The problem affects neural networks in general and, in particular, CNN models used as deepfake
detectors. This poses an obstacle to creating generalized artificial learning systems. Continual learning is a characteristic of
natural intelligence and a fruitful area of research in machine learning [6]. It has resulted in the development of generative
replay, which minimizes catastrophic forgetting by producing pseudo-samples for previous tasks and learning them alongside new
tasks. Generative replay, also known as pseudo-rehearsal, is a successful and generic strategy for continual learning, as shown in
Fig. 1.

Fig. 1. Brain-inspired generative replay model of continual learning in neural networks [7].
(a) Exact or experience replay, which views the hippocampus as a memory buffer in which experiences can simply be stored, akin to
traditional views of episodic memory. (b) Generative replay with a separate generative model, which views the hippocampus as a
generative neural network and replay as a generative process.

This research proposes a GAN-CNN ensemble model to minimize catastrophic forgetting in CNN models and improve deepfake detection.
The model uses adversarial training of a DCGAN for continual learning: samples from previous tasks are generated and stored, and
then replayed while the CNN model is trained on new tasks, i.e., a generative replay technique. This approach makes the CNN
adaptive in detecting deepfakes, allowing it to work in a dynamic environment with variable datasets. The DCGAN is tuned for
training stability, and the evaluation parameter values indicate the model's ability to learn new tasks while retaining expertise
on old tasks.
Contributions:
• A GAN-CNN ensemble model for deepfake detection that reaches 98.67% training accuracy, improving content verification
reliability and the identification of manipulated media.
• A robust model that combats catastrophic forgetting in CNN models by utilizing a GAN-based approach, ensuring resilience in
adapting to new data while retaining prior knowledge.
• A GAN-CNN ensemble that balances computational efficiency and learning effectiveness, making it a practical candidate for
real-time or large-scale deepfake detection scenarios.

The rest of this paper is organized as follows. Section 2 discusses the related studies. Section 3 highlights the methodology used
in deriving the model, including the experimentation details. Results and discussion are presented in Section 4 using different
curves, including accuracy, precision, recall, and F1-score plots. Section 5 concludes the paper with future directions.

2. Related Studies

In this section, the related works on the significance of continual learning-based GAN approaches are reviewed.
Furthermore, literature on GAN deep fake detection is also scrutinized to align this work correctly with the scope of
integrating continual learning in GAN deep fake models.

Researchers compared generative models on sequential image generation challenges using rehearsal, regularization, generative
replay, and fine-tuning mechanisms. They quantified generation quality and memory capacity using two metrics and evaluated
sequential tasks on MNIST, Fashion MNIST, and CIFAR10 [8]. Generative replay outperformed all other continual learning algorithms,
and the original GAN performed best among all models. Lifelong GAN [9] is a model that uses knowledge distillation to transfer
learned information from older networks to new ones, enabling it to perform image-conditioned generation tasks in a lifelong
learning context. The model was validated on both image-conditioned and label-conditioned generation tasks, with both qualitative
and quantitative results. A continual learning technique for generative adversarial networks (GANs) was developed using
parameter-efficient feature map modifications and task-specific parameters [10]. Its performance was enhanced by incorporating
residual bias and using Fisher information matrix-based task similarity for efficient continual data generation. The authors of
[11] reviewed rectification methods for GAN training difficulties and evaluated their theoretical contributions, with an emphasis
on image/visual applications. Treating deepfake detection as a fine-grained classification challenge, a multi-attentional deepfake
detection network was proposed in [12]. The model includes multiple spatial attention heads, a textural feature boosting block,
and an aggregation of low-level textural and high-level semantic features guided by attention maps. This approach aims to focus on
local regions and detect subtle artifacts. A deepfake efficacy model was created using two GAN networks and a Cycle-GAN network
[13]. The loss was split into two parts and a flow loss was added. The model was used for object transformation and style
transfer, resulting in visually pleasing photos. The work in [14] used the Continual Representation using Distillation (CoReD)
technique to improve data efficiency in neural networks; CoReD outperforms state-of-the-art methods in detecting low-quality
deepfake videos and GAN-generated images from multiple datasets. The work in [15] proposes a continual deepfake detection
benchmark (CDDB), utilizing various detection models, suitable measurements, and the multiclass incremental learning techniques
commonly used in continual visual recognition.

3. Methodology

This section provides the conceptual framework for the continual learning-based GAN model for deepfake detection, outlining each
element used to build the system designed to achieve the study's objective. Two key problems were prioritized: (a) catastrophic
forgetting in pixel-based neural network models; and (b) a straightforward continual learning strategy for a GAN-based deepfake
detection model that yields accurate results. The GAN-CNN model combines the generative power of the GAN model with the detection
power of the CNN model. It generates synthetic samples of previously seen tasks, which are then combined with the data from the
current task to form a combined dataset for CNN training. The GAN-CNN ensemble is thus used as a generative replay technique in
continual learning, mitigating catastrophic forgetting by storing samples from past tasks and replaying them during training on
new tasks. This process preserves prior task knowledge and refines the model's capabilities for deepfake detection in various
scenarios: the model is exposed to data from prior tasks and learns to retain that information while adapting to the new task.
The GAN model regenerates the original input and anticipates the network's ability to restore any face. The architecture of the
proposed GAN-CNN ensemble model is shown in Fig. 2 below.

Fig. 2. The architecture of the proposed GAN-CNN ensemble model.

The basic steps involved in the GAN-CNN ensemble model execution are defined below.

Step 1: Data Collection. The research methodology begins with data collection, which involves assembling a dataset of social media
images that includes both real and potentially manipulated images. In our case, we used the “Real and Fake” dataset, split in a
ratio of 70:30 for training and testing purposes.
Step 2: Preprocessing of Data using PCA. The data for GAN training is preprocessed using Principal Component Analysis (PCA) to
enhance feature extraction by reducing noise and dimensionality while preserving essential information. PCA converts the original
features into orthogonal principal components (PCs). In our case, 400 PCs are used for the facial data. PCs are linear
combinations of the original features, ordered by variance, with the first component capturing the most data variance.
Step 3: GAN Training. In the initial phase, the GAN model (DCGAN) is trained using a dataset of real images. The GAN component
learns to generate synthetic data that closely mimics the original data.
Step 4: Creating Synthetic Samples. Once trained, the GAN, which is provided with the dimensionality-reduced features derived from
PCA, is used to produce synthetic samples that mirror real-world images. To avoid catastrophic forgetting, these synthetic samples
are saved for future use.
Step 5: Generative Replay using DCGAN (synthetic sample reintroduction). Before addressing a new task, the model applies a
generative replay approach. Prior tasks are reintroduced through synthetic samples created by the GAN from the
dimensionality-reduced data obtained through PCA preprocessing. These synthetic samples are integrated with the data from the
present task, resulting in a consolidated dataset containing both historical and new data.
Step 6: Training of CNN on New Task. When the model is given a new task, the synthetic samples from the previous task are combined
with the data from the current task to produce a mixed dataset. The CNN model is then trained on this merged dataset, ensuring its
adaptability and robustness in detecting deepfakes in dynamic environments.
Step 7: Continual Learning. As the model is exposed to more tasks, the generative replay process is repeated for each prior task,
replaying its synthetic samples throughout the training of new tasks. This continuous exposure to previous tasks aids in the
retention of previous task knowledge and prevents catastrophic forgetting.
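
The data flow of Steps 4 to 7 can be illustrated with a minimal sketch. It is not the authors' implementation: the names `make_toy_generator` and `build_replay_dataset`, the toy feature vectors, and the sample counts are all hypothetical, and the toy generators merely stand in for trained DCGAN generators. The sketch only shows how replayed synthetic samples from earlier tasks might be mixed with the real data of the current task before CNN training.

```python
import random


def make_toy_generator(task_id):
    """Hypothetical stand-in for a DCGAN generator trained on one task."""
    def generate(rng):
        # A real generator would map a noise vector to an image; here we
        # return a short random feature vector tagged with its source task.
        features = [rng.gauss(task_id, 1.0) for _ in range(4)]
        return {"features": features, "task": task_id, "synthetic": True}
    return generate


def build_replay_dataset(current_data, old_generators, n_replay, rng):
    """Mix real data of the current task with replayed synthetic samples."""
    mixed = list(current_data)
    for generate in old_generators:
        mixed.extend(generate(rng) for _ in range(n_replay))
    rng.shuffle(mixed)
    return mixed


rng = random.Random(0)
old_generators = []   # one generator per task seen so far
dataset_sizes = []
for task_id in range(3):
    # Ten real samples for the current task (toy data).
    current = [{"features": [float(task_id)] * 4, "task": task_id, "synthetic": False}
               for _ in range(10)]
    # Steps 5-6: replay five samples per earlier task and merge with new data.
    mixed = build_replay_dataset(current, old_generators, n_replay=5, rng=rng)
    dataset_sizes.append(len(mixed))
    # The CNN would be trained on `mixed` here.
    # Steps 3-4: keep a generator for this task so it can be replayed later.
    old_generators.append(make_toy_generator(task_id))
```

With these toy settings the mixed dataset grows by five replayed samples per earlier task, so the third task is trained on ten real and ten synthetic samples.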

3.1 Training Parameters

The parameters used in the experiment, together with the values applied while building the model, are listed in Table 1.

Table 1. Parameters used in the training of the proposed Model.

Parameters Value
Batch Size 128 pictures
Learning Rate 0.05
Momentum (Beta) 0.5
Optimizer Adam's Optimizer
Epochs 100
Dropout 0.2
Batch normalization [0,1]
Weight Initialisation Random Values
Activation function ReLU
Evaluation Metrics Accuracy, Precision, Recall, F1-Score, Generator Loss,
Discriminator Loss, and Computational Time

The batch size of 128 images is chosen to balance training speed against memory constraints, processing a moderate number of
images in each pass. A comparatively high learning rate of 0.05 is used for quicker convergence. A moderate momentum of 0.5
balances training stability against the influence of the momentum term on gradient descent. The Adam optimizer is chosen for its
effectiveness in training deep models and its adaptive learning rates. Training is limited to 100 epochs to manage computational
resources and avoid overfitting. A dropout rate of 0.2 provides modest regularization against overfitting. Batch normalization
allows for experimentation with and without normalization, providing flexibility during training. Weight initialization with random
values ensures a reasonable starting point. ReLU is a popular activation function because it is efficient and mitigates the
vanishing gradient problem. The evaluation metrics are standard, covering accuracy, precision, recall, F1-score, and loss values
for a comprehensive assessment of model performance.
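
For reproducibility, the values of Table 1 could be collected in a single configuration object. The sketch below is only illustrative: the class and field names are hypothetical, and mapping "Momentum (Beta)" to Adam's first-moment coefficient is an assumption, since the paper does not say which coefficient is meant. The dataset size used in the usage line is likewise a made-up example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters from Table 1 (field names are hypothetical)."""
    batch_size: int = 128
    learning_rate: float = 0.05
    beta1: float = 0.5        # "Momentum (Beta)"; assumed to be Adam's beta_1
    optimizer: str = "adam"
    epochs: int = 100
    dropout: float = 0.2
    activation: str = "relu"


def batches_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to cover n_samples once (ceiling division)."""
    return -(-n_samples // batch_size)


config = TrainingConfig()
# Example with a hypothetical training set of 1000 images.
example_batches = batches_per_epoch(1000, config.batch_size)
```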

3.2 GAN Model (DCGAN)

In this research, we employ DCGAN as the base model for generative deepfake detection. It is made up of two neural networks: the
generator and the discriminator. The generator's goal is to create synthetic data that looks like real data, while the
discriminator's job is to distinguish between real data and the generator's fake data. The two networks are trained competitively:
the generator attempts to produce realistic samples that fool the discriminator, and the discriminator attempts to improve its
ability to distinguish real from fake samples. The architecture consists of convolution layers with no max pooling or fully
connected layers. Strided convolution and transposed convolution are used for downsampling and upsampling, respectively; DCGAN
employs convolutional layers in its discriminator and transposed-convolutional layers in its generator. Because the approach uses
two neural networks, two loss functions are considered, one for the generator and one for the discriminator. Both arise from the
distance between two estimates being compared: the one produced from the generator's output and the one based on the real data.
The loss follows the cross-entropy-based min-max formulation. As the name indicates, the generator attempts to minimize it (by
creating artificial samples that closely resemble the real ones), while the discriminator seeks to maximize it (by distinguishing
between fake and real instances).
$$\min_G \max_D \; \mathbb{E}_{x,y}\left[\log D(x, G(x))\right] + \mathbb{E}_{x,y}\left[\log\left(1 - D(G(x, y))\right)\right] \qquad (1)$$

Fig. 3. DC GAN generator's network architecture by Radford et al. [16].



The architecture is modified with the adjustments necessary to achieve training stability [17]. The main design choices for the
generator (G) and the discriminator (D) are listed below.

• D uses strided convolutions instead of pooling layers, which increases model expressiveness at the cost of more trainable
parameters.
• G uses transposed convolutions to upsample: the sampled noise is projected and reshaped into a small 2D grid of feature maps,
whose spatial dimensions are then increased layer by layer up to the image size.
• Batch normalization stabilizes training and speeds up learning. Randomly cropped snapshots are normalized, with center crop and
border crops used for the final prediction during the test phase.
• All layers in G use ReLU activation except for the output layer, which generates values in the image range [-1, 1].
• The discriminator employs leaky ReLU activation across all layers. The nonlinearity of both activation functions lets the model
learn more complicated representations.
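
The downsampling and upsampling roles of strided and transposed convolutions can be checked with the standard output-size formulas. The sketch below is a worked calculation only; the kernel size 4, stride 2, padding 1, the 4x4 starting grid, and the 64x64 image size are assumptions (common DCGAN-style settings), since the paper does not list its layer dimensions.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


KERNEL, STRIDE, PADDING = 4, 2, 1   # assumed settings, not taken from the paper

# Generator: start from an assumed 4x4 grid and upsample four times.
gen_sizes = [4]
for _ in range(4):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], KERNEL, STRIDE, PADDING))

# Discriminator: start from an assumed 64x64 image and downsample four times.
disc_sizes = [64]
for _ in range(4):
    disc_sizes.append(conv_out(disc_sizes[-1], KERNEL, STRIDE, PADDING))
```

Under these assumptions each transposed convolution doubles the spatial size (4, 8, 16, 32, 64) and each strided convolution halves it (64, 32, 16, 8, 4), which is why no pooling layers are needed.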

3.2 Dataset

For this study, we used the Kaggle Real and Fake Face Detection dataset [18], which contains 1,081 real face images and 960 fake
face images generated by a Generative Adversarial Network (GAN). The fake face images are generated by training the GAN on the
real face images and then using the trained GAN to generate new fake face images, as shown in Fig. 4. The experimental dataset is
divided at random into a training dataset and a verification dataset, corresponding to 70% and 30% of the data, respectively. The
model is trained on the training dataset and verified on the verification dataset.
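
A random 70/30 split of the kind described above might look like the following sketch. The function name `split_indices`, the seed, and the rounding rule (integer floor of 70%) are assumptions; the paper does not state how the split was implemented. Labels follow the convention of Table 4 (0 = fake, 1 = real).

```python
import random


def split_indices(n_samples, train_tenths=7, seed=0):
    """Shuffle sample indices and split them into train and verification parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * train_tenths // 10   # floor of 70% with the default
    return indices[:n_train], indices[n_train:]


N_REAL, N_FAKE = 1081, 960                 # image counts reported for the dataset
labels = [1] * N_REAL + [0] * N_FAKE       # 1 = real, 0 = fake
train_idx, test_idx = split_indices(len(labels))
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]
```

A stratified split, which keeps the real-to-fake ratio identical in both parts, would be a reasonable alternative when the classes are imbalanced.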

Fig. 4. Description of Real and Fake Images Dataset with (a) showing real faces; and (b) showing fake faces.

3.3 Deepfake Detector with Continual Learning Approach (VGG-16)

VGG-16, a convolutional neural network (CNN) architecture, served as our deepfake detector. It took first place in the 2014 ILSVRC
(ImageNet) competition and is viewed as one of the top vision model architectures available today. The most remarkable thing about
VGG16 is that, rather than introducing many hyper-parameters, it consistently uses convolution layers with 3x3 filters, stride 1,
and same padding, and max-pool layers with 2x2 filters and stride 2 [19]. Convolution and max-pool layers are arranged in this
manner throughout the entire architecture. After two FC (fully connected) layers, a softmax is used as the output. The 16 in VGG16
refers to its 16 layers that have weights. We used the generative replay technique in our continual learning scheme to train our
CNN detector on a series of tasks without being affected by catastrophic forgetting. The DCGAN rehearsal method involves storing a
portion of the training data from the past and fusing it with fresh patterns for upcoming training. The final technique, called
"generative replay", employs a dual-model approach in which a "current" generative model G_t is used to learn both the current
distribution and the G_{t-1} distribution, while a "frozen" generative model G_{t-1} is used to sample from previously learned
distributions. After a task is finished, G_{t-1} is replaced with a copy of G_t, allowing learning to continue. We also test three
baselines. A unique generator is trained for each task for Upper bound Data, where the main deepfake architecture is fine-tuned
using data from all previous tasks combined.

3.4 Algorithm

The algorithm for the proposed GAN-CNN ensemble is shown in Table 2. It consists of three major sections. Section I describes the
GAN model used to generate new samples for the generative replay technique. Section II defines the CNN model used for the
detection of deepfakes. Finally, Section III describes the GAN-CNN ensemble model, which implements the generative replay
technique to mitigate the CNN's catastrophic forgetting and yield a more robust model for deepfake detection.
Table 2. Proposed algorithm for the implementation of GAN-CNN Ensemble model.

(I) GAN (Generative Adversarial Network)

Generator (G) and Discriminator (D) components of the GAN.
G(zᵢ): Synthetic data generated by the generator G for the noise vector zᵢ.
D(xⱼ): The output of the discriminator D for the input data xⱼ, representing the probability that xⱼ is real (as opposed to fake).
The GAN's optimization objective involves the following loss functions.
Generator Loss (L_G):
L_G = -log(D(G(zᵢ)))
Discriminator Loss (L_D):
L_D = -log(D(xⱼ)) - log(1 - D(G(zᵢ)))

(II) CNN (Convolutional Neural Network)

The CNN is used for deepfake detection.
P(xⱼ): The probability predicted by the CNN that the input xⱼ is genuine (real).
The CNN's loss function for deepfake detection is:
L_CNN = -yⱼ * log(P(xⱼ)) - (1 - yⱼ) * log(1 - P(xⱼ))
where yⱼ = 1 for genuine images and yⱼ = 0 for deepfake images.

(III) GAN-CNN Ensemble Model

The GAN-CNN ensemble model combines the GAN's synthetic data and the CNN's predictions.
Training Data for CNN:
D_train = {(x₁, y₁), (x₂, y₂), ..., (x_N, y_N)} (N samples)
Generated Synthetic Data from GAN:
D_synthetic = {G(z₁), G(z₂), ..., G(z_M)} (M samples)
Augmented Training Data:
D_augmented = D_train ∪ D_synthetic (N + M samples)
The CNN is then trained on the augmented dataset D_augmented to learn to distinguish between real and synthetic data while
minimizing catastrophic forgetting.
Ensemble Prediction:
The ensemble prediction combines the predictions from both the GAN and the CNN.
Ensemble Prediction (Eⱼ):
Eⱼ = Ensemble_Method(P_CNN(xⱼ), D(G(zⱼ)))
where P_CNN(xⱼ) is the output probability from the CNN for input xⱼ, and D(G(zⱼ)) represents the probability output from the GAN's
discriminator for the synthetic data G(zⱼ). The ensemble method combines these probabilities to obtain the final decision Eⱼ on
deepfake detection for each input xⱼ.
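
The per-sample loss terms of Table 2 can be written out directly, as in the sketch below. The three loss functions follow the formulas in the table; the ensemble rule is an assumption, because Table 2 leaves Ensemble_Method unspecified. A weighted average with a hypothetical weight `w` is used here purely for illustration.

```python
import math


def generator_loss(d_fake):
    """L_G = -log(D(G(z))) for one generated sample."""
    return -math.log(d_fake)


def discriminator_loss(d_real, d_fake):
    """L_D = -log(D(x)) - log(1 - D(G(z))) for one real/fake pair."""
    return -math.log(d_real) - math.log(1.0 - d_fake)


def cnn_loss(y, p_real):
    """Binary cross-entropy; y = 1 for genuine images, y = 0 for deepfakes."""
    return -y * math.log(p_real) - (1 - y) * math.log(1.0 - p_real)


def ensemble_average(p_cnn, d_score, w=0.5):
    """Assumed ensemble rule: weighted average of the two probabilities."""
    return w * p_cnn + (1.0 - w) * d_score


# A discriminator that is maximally unsure (0.5) gives L_G = log 2.
loss_g = generator_loss(0.5)
# A confident, correct discriminator gives a small L_D.
loss_d = discriminator_loss(d_real=0.9, d_fake=0.1)
# The CNN is penalized lightly when right and heavily when confidently wrong.
loss_cnn_right = cnn_loss(1, 0.9)
loss_cnn_wrong = cnn_loss(0, 0.9)
combined = ensemble_average(0.8, 0.6)
```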

4. Results and Discussion

The training procedure shows that the performance of the deepfake detection model improves markedly as the number of epochs
increases. The model becomes better at correctly detecting genuine and deepfake images, and its predictions match the true labels
well, resulting in high accuracy and low loss. Table 3 below displays the performance of the deepfake detection model as it is
trained across epochs. Each row represents the outcome obtained at a certain epoch of the training process, measured in terms of
accuracy and loss. At the start of training (Epoch 1), the accuracy is 63.13% and the loss is 0.6482, which shows that the model's
performance is low and its predictions are not particularly accurate. As training advances, accuracy increases and loss decreases.
At Epoch 10, the accuracy rises to 88.5%, demonstrating that the model is becoming more adept at distinguishing between genuine and
deepfake images. The loss also drops substantially to 0.286, indicating that the model's predictions are improving.

Table 3. Table showing accuracy and loss values obtained at different epochs.
EPOCHS ACCURACY (%) LOSS
1 63.13 0.6482
10 88.5 0.286
20 94.4 0.157
30 96.44 0.1102
40 96.54 0.1012
50 98.35 0.0572
60 98.83 0.04
70 97.42 0.0777
80 98.08 0.056
90 98.75 0.0381
100 98.67 0.0337

The model's performance improves significantly with more training, with accuracy increasing and loss decreasing over time. By
Epoch 50, the model's accuracy reached 98.35% and its loss decreased to 0.0572. It performs well in identifying genuine and fake
images, with predictions close to the actual labels. The model's performance fluctuates slightly in subsequent epochs (for example,
accuracy dips to 97.42% at Epoch 70), which is normal while parameters are fine-tuned to fit the training data. At Epoch 100, the
model's final accuracy was 98.67% with a loss of 0.0337, indicating stable convergence, as shown in Figs. 5 and 6.

Fig. 5. Accuracy Curve of The Proposed GAN_CNN Ensemble

Fig. 6. Loss Curve of the Proposed GAN_CNN Ensemble.

Table 4 presents the evaluation parameters of the binary classification model for deepfake detection for its two classes, class 0
(fake) and class 1 (real). For class 0, which has 598 samples, the model achieved a precision of 0.68, a recall of 0.74, and an
F1-score of 0.71. For class 1, which has 602 samples, it achieved a precision of 0.72, a recall of 0.66, and an F1-score of 0.69.
The similar F1-scores indicate a balanced performance across the two classes.
Table 4. Values of the evaluation parameters (precision, recall, F1-score).
Evaluation Parameters   Precision   Recall   F1-score   Support
0 (Fake)                0.68        0.74     0.71       598
1 (Real)                0.72        0.66     0.69       602

Fig. 7 shows the graphical representation of precision and recall values for classes 0 and 1; fewer false positives give higher
precision, and fewer false negatives give higher recall. The F1 score is the harmonic mean of precision and recall, which is
particularly useful under class imbalance, and the support value indicates the sample count of each class.

Fig. 7. Evaluation Parameters Curve of the Proposed GAN_CNN Ensemble

Table 5 shows a confusion matrix that assesses the binary classification model's performance by counting true positive (TP), true
negative (TN), false positive (FP), and false negative (FN) predictions, with deepfake as the positive class. The TP value of 444
indicates that a large share of deepfake samples is detected, while the TN value of 398 indicates that a large share of authentic
samples is recognized. The FP value shows that the model misclassified 155 authentic samples as fakes, and the FN value shows that
it misclassified 204 fake samples as genuine.
Table 5. Table showing values of True Positive, False Positive, True Negative and False negative of Confusion
matrix for proposed model.

Confusion Matrix Actual Positive Actual Negative

Predicted Positive 444 155

Predicted Negative 204 398


The graphical representation in Fig. 8 provides a more thorough evaluation of the model's performance in binary
classification tasks for deep fake detection. Furthermore, the confusion matrix shows that the model properly
recognizes a significant proportion of real and deep fake samples (True Positives and True Negatives). However, there
are certain cases of misclassification (False Positives and False Negatives), indicating that the model needs to be
improved further.
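
The link between the confusion matrix in Table 5 and the per-class scores in Table 4 can be checked with a short calculation. The sketch below is not part of the original study; the helper name `per_class_metrics` is hypothetical. It computes the metrics under two readings of Table 5: as printed (rows are predictions) and transposed (rows are actual classes, which is an assumption). Under the transposed reading the values agree with Table 4 to within about 0.01, and the overall accuracy of 842/1201 is close to the 70.08% testing accuracy quoted in the abstract.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# The four counts of Table 5 in printed order (row by row).
a, b, c, d = 444, 155, 204, 398
total = a + b + c + d
accuracy = (a + d) / total

# Reading 1 (as printed): rows are predictions, fake is the positive class.
printed_fake = per_class_metrics(tp=a, fp=b, fn=c)

# Reading 2 (assumption): rows are actual classes, columns are predictions.
transposed_fake = per_class_metrics(tp=a, fp=c, fn=b)
transposed_real = per_class_metrics(tp=d, fp=b, fn=c)

for name, (p, r, f) in [("fake", transposed_fake), ("real", transposed_real)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.4f}")
```

The F1-score is the same under both readings because swapping false positives and false negatives only exchanges precision and recall.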

(a) (b)
Fig. 8. Confusion Matrix with Actual Positives (a); and Actual Negatives(b) of the Proposed model

The GAN model's training progress, which underlies the reduction of the catastrophic forgetting encountered by CNN models in
deepfake detection, is shown in Table 6. The table lists, for each epoch, the time required, the generator loss, and the
discriminator loss. The generator loss rises from 2.5187 at epoch 0 to 6.0917 at epoch 100, while the time per epoch stays between
roughly 0.07 and 0.10 seconds (0.0777452 seconds at epoch 100).
Table 6. Performance of Generator and Discriminator modules used in GAN-CNN Ensemble

Epochs Time(sec) Generator Loss Discriminator Loss

0 0.086946 2.5187 0.09220361


10 0.082465 3.2298 0.839503
20 0.0761921 3.9640 0.02143725
30 0.0961961 4.5883 0.014482
40 0.1040106 4.9379 0.0160632
50 0.0943568 5.2372 0.007435
60 0.0836747 5.5832 0.005152
70 0.1040618 5.9138 0.004010
80 0.0726762 5.9825 0.0027085
90 0.0832026 6.10523 0.00272559
100 0.0777452 6.0917 0.00233806

The discriminator loss decreases from 0.09220361 to 0.00233806, indicating better discrimination of genuine data from generated
data. Over the same epochs the generator loss in Table 6 increases rather than decreases, so the two losses move in opposite
directions: as the discriminator becomes more confident, the generator is penalized more heavily for samples that fail to fool it.
The model's success is measured by the balance between generator and discriminator losses and by how well it generalizes to fresh
data after training, as shown in Figs. 9 and 10 below.

Fig. 9. Generator Loss Plot of the Proposed GAN-CNN Ensemble model



At the start of training, the GAN model's generator and discriminator losses are 2.5187 and 0.09220361, respectively.
By epoch 100, Table 6 reports a generator loss of 6.0917 and a discriminator loss of 0.00233806. The falling
discriminator loss indicates that the discriminator separates actual from created data with growing confidence, and
the generator loss grows accordingly before flattening out after about epoch 70. This dynamic reflects the GAN's
adversarial learning process, in which the generator and discriminator are trained against each other.

Fig. 10. Discriminator Loss Plot of the Proposed GAN-CNN Ensemble model
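
The opposite movement of the two losses in Table 6 can be illustrated with a minimal sketch. It assumes the standard
binary cross-entropy GAN objective with the non-saturating generator loss; the paper does not spell out its loss
functions here, so this is an assumption, and the discriminator outputs below are hypothetical values chosen only for
illustration.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12  # guard against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Mean BCE: real samples should score 1, generated samples should score 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: generated samples should score 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability of "real"), not taken from the paper.
early = {"d_real": [0.95, 0.97], "d_fake": [0.08, 0.10]}
late = {"d_real": [0.999, 0.999], "d_fake": [0.002, 0.003]}

for name, out in (("early", early), ("late", late)):
    d_loss = discriminator_loss(out["d_real"], out["d_fake"])
    g_loss = generator_loss(out["d_fake"])
    print(f"{name}: discriminator loss = {d_loss:.4f}, generator loss = {g_loss:.4f}")
```

Under this objective, a discriminator that scores generated samples closer to 0 lowers its own loss and raises the
generator's loss, which is the pattern the table shows.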

Fig. 11 shows the model's training time across epochs. The time is stable over the first 30 epochs, and a spike around
the 40th epoch indicates a temporary surge in processing. Despite this, the recorded time stays within a narrow range
(roughly 0.07 to 0.10 s in Table 6) throughout the 100 epochs.

Fig. 11. Training Time Plot of the Proposed GAN-CNN Ensemble model
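
The timing behaviour can be checked against the Time (sec) column of Table 6. The short sketch below computes the mean
recorded time and flags epochs that lie well above it. The 15% margin is an illustrative assumption, not a criterion
used in the paper.

```python
from statistics import mean

# Time (sec) column of Table 6, keyed by epoch.
time_by_epoch = {
    0: 0.086946, 10: 0.082465, 20: 0.0761921, 30: 0.0961961,
    40: 0.1040106, 50: 0.0943568, 60: 0.0836747, 70: 0.1040618,
    80: 0.0726762, 90: 0.0832026, 100: 0.0777452,
}

avg = mean(time_by_epoch.values())

# Illustrative threshold (an assumption): flag epochs more than 15% above the mean.
threshold = 1.15 * avg
peaks = [epoch for epoch, t in time_by_epoch.items() if t > threshold]

print(f"mean time: {avg:.4f} s, threshold: {threshold:.4f} s")
print("epochs above threshold:", peaks)
```

With the tabulated values, this flags epochs 40 and 70, while every other recorded epoch stays below the threshold.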

Training time matters for a model's practical effectiveness, particularly in deepfake detection tasks, where data must
be analyzed and acted on quickly. Balancing time efficiency and learning efficacy is critical for deployment in
scenarios that need rapid and accurate decision-making. Taken together, these results indicate that the model can
mitigate catastrophic forgetting while keeping training time steady during adversarial learning. The GAN-CNN Ensemble
model is found to have significant potential for deepfake detection, with high accuracy and effective handling of
catastrophic forgetting in comparison with existing models, as shown in Table 7 below. However, further analysis and
fine-tuning may be required to minimize misclassifications and optimize the model's performance in real-world
circumstances.
In comparison to previous techniques for detecting real and fake faces, our GAN-CNN Ensemble model reports the highest
accuracy, 98.67% (on training data). This exceeds Afchar et al.'s FaceForensics++ Voting Ensemble (94.05%), Nguyen et
al.'s Stacking Ensemble on the Deepfake Detection Challenge (DFDC) (97.97%), Yang et al.'s AdaBoost Ensemble on DFDC
(97.30%), Li et al.'s UADFV Voting Ensemble (98.03%), and Wang et al.'s DFDC Voting Ensemble (96.40%). These results
support the model's robustness in this challenging context and its value as an advancement in deepfake detection. The
approach is a promising solution for real-world applications and offers a strong foundation for future research to
explore its potential across diverse datasets and assess its adaptability to changing data distributions.
Table 7. Comparison of the Proposed Model with Existing Models

Author               Dataset                         Approach             Result
Ours                 Real and Fake Face Detection    GAN-CNN              98.67%
Afchar et al. [20]   FaceForensics++                 Voting Ensemble      94.05%
Nguyen et al. [21]   DeepFake Detection Challenge    Stacking Ensemble    97.97%
Yang et al. [22]     DFDC                            AdaBoost Ensemble    97.30%
Li et al. [23]       UADFV                           Voting Ensemble      98.03%
Wang et al. [24]     DFDC                            Voting Ensemble      96.40%

In summary, the proposed GAN-CNN Ensemble model reaches an accuracy of 98.67% after 100 epochs. The assessment
parameters show similar precision and recall values for both classes, with F1-scores of 0.71 for the 'Fake' class and
0.69 for the 'Real' class. The model's computing time stays consistent during training, which supports its practical
applicability. Its effectiveness depends on a compromise between efficient time management and improving accuracy and
loss values. In the comparison with existing methods it reports the highest accuracy, providing evidence for its
usefulness in deepfake identification. Key improvements include increased accuracy, robustness, and effective time
management, making it an important contribution to the field.
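
The F1-scores quoted above can be reproduced from the per-class precision and recall that the paper's conclusion gives
for classes 0 and 1. The sketch below applies the usual harmonic-mean definition of F1. The function name `f1_score` is
only an illustrative helper here, not code from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Per-class precision and recall as reported in the paper's conclusion.
reported = {
    "class 0": {"precision": 0.68, "recall": 0.74},
    "class 1": {"precision": 0.72, "recall": 0.66},
}

for label, m in reported.items():
    f1 = f1_score(m["precision"], m["recall"])
    print(f"{label}: F1 = {f1:.2f}")
```

Rounded to two decimals, the results match the reported 0.71 and 0.69.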

5. Conclusion

This research addressed the CNN detector's catastrophic forgetting issue in deepfake detection for social media images
by implementing a Generative Replay technique using a GAN-CNN ensemble model. The method generates and saves samples
from previous tasks and then replays them during new task training, improving the CNN's robustness in detecting
deepfakes. The ensemble model achieved an accuracy of 98.67% on training data and 70.08% on testing data, with a low
loss (0.0337). Furthermore, for classes 0 and 1, it displayed precision values of 68% and 72%, recall values of 74%
and 66%, and F1 scores of 71% and 69%, respectively. These findings emphasize the model's ability to manage
catastrophic forgetting and its effectiveness under dynamic training conditions. The future scope involves evaluating
the model's performance on diverse datasets and ensuring its adaptability to changing data distributions and potential
adversarial threats in real-world applications.
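
The replay step summarized above can be sketched in a few lines. This is a toy illustration under stated assumptions,
not the authors' implementation: a Gaussian fitted to 1-D features stands in for the trained GAN generator, and
`old_detector`, `fit_toy_generator`, and `replay_batch` are hypothetical names introduced only for this example.

```python
import random

def fit_toy_generator(samples):
    """Stand-in for a GAN generator trained on an earlier task.
    It only remembers the mean and standard deviation of 1-D features."""
    mu = sum(samples) / len(samples)
    sigma = (sum((x - mu) ** 2 for x in samples) / len(samples)) ** 0.5

    def generate(k, rng):
        return [rng.gauss(mu, sigma) for _ in range(k)]

    return generate

def replay_batch(new_task, generate, old_detector, n_replay, rng):
    """Combine labeled new-task samples with generated old-task samples.
    Replayed samples are labeled by the detector trained on the old task."""
    replayed = [(x, old_detector(x)) for x in generate(n_replay, rng)]
    batch = list(new_task) + replayed
    rng.shuffle(batch)
    return batch

rng = random.Random(0)

# Hypothetical 1-D features from an earlier task (not data from the paper).
old_features = [rng.gauss(0.0, 1.0) for _ in range(200)]
generate_old = fit_toy_generator(old_features)

def old_detector(x):
    """Hypothetical detector from the earlier task: 1 = fake, 0 = real."""
    return int(x > 0.0)

# Hypothetical labeled samples for the new task.
new_task = [(rng.gauss(3.0, 1.0), rng.randint(0, 1)) for _ in range(8)]

batch = replay_batch(new_task, generate_old, old_detector, n_replay=8, rng=rng)
print(len(batch), "samples in the mixed batch")
```

Training the detector on such mixed batches exposes it to old-task samples while it learns the new task, which is the
mechanism generative replay relies on to limit forgetting.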

References

[1]. Hardy, C., Le Merrer, E., & Sericola, B. (2019, May). Md-gan: Multi-discriminator generative adversarial networks for
distributed datasets. In 2019 IEEE international parallel and distributed processing symposium (IPDPS) (pp. 866-877). IEEE.
[2]. Hong, Y., Hwang, U., Yoo, J., & Yoon, S. (2019). How generative adversarial networks and their variants work: An overview.
ACM Computing Surveys (CSUR), 52(1), 1-43.
[3]. Wiggers, K. (2019). Generative adversarial networks: What GANs are and how they've evolved. VentureBeat, December, 26.
[4]. Aggarwal, A., Mittal, M., & Battineni, G. (2021). Generative adversarial network: An overview of theory and applications.
International Journal of Information Management Data Insights, 1(1), 100004.
[5]. Cong, Y., Zhao, M., Li, J., Wang, S., & Carin, L. (2020). Gan memory with no forgetting. Advances in Neural Information
Processing Systems, 33, 16481-16494.
[6]. McCaffary, D. (2021). Towards continual task learning in artificial neural networks: current approaches and insights from
neuroscience. arXiv preprint arXiv:2112.14146. [Accessed: 20-Jan-2023].
[7]. Van de Ven, G. M., Siegelmann, H. T., & Tolias, A. S. (2020). Brain-inspired replay for continual learning with artificial neural
networks. Nature communications, 11(1), 4069.
[8]. Lesort, T., Stoian, A., Goudou, J. F., & Filliat, D. (2019). Training discriminative models to evaluate generative ones. In
Artificial Neural Networks and Machine Learning–ICANN 2019: Image Processing: 28th International Conference on
Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Part III 28 (pp. 604-619). Springer
International Publishing. [Accessed: 19-Jan-2023].
[9]. Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M., & Mori, G. (2019). Lifelong gan: Continual learning for conditional image
generation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2759-2768).
[10]. Zhai, M., Chen, L., He, J., Nawhal, M., Tung, F., & Mori, G. (2020). Piggyback gan: Efficient lifelong learning for image
conditioned generation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020,
Proceedings, Part XXI 16 (pp. 397-413). Springer International Publishing. [Accessed: 19-Jan-2023].
[11]. Varshney, S., Verma, V. K., Srijith, P. K., Carin, L., & Rai, P. (2021). Cam-GAN: Continual adaptation modules for generative
adversarial networks. Advances in Neural Information Processing Systems, 34, 15175-15187. [Accessed: 19-Jan-2023].
[12]. Zhao, H., Zhou, W., Chen, D., Wei, T., Zhang, W., & Yu, N. (2021). Multi-attentional deepfake detection. In Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition (pp. 2185-2194). [Accessed: 20-Jan-2023].
[13]. Manisha, P., & Gujar, S. (2018). Generative Adversarial Networks (GANs): What it can generate and What it cannot?. arXiv
preprint arXiv:1804.00140.
[14]. Kim, M., Tariq, S., & Woo, S. S. (2021, October). Cored: Generalizing fake media detection with continual representation
using distillation. In Proceedings of the 29th ACM International Conference on Multimedia (pp. 337-346). [Accessed: 19-Jan-2023].
[15]. Li, C., Huang, Z., Paudel, D. P., Wang, Y., Shahbazi, M., Hong, X., & Van Gool, L. (2023). A continual deepfake detection
benchmark: Dataset, methods, and essentials. In Proceedings of the IEEE/CVF Winter Conference on Applications of
Computer Vision (pp. 1339-1349). [Accessed: 20-Jan-2023].
[16]. Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative
adversarial networks. arXiv preprint arXiv:1511.06434.
[17]. N. Adaloglou, “Gans in Computer Vision - introduction to Generative Learning,” AI Summer, 10-Apr-2020. [Online].
Available: https://theaisummer.com/gan-computer-vision/. [Accessed: 20-Jan-2023].
[18]. S.Nam et al., (2019, January). Real and Fake Face Detection, Version 1. Retrieved [15-Oct-2023] from
https://www.kaggle.com/datasets/ciplab/real-and-fake-face-detection.
[19]. Shen, T., Liu, R., Bai, J., & Li, Z. (2018). “deep fakes” using generative adversarial networks (gan). Noiselab, University of
California, San Diego.
[20]. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A compact facial video forgery detection network. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2122-2131.
https://openaccess.thecvf.com/content_cvpr_2018_workshops/papers/w46/Afchar_MesoNet_A_Compact_CVPR_2018_paper.pdf
[21]. Nguyen, H., Nguyen, T., Nguyen, D., Do, T., & Van Nguyen, T. (2019). On the effectiveness of local binary patterns in detecting
face morphing attacks. In Proceedings of the IEEE International Conference on Multimedia & Expo Workshops, 501-506.
https://ieeexplore.ieee.org/document/8803839
[22]. Yang, Y., Zhang, X., Sun, L., Tan, T., & Zhou, J. (2020). Exposing deepfake videos by detecting face warping artifacts. IEEE
Transactions on Information Forensics and Security, 15, 1252-1267. https://ieeexplore.ieee.org/document/8947691
[23]. Li, Y., Li, X., Tian, Y., Long, M., & Li, F. (2021). Deepfake video detection based on temporal consistency and quality
assessment. IEEE Transactions on Multimedia, 23, 1266-1280. https://ieeexplore.ieee.org/document/9303275
[24]. Wang, Z., Liu, C., Chen, Y., Huang, X., Zhang, J., & Wang, J. (2021). A fusion strategy based on spatial and temporal
consistency for detecting deepfake videos. IEEE Transactions on Image Processing, 30, 1028-1041.
https://ieeexplore.ieee.org/document/9252377.
