Generative Model For Image Classification
Abstract: Generative models are widely used in machine learning due to their ability to learn features from the input and to generate data samples from a random vector given to them. However, different applications use different models, and hybrid models give more precise results in some areas. One such model is designed for the specific application of anomaly detection. Here, an adversarial autoencoder is used as the base model, and the adversarial effect of a generative adversarial network is added to generate images from features. The results show that the proposed model achieves lower reconstruction loss in image regeneration, and its compressed representation in the latent space can separate each class. This research study discusses the proposed model architecture and the various methods used for hyperparameter tuning.

Keywords: Autoencoder; Generative model; Anomaly detection.

I. INTRODUCTION

Autoencoders are unsupervised neural networks in deep learning that were first introduced in [1]. They consist of an encoder and a decoder that work together to learn a compressed representation of input data, which can be used for various applications such as data compression [2], feature extraction [3], and denoising [4]. Autoencoders have the advantage of being unsupervised networks, which means they can be trained without any labeled data, making them suitable for unsupervised tasks. The key idea is to train the model in such a way that it can regenerate the original data with minimal loss.
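As a concrete illustration, the following is a minimal PyTorch sketch of this encoder-decoder structure (not the authors' implementation; the layer sizes and the flattened 28x28 input are assumptions):

    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        def __init__(self, input_dim=784, latent_dim=8):
            super().__init__()
            # Encoder compresses the input into a low-dimensional latent vector.
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, latent_dim),
            )
            # Decoder regenerates the input from the latent vector.
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid(),
            )

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    model = AutoEncoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(16, 784)                    # stand-in batch of flattened 28x28 images
    x_hat, z = model(x)
    loss = nn.functional.mse_loss(x_hat, x)    # element-wise reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Training repeats this step over the dataset until the reconstruction loss stops improving.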
Reconstruction is a fundamental aspect of autoencoder models, where the encoded latent representation is decoded to reproduce the original input. Achieving lossless regeneration is a challenging problem, and limited research has been conducted in this area [5].
Compared to autoencoders, Generative Adversarial Networks (GANs) are generative models in which there are two networks, called the generator and the discriminator. The generator generates new samples based on the training data, while the discriminator attempts to identify the original samples from a mixture of original and fake samples. The approach is basically inspired by game theory, in which the two models compete with each other; in the final trained model, you have a trained generator that can regenerate such samples.
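This competition can be written down as two standard adversarial losses. The sketch below illustrates them with toy fully connected networks (the sizes and the flattened 28x28 samples are assumptions, not networks from this paper):

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid())
    D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))
    bce = nn.BCEWithLogitsLoss()

    real = torch.rand(16, 784)        # stand-in for a batch of real samples
    fake = G(torch.randn(16, 8))      # generator maps noise to new samples

    # Discriminator learns to label real samples 1 and generated samples 0.
    d_loss = (bce(D(real), torch.ones(16, 1))
              + bce(D(fake.detach()), torch.zeros(16, 1)))

    # Generator learns to make the discriminator label its samples 1.
    g_loss = bce(D(fake), torch.ones(16, 1))

Each network is updated on its own loss, so the two objectives pull against each other exactly as in the game-theoretic description.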
The proposed custom model in this manuscript uses Adversarial Autoencoders (AAEs) as the reference model. In AAEs, the compressed vector has a known distribution that depends on one of its inputs, which is a noise sample. The compressed vector in AAEs is generated by the encoder network using both the data sample and the noise sample, and its distribution is learned during the training process and influenced by both inputs. In our model, we have used two discriminators. The second discriminator is for images and discriminates between the original image and the regenerated image. We have considered the reconstruction error as the performance parameter, and our model learns from the input to keep the error as low as possible while ensuring that images of different classes have distinct distributions.

The paper is organized as follows: In Section II we present the related work and the concept of unsupervised generative models. In Section III we describe the proposed architecture, in Section IV we describe the experimental results, and in Section V we conclude the paper.

II. RELATED WORK

There are many variations of autoencoders, including denoising autoencoders [6], variational autoencoders [7], and adversarial autoencoders [8], each with its own strengths and weaknesses. In autoencoders, the size of the latent vector is a key parameter, and it needs careful tuning for effective regeneration of the original data. Autoencoders are used in many different applications, such as compression [9], recommendation systems for collaborative filtering [10], anomaly detection [11], and compressive sensing [12]. They are also used in natural language processing [13], object detection [14], and image analysis [15]. The popularity of variational autoencoders stems from their capacity to regenerate data using a latent representation that is probabilistically distributed [16]. Adversarial autoencoders use a combination of generative and discriminative models to learn a compressed representation of input data that can be used in various applications [8].
Generative adversarial networks (GANs) have also become popular in computer vision due to their ability to generate new sample data, as compared to autoencoders, which simply reconstruct it. Because of this, they are successfully used in computer vision tasks such as image inpainting, image super-resolution, image compression, and anomaly detection [17, 18]. GANs have many advantages, but they are hard to train. They consider feature-wise error during training rather than element-wise error. Hybrid networks called VAE-GANs combine the reconstruction loss from VAEs and the adversarial loss from GANs to generate high-quality and diverse samples while preserving the disentanglement of the latent variables [19]. A few other models, such as BiGAN [20] and EBGAN [21], have similar hybrid characteristics. Model effectiveness can be measured by how precisely the model can regenerate the original data and how well it can separate the different classes with proper distance between them. Table I below summarizes different types of autoencoders, including their descriptions and applications.
TABLE I. SUMMARY OF VARIOUS AUTOENCODERS

Autoencoder | Description | Applications
Variational autoencoders | Use a probabilistic distribution for the latent vector | Image and speech generation, dimensionality reduction
Adversarial autoencoders | Combination of generative and discriminative models | Image and speech generation, data compression
VAE-GANs | Hybrid model combining VAEs and GANs | Image and speech generation, data compression
BiGAN | Uses generative and discriminative models | Image and speech generation, anomaly detection
EBGAN | Uses an energy-based model to learn a compressed representation | -

Sometimes it happens that a neural network-based model can reconstruct certain data that were not present during training. This is an example of poor performance and may occur due to similarity in structure or closeness to other class labels. In other words, the model may generalize poorly on unseen data due to overfitting. Overfitting occurs when the model memorizes the training data instead of learning the underlying patterns, resulting in poor generalization performance. Similarly, the image reconstruction loss, also known as the reconstruction error, measures the difference between the original input image and the output image generated by the decoder. The goal of the autoencoder is to minimize this loss, which encourages the network to learn a compact representation of the input data that can accurately reconstruct the original input image.

III. ARCHITECTURE

The general architecture of the proposed model is shown in Fig. 1. It includes two networks: 1. an adversarial autoencoder and 2. a discriminator. In the case of a variational autoencoder, the distribution of the latent vector is a normal distribution because of the KL-divergence term in the loss function. In an adversarial autoencoder, by contrast, the latent distribution can be any distribution regardless of the reconstruction loss, since it depends on the noise vector p(z). Because the AAE uses the adversarial concept, its latent vector q(z) has a better distribution than that of a VAE. We take advantage of this property in choosing the base model for our network. The second network in our model is a discriminator that discriminates between the input image and the image generated by the AAE; because of it, the network is jointly trained by the two models, which results in better reconstruction.

The AAE has three components. First, the encoder encodes the input image and generates the latent vector, which is a compressed representation of the input data. Second, the decoder takes this latent vector as input and regenerates the input data by minimizing the reconstruction loss between the generated image and the original image. The third component is discriminator 1, which takes two inputs, one from a vector with the known distribution P and one from the latent vector Q; this discriminator forces the latent vector Q to have a data distribution close to the known distribution P. This allows the user to impose any desired distribution on Q. Discriminator 2 is the other network; it discriminates between the original image and the image regenerated by the AAE.
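To make the wiring concrete, the following sketch declares the four networks just described (assumed layer sizes, fully connected layers on flattened 28x28 images; it follows the standard AAE setup, in which the noise sample provides the prior input P to discriminator 1):

    import torch.nn as nn

    latent_dim = 8   # latent space dimension, as in Table 3

    encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                            nn.Linear(256, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                            nn.Linear(256, 784), nn.Sigmoid())

    # Discriminator 1 sees latent vectors: prior samples P vs. encoded vectors Q.
    disc_latent = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                nn.Linear(64, 1))

    # Discriminator 2 sees images: original inputs vs. AAE reconstructions.
    disc_image = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                               nn.Linear(256, 1))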
Discriminator 1: The function of discriminator 1 is to force the encoder to generate a latent vector that has the same distribution as the prior distribution, which is given as the second input to discriminator 1.
Discriminator 2: This network works as a generative adversarial network, discriminating the input image from the output image of the autoencoder; this forces the autoencoder to regenerate the original image through training with the custom loss function. The trained autoencoder is able to generate the original image and fool the discriminator, while the discriminator classifies the original image as true and the generated image as fake. The function of this network is similar to that of a GAN.
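Putting the pieces together, one joint training step can compute the three losses described above as follows (a self-contained sketch repeating the declarations above; the loss weights and update order are assumptions, not the paper's settings):

    import torch
    import torch.nn as nn

    latent_dim = 8
    encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                            nn.Linear(256, 784), nn.Sigmoid())
    disc_latent = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    disc_image = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))
    bce = nn.BCEWithLogitsLoss()

    x = torch.rand(16, 784)              # batch of flattened input images
    q = encoder(x)                       # latent vector Q
    p = torch.randn(16, latent_dim)      # prior sample P (Gaussian, as in Table 3)
    x_hat = decoder(q)                   # regenerated image

    # 1) Element-wise reconstruction loss between input and regenerated image.
    rec_loss = nn.functional.mse_loss(x_hat, x)

    # 2) Discriminator 1 pushes the distribution of Q toward the prior P.
    d1_loss = (bce(disc_latent(p), torch.ones(16, 1))
               + bce(disc_latent(q.detach()), torch.zeros(16, 1)))
    adv_latent = bce(disc_latent(q), torch.ones(16, 1))    # encoder's adversarial term

    # 3) Discriminator 2 separates original images from reconstructions.
    d2_loss = (bce(disc_image(x), torch.ones(16, 1))
               + bce(disc_image(x_hat.detach()), torch.zeros(16, 1)))
    adv_image = bce(disc_image(x_hat), torch.ones(16, 1))  # autoencoder's adversarial term

    # Joint autoencoder objective; each discriminator is updated on its own loss.
    ae_loss = rec_loss + 0.1 * adv_latent + 0.1 * adv_image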
IV. EXPERIMENTS

Here we compare our model with an autoencoder and an adversarial autoencoder. The MNIST dataset is selected for the experiments; it contains handwritten digits from 0 to 9, each of size 28x28.

For the experiments we considered the parameters shown in Table 3. Fig. 2 shows the regenerated versions of image A produced by the different models (B, C, D). Table 1 shows the recorded peak signal-to-noise ratio for the generated images in Fig. 2, and the comparison of the peak signal-to-noise ratio for the reconstructed images is shown in Table 2.

TABLE 3. PARAMETERS FOR THE GIVEN MODEL

Parameter | Value
Initial learning rate | 0.001
Batch size | 16
Epochs | Random batch, 20000 times
Prior distribution | Gaussian (normal)
Latent space dimension | 8
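The reported metric is the peak signal-to-noise ratio (PSNR) between an original image and its reconstruction. For images scaled to [0, 1] it can be computed as in the sketch below (a standard formulation, not the authors' code):

    import torch

    def psnr(original: torch.Tensor, reconstructed: torch.Tensor,
             max_val: float = 1.0) -> torch.Tensor:
        """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
        mse = torch.mean((original - reconstructed) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)

    img = torch.rand(28, 28)                                # stand-in MNIST image
    noisy = (img + 0.01 * torch.randn(28, 28)).clamp(0, 1)  # imperfect reconstruction
    print(psnr(img, noisy))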
In Fig. 3, the decoder uses a randomly generated vector from the prior distribution to produce a reconstructed sample, as shown. Fig. 4 presents t-SNE visualizations that illustrate the clustering of the MNIST dataset. The AAE and our model exhibit virtually identical data distributions for the reconstructed images. However, the AE shows a gap between different samples, which could indicate that the model is capable of generating diverse types of images when the input
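The t-SNE plots in Fig. 4 can be produced along the following lines (a sketch with stand-in data; in practice, latents would hold the encoder outputs for the test set and labels the digit classes):

    import numpy as np
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    latents = np.random.randn(1000, 8)         # stand-in for 8-D encoded MNIST vectors
    labels = np.random.randint(0, 10, 1000)    # stand-in for digit labels

    # Project the latent vectors to 2-D and color points by class.
    embedded = TSNE(n_components=2, perplexity=30).fit_transform(latents)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=5)
    plt.colorbar(label="digit class")
    plt.title("t-SNE of latent vectors")
    plt.show()

Well-separated clusters indicate that the latent space keeps the classes apart, which is the property the comparison in Fig. 4 examines.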