Autoencoder
1. Encoder
The input layer takes the raw input data.
The hidden layers progressively reduce the dimensionality of
the input, capturing important features and patterns. These
layers make up the encoder.
The bottleneck layer (latent space) is the final hidden layer,
where the dimensionality is significantly reduced. This layer
represents the compressed encoding of the input data.
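As a rough illustration of the encoder, the following NumPy sketch passes a batch through two dense layers that shrink an 8-dimensional input to a 2-dimensional code. The layer sizes, the tanh activation and the random (untrained) weights are assumptions made for the example, not part of the description above.

```python
import numpy as np

# Assumed layer sizes: 8-dim input -> 4-unit hidden layer -> 2-unit bottleneck.
ENCODER_SIZES = [8, 4, 2]

rng = np.random.default_rng(0)
# Untrained, randomly initialised weights and zero biases, one pair per layer.
enc_weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
               for n_in, n_out in zip(ENCODER_SIZES[:-1], ENCODER_SIZES[1:])]
enc_biases = [np.zeros(n_out) for n_out in ENCODER_SIZES[1:]]

def encode(x):
    """Map a batch of inputs of shape (batch, 8) to latent codes of shape (batch, 2)."""
    h = x
    for W, b in zip(enc_weights, enc_biases):
        h = np.tanh(h @ W + b)  # each layer has fewer units than the previous one
    return h

x = rng.normal(size=(5, 8))  # a batch of 5 made-up input vectors
z = encode(x)
print(x.shape, "->", z.shape)  # (5, 8) -> (5, 2)
```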
2. Decoder
The decoder takes the encoded representation from the bottleneck layer and
expands it back to the dimensionality of the original input.
The hidden layers of the decoder progressively increase the dimensionality
and aim to reconstruct the original input.
The output layer produces the reconstructed output, which
ideally should be as close as possible to the input data.
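A matching decoder sketch, again in NumPy with assumed sizes (2 -> 4 -> 8) and untrained random weights, expands a batch of latent codes back to the input dimensionality. The sigmoid on the output layer is an assumption that suits inputs scaled to [0, 1].

```python
import numpy as np

# Assumed layer sizes mirroring an 8 -> 4 -> 2 encoder: 2-dim code -> 4 hidden -> 8 outputs.
LATENT_DIM, HIDDEN_DIM, OUTPUT_DIM = 2, 4, 8

rng = np.random.default_rng(1)
# Untrained, randomly initialised weights; zero biases.
W_hidden = rng.normal(0.0, 0.1, size=(LATENT_DIM, HIDDEN_DIM))
b_hidden = np.zeros(HIDDEN_DIM)
W_out = rng.normal(0.0, 0.1, size=(HIDDEN_DIM, OUTPUT_DIM))
b_out = np.zeros(OUTPUT_DIM)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def decode(z):
    """Map latent codes of shape (batch, 2) to reconstructions of shape (batch, 8)."""
    h = np.tanh(z @ W_hidden + b_hidden)  # hidden layer increases the dimensionality
    return sigmoid(h @ W_out + b_out)     # output layer: values in (0, 1)

z = rng.normal(size=(5, LATENT_DIM))  # made-up latent codes
x_hat = decode(z)
print(z.shape, "->", x_hat.shape)  # (5, 2) -> (5, 8)
```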
3. The loss function used during training is typically a reconstruction loss,
measuring the difference between the input and the reconstructed
output. Common choices include mean squared error (MSE) for
continuous data or binary cross-entropy for binary data.
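The two reconstruction losses named above can be written in a few lines of NumPy. This is a minimal sketch; the clipping constant `eps` is an assumption added to keep the logarithms finite.

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared error: average of (x - x_hat)^2 over all elements."""
    return np.mean((x - x_hat) ** 2)

def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy for targets in [0, 1] and predictions in (0, 1)."""
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

x = np.array([1.0, 0.0, 1.0, 1.0])      # made-up binary input
x_hat = np.array([0.9, 0.2, 0.8, 0.6])  # made-up reconstruction
print(mse_loss(x, x_hat))  # ≈ 0.0625
print(bce_loss(x, x_hat))  # ≈ 0.2656
```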
4. During training, the autoencoder learns to minimize the reconstruction
loss, forcing the network to capture the most important features of the
input data in the bottleneck layer.
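To make the training step concrete, the sketch below trains a purely linear autoencoder (4 -> 2 -> 4, no biases, no activation functions) with plain full-batch gradient descent on the MSE loss, using synthetic data that lies on a 2-D subspace. The data, the learning rate and the number of steps are illustrative assumptions; practical autoencoders are usually non-linear and trained with a deep-learning framework's optimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in 4-D that lie on a 2-D subspace, so a
# 2-unit bottleneck can, in principle, reconstruct them almost perfectly.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing

# Linear encoder (4 -> 2) and decoder (2 -> 4), small random initialisation.
W_enc = rng.normal(0.0, 0.1, size=(4, 2))
W_dec = rng.normal(0.0, 0.1, size=(2, 4))
learning_rate = 0.02

def reconstruction_mse():
    return np.mean((X - X @ W_enc @ W_dec) ** 2)

initial_loss = reconstruction_mse()
for step in range(2000):
    Z = X @ W_enc        # encode into the bottleneck
    X_hat = Z @ W_dec    # decode back to 4-D
    err = X_hat - X      # reconstruction error
    # Gradients of mean((X_hat - X)**2) with respect to each weight matrix.
    grad_dec = 2.0 * Z.T @ err / err.size
    grad_enc = 2.0 * X.T @ (err @ W_dec.T) / err.size
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc
final_loss = reconstruction_mse()

print(f"MSE before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Only the bottleneck width, not any explicit rule, forces the network to keep the two directions that explain the data.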
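After training, typically only the encoder part of the autoencoder is retained and used
to encode new data of the same type as the training data. Because a network with enough
capacity could simply learn to copy its input to its output, it must be constrained so
that it learns useful features instead. The different ways to constrain the network are: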
Keep Hidden Layers Small: If the size of each hidden layer is kept as
small as possible, the network is forced to pick up only the
representative features of the data, thus producing a compact encoding.
Regularization: In this method, a penalty term is added to the cost
function that encourages the network to learn something other than
copying the input.
Denoising: Another way of constraining the network is to add noise to
the input and teach the network how to remove the noise from the
data.
Tuning the Activation Functions: This method involves changing the
activation functions of various nodes so that most of the nodes
are dormant (inactive), effectively reducing the size of the hidden layers.
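The regularization approach can be sketched as a cost function with an extra term. One common choice, assumed here since the text does not name a specific penalty, is an L1 penalty on the hidden activations `h`, weighted by a hypothetical coefficient `lam`:

```python
import numpy as np

def regularized_loss(x, x_hat, h, lam=0.01):
    """Reconstruction MSE plus an L1 penalty on the hidden activations h.

    The penalty grows with the magnitude of the code, so minimising the total
    cost pushes most hidden units towards zero instead of copying x through.
    """
    reconstruction = np.mean((x - x_hat) ** 2)
    penalty = lam * np.sum(np.abs(h)) / h.shape[0]  # mean L1 norm per example
    return reconstruction + penalty

x = np.zeros((2, 3))
x_hat = np.zeros((2, 3))  # perfect reconstruction, so only the penalty remains
h_dense = np.array([[1.0, 0.5, -2.0], [0.5, 1.0, 1.0]])
h_sparse = np.array([[0.0, 0.0, -2.0], [0.0, 0.0, 1.0]])
print(regularized_loss(x, x_hat, h_dense))   # ≈ 0.03
print(regularized_loss(x, x_hat, h_sparse))  # ≈ 0.015
```

With equal reconstruction quality, the sparser code gets the lower cost.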
Types of Autoencoders
There are several types of autoencoders. The advantages and
disadvantages of the main variants are discussed below:
Denoising Autoencoder
A denoising autoencoder takes a partially corrupted input and is trained to
recover the original, undistorted input (for example, a clean image). As mentioned
above, this method is an effective way to prevent the network from simply copying
the input, so that it learns the underlying structure and important features of the data.
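A minimal NumPy sketch of the denoising objective: the model receives a corrupted copy of the input, but the loss compares its output with the clean input. The two corruption functions (additive Gaussian noise and masking noise) and their parameters are common choices assumed for illustration, and `model` stands for any function from inputs to reconstructions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(x, std=0.1):
    """Corrupt x with additive Gaussian noise."""
    return x + rng.normal(0.0, std, size=x.shape)

def add_masking_noise(x, drop_prob=0.3):
    """Corrupt x by setting each element to 0 with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def denoising_loss(model, x, corrupt):
    """The model sees the corrupted input but is scored against the clean input."""
    x_hat = model(corrupt(x))
    return np.mean((x - x_hat) ** 2)

def identity(v):
    """A 'network' that simply copies its input."""
    return v

x = rng.random((100, 16))  # made-up data in [0, 1)
print(denoising_loss(identity, x, identity))           # 0.0: copying is perfect without noise
print(denoising_loss(identity, x, add_masking_noise))  # roughly 0.1: copying no longer suffices
```

Because copying the corrupted input now has a non-zero loss, the network has to learn how clean inputs are structured in order to undo the corruption.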
Advantages
1. This type of autoencoder can extract important features and suppress
noise or irrelevant features.
2. Denoising autoencoders can be used as a form of data augmentation:
the restored images can serve as augmented data, generating
additional training samples.
Disadvantages
1. Selecting the right type and level of noise to introduce can be
challenging and may require domain knowledge.
2. The denoising process can discard some information from the original
input that is actually needed. This loss can reduce the accuracy of the
output.
Sparse Autoencoder
This type of autoencoder typically contains more hidden units than the input, but
only a few are allowed to be active at once. This property is called the sparsity of
the network. The sparsity of the network can be controlled by manually zeroing the
required hidden units, by tuning the activation functions, or by adding a
loss term to the cost function.
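One widely used form of the sparsity loss term, assumed here since the text does not fix a formula, is a KL-divergence penalty that compares each hidden unit's average activation `rho_hat` with a small target `rho`. The sketch assumes sigmoid hidden activations in (0, 1); the total cost would then be the reconstruction loss plus a hypothetical weight `beta` times this penalty.

```python
import numpy as np

def kl_sparsity_penalty(h, rho=0.05, eps=1e-8):
    """KL-divergence sparsity penalty.

    h:   hidden activations of shape (batch, units), assumed to lie in (0, 1).
    rho: target average activation per hidden unit.
    Returns the sum over units of KL(Bernoulli(rho) || Bernoulli(rho_hat)).
    """
    rho_hat = np.clip(h.mean(axis=0), eps, 1 - eps)  # observed average activation
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return kl.sum()

h_sparse = np.full((10, 4), 0.05)  # every unit active 5% of the time on average
h_dense = np.full((10, 4), 0.5)    # every unit half-active on average
print(kl_sparsity_penalty(h_sparse))  # ≈ 0 (matches the target)
print(kl_sparsity_penalty(h_dense))   # ≈ 1.98 (4 units x ≈ 0.495)
```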
Advantages
1. The sparsity constraint in sparse autoencoders helps in filtering out
noise and irrelevant features during the encoding process.
2. These autoencoders often learn important and meaningful features
due to their emphasis on sparse activations.
Disadvantages
1. The choice of hyperparameters plays a significant role in the
performance of this autoencoder. They must be chosen so that different
inputs result in the activation of different nodes of the network.
2. Applying the sparsity constraint increases computational
complexity.
Variational Autoencoder
The variational autoencoder makes strong assumptions about the distribution of
latent variables and uses the Stochastic Gradient Variational Bayes estimator in
the training process. It assumes that the data is generated by a directed
graphical model $p_\theta(x \mid z)$ and tries to learn an approximation
$q_\phi(z \mid x)$ to the conditional probability $p_\theta(z \mid x)$, where
$\phi$ and $\theta$ are the parameters of the encoder and the decoder
respectively.
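Two ingredients that make this training possible with stochastic gradients can be sketched in NumPy, under the common (assumed) choice of a Gaussian approximate posterior whose mean `mu` and log-variance `log_var` are produced by the encoder, and a standard normal prior over the latent variables: the reparameterization trick for sampling, and a loss (negative ELBO) made of a reconstruction term plus a closed-form KL term. The squared-error reconstruction term corresponds to a Gaussian likelihood only up to scaling and additive constants, which is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps z differentiable with respect to mu and
    log_var, which is what lets the encoder be trained by gradient descent.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

def vae_loss(x, x_hat, mu, log_var):
    """Negative ELBO (up to scaling/constants): squared error plus KL, averaged over the batch."""
    reconstruction = np.sum((x - x_hat) ** 2, axis=-1)
    return np.mean(reconstruction + kl_to_standard_normal(mu, log_var))

mu = np.array([[1.0, 0.0]])
log_var = np.zeros((1, 2))
print(kl_to_standard_normal(mu, log_var))  # [0.5]
```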
Advantages
1. Variational autoencoders can generate new data points that
resemble the original training data. These samples are produced by
sampling from the learned latent space and decoding the result.
2. The variational autoencoder is a probabilistic framework that learns
a compressed representation of the data capturing its underlying
structure and variations, which makes it useful for anomaly detection and
data exploration.
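Generating new data points, as described in the first advantage, amounts to drawing latent codes from the prior and passing them through the decoder. In this sketch the decoder weights are random stand-ins for a trained decoder, and the sizes (2-D latent space, 16 hidden units, 8-D data) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, HIDDEN_DIM, DATA_DIM = 2, 16, 8

# Hypothetical stand-in for a trained decoder: random weights for illustration only.
W1 = rng.normal(0.0, 0.5, size=(LATENT_DIM, HIDDEN_DIM))
W2 = rng.normal(0.0, 0.5, size=(HIDDEN_DIM, DATA_DIM))

def decode(z):
    """Map latent codes (n, 2) to data space (n, 8) with outputs in (0, 1)."""
    h = np.tanh(z @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))

def generate(n_samples):
    """Draw latent codes from the prior N(0, I) and decode them into new data points."""
    z = rng.standard_normal((n_samples, LATENT_DIM))
    return decode(z)

samples = generate(5)
print(samples.shape)  # (5, 8)
```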
Disadvantages
1. Variational autoencoders use an approximation to estimate the true
distribution of the latent variables. This approximation introduces
some error, which can affect the quality of generated samples.
2. The generated samples may only cover a limited subset of the true
data distribution. This can result in a lack of diversity in generated
samples.