7 & 9 Autoencoder and Variational Autoencoder
AUTOENCODERS
An autoencoder is a type of artificial neural network used to learn data encodings in an
unsupervised manner.
The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-
dimensional data, typically for dimensionality reduction, by training the network to capture the
most important parts of the input image.
An autoencoder consists of three components:
1. Encoder: A module that compresses the input data (from the train, validation, and test sets) into
an encoded representation that is typically several orders of magnitude smaller than the input.
2. Bottleneck: A module that contains the compressed knowledge representations and is therefore
the most important part of the network.
3. Decoder: A module that helps the network "decompress" the knowledge representation and
reconstruct the data from its encoded form. The output is then compared with the ground truth.
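A minimal sketch of this encoder-bottleneck-decoder structure, written here in PyTorch; the flattened 28x28 inputs, layer sizes, and 32-dimensional bottleneck are illustrative assumptions, not requirements:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder -> bottleneck -> decoder for flattened 28x28 images (sizes are illustrative)."""

    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoder: compresses the input into the bottleneck representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, bottleneck_dim),   # bottleneck (compressed knowledge representation)
        )
        # Decoder: "decompresses" the bottleneck back to the input space
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),                     # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)       # encoded representation
        x_hat = self.decoder(z)   # reconstruction, compared with the ground truth
        return x_hat, z
```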
Types of autoencoders
1. Undercomplete autoencoders
The way it works is very straightforward: an undercomplete autoencoder takes in an image and
tries to predict the same image as output, thus reconstructing the image from the compressed
bottleneck region.
Undercomplete autoencoders are truly unsupervised as they do not take any form of label, the
target being the same as the input.
The primary use of such autoencoders is the generation of the latent space, or bottleneck,
which forms a compressed substitute of the input data and can easily be decompressed back with
the help of the network when needed.
This form of compression can be modeled as a form of dimensionality reduction.
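A sketch of one training step, assuming the Autoencoder class from the sketch above is in scope and using a mean-squared-error reconstruction loss; note that the target is the input itself:

```python
import torch
import torch.nn as nn

model = Autoencoder()                         # class from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.rand(64, 784)                       # dummy batch of flattened images
x_hat, _ = model(x)
loss = criterion(x_hat, x)                    # the target is the input itself (unsupervised)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```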
2. Sparse autoencoders
Sparse autoencoders are similar to the undercomplete autoencoders in that they use the same image
as input and ground truth.
However, the means by which the encoding of information is regulated is significantly different.
While undercomplete autoencoders are regulated and fine-tuned by the size of the bottleneck,
sparse autoencoders are regulated by limiting the number of hidden nodes that can be active at the
same time.
Since it is not possible to design a neural network with a flexible number of nodes at its hidden
layers, sparse autoencoders work by penalizing the activation of some neurons in the hidden layers.
In other words, the loss function has a term that measures the number of neurons that have been
activated and applies a penalty directly proportional to that number.
This penalty, called the sparsity penalty, discourages the network from activating too many
neurons and serves as a regularizer.
While typical regularizers work by penalizing the size of the weights at the nodes, the
sparsity regularizer works by penalizing the number of nodes that are activated.
There are two primary ways in which the sparsity regularizer term can be incorporated into the loss
function.
L1 loss: here, we add the magnitude of the sparsity regularizer (the L1 norm of the hidden
activations) to the loss, just as we do for general L1 regularizers:
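One common way to write this (the exact formulation varies; here $a_i^{(h)}$ is the activation of hidden unit $i$ and $\lambda$ weights the penalty against the reconstruction loss):

\[
\mathcal{L} \;=\; \mathcal{L}_{\text{reconstruction}} \;+\; \lambda \sum_{i} \bigl| a_i^{(h)} \bigr|
\]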
KL-Divergence:
In this case, we consider the activations over a collection of samples at once rather than
summing them as in the L1 loss method. We constrain the average activation of each
neuron over this collection.
Considering the ideal distribution as a Bernoulli distribution, we include KL divergence
within the loss to reduce the difference between the current distribution of the activations
and the ideal (Bernoulli) distribution:
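With target activation level $\rho$ (the Bernoulli parameter) and $\hat{\rho}_j$ the average activation of hidden unit $j$ over a batch of $m$ samples, the penalized loss is commonly written as:

\[
\mathcal{L} \;=\; \mathcal{L}_{\text{reconstruction}} \;+\; \sum_{j} \Bigl[\rho \log \tfrac{\rho}{\hat{\rho}_j} + (1-\rho)\log \tfrac{1-\rho}{1-\hat{\rho}_j}\Bigr],
\qquad
\hat{\rho}_j \;=\; \frac{1}{m}\sum_{k=1}^{m} a_j^{(h)}\bigl(x^{(k)}\bigr)
\]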
3. Contractive autoencoders
Contractive autoencoders perform the task of learning a representation of the image while passing it
through a bottleneck and reconstructing it in the decoder.
The contractive autoencoder also has a regularization term to prevent the network from learning
the identity function and simply mapping the input to the output.
Contractive autoencoders work on the basis that similar inputs should have similar encodings and
similar latent space representations. This means that the latent space should not vary by a large
amount for minor variations in the input.
To train a model that satisfies this constraint, we have to ensure that the derivatives of the
hidden layer activations are small with respect to the input data.
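This is typically enforced by adding the squared Frobenius norm of the Jacobian of the hidden activations $h(x)$ with respect to the input $x$ as a penalty, weighted by a coefficient $\lambda$:

\[
\mathcal{L} \;=\; \mathcal{L}_{\text{reconstruction}} \;+\; \lambda \,\Bigl\lVert \frac{\partial h(x)}{\partial x} \Bigr\rVert_F^2
\;=\; \mathcal{L}_{\text{reconstruction}} \;+\; \lambda \sum_{i,j} \Bigl(\frac{\partial h_j(x)}{\partial x_i}\Bigr)^{2}
\]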
4. Denoising autoencoders
Denoising autoencoders, as the name suggests, are autoencoders that remove noise from an
image.
As opposed to the autoencoders we have covered so far, this is the first type that does not
use the input image itself as the ground truth.
In denoising autoencoders, we feed a noisy version of the image, where noise has been added
via digital alterations.
The noisy image is fed to the encoder-decoder architecture, and the output is compared with
the ground truth image.
The denoising autoencoder gets rid of noise by learning a representation of the input where the
noise can be filtered out easily.
While removing noise directly from the image seems difficult, the autoencoder performs this
by mapping the input data into a lower-dimensional manifold (like in undercomplete
autoencoders), where filtering of noise becomes much easier.
Essentially, denoising autoencoders work with the help of non-linear dimensionality reduction.
The loss function generally used in these types of networks is L2 or L1 loss.
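A sketch of the denoising setup, assuming the Autoencoder class from the first sketch is in scope and using additive Gaussian noise with an illustrative scale of 0.2:

```python
import torch
import torch.nn as nn

model = Autoencoder()                         # encoder-decoder from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x_clean = torch.rand(64, 784)                 # dummy batch of clean images
noise = 0.2 * torch.randn_like(x_clean)       # illustrative Gaussian "digital alteration"
x_noisy = (x_clean + noise).clamp(0.0, 1.0)

x_hat, _ = model(x_noisy)                     # the noisy image is fed to the network ...
loss = criterion(x_hat, x_clean)              # ... but the output is compared with the clean image
optimizer.zero_grad()
loss.backward()
optimizer.step()
```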
5. Variational Autoencoders
To generate a sample from the model, the VAE first draws a sample z from the
code distribution p_model(z).
The sample is then run through a differentiable generator network g(z). Finally,
x is sampled from a distribution p_model(x; g(z)) = p_model(x | z).
However, during training, the approximate inference network (or encoder) q(z | x)
is used to obtain z, and p_model(x | z) is then viewed as a decoder network.
The key insight behind variational autoencoders is that they may be trained
by maximizing the variational lower bound L(q) associated with data point x:
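Written out in its standard form (with $\mathcal{H}$ the entropy and $D_{\mathrm{KL}}$ the KL divergence), the bound is:

\[
\mathcal{L}(q) \;=\; \mathbb{E}_{z \sim q(z \mid x)} \log p_{\text{model}}(z, x) \;+\; \mathcal{H}\bigl(q(z \mid x)\bigr)
\;=\; \mathbb{E}_{z \sim q(z \mid x)} \log p_{\text{model}}(x \mid z) \;-\; D_{\mathrm{KL}}\bigl(q(z \mid x)\,\|\,p_{\text{model}}(z)\bigr)
\;\le\; \log p_{\text{model}}(x)
\]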
In the first form of the bound, the first term is the joint log-likelihood of the visible and hidden
variables under the approximate posterior, and the second term is the entropy of the approximate
posterior. When q is chosen to be a Gaussian distribution, with noise added to a predicted mean
value, maximizing this entropy term encourages increasing the standard deviation of that noise.
In the second form, the first term is the reconstruction log-likelihood found in other autoencoders,
while the second term tries to make the approximate posterior distribution q(z | x) and the model
prior p_model(z) approach each other.
Traditional variational inference and learning schemes, which optimize q separately for each data
point, are slow and often require the ability to compute E_{z∼q} log p_model(z, x) in closed form;
the VAE instead trains the inference network q(z | x) jointly with the decoder, avoiding both problems.
The VAE is simple to implement, obtains excellent results, and is among the state-of-the-art
approaches to generative modeling.
Its main drawback is that samples from VAEs trained on images tend to be somewhat blurry.
The causes of this phenomenon are not yet fully understood; one possibility is that the blurriness
is an intrinsic effect of maximum likelihood, which minimizes D_KL(p_data || p_model).
VAE Framework
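A minimal sketch of this framework in PyTorch, under the usual assumptions: the encoder predicts the mean and log-variance of a diagonal-Gaussian q(z | x), z is drawn with the reparameterization trick, the decoder parameterizes p_model(x | z) with Bernoulli pixels, and the prior is N(0, I). Layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(input_dim, hidden_dim)      # encoder body
        self.mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z | x)
        self.logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z | x)
        self.dec = nn.Sequential(                        # generator / decoder g(z)
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)             # reparameterization trick
        return self.dec(z), mu, logvar                   # logits of p_model(x | z)

def vae_loss(x, x_logits, mu, logvar):
    # Reconstruction term: -E_q[log p_model(x | z)] with Bernoulli pixels
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # D_KL(q(z | x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # negative of the variational lower bound L(q)
```

Generating a new sample then amounts to drawing z from N(0, I) and passing it through the decoder, exactly as described at the start of this section.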
Applications of autoencoders
1. Dimensionality reduction
o Undercomplete autoencoders are the variant most commonly used for dimensionality reduction.
o They can be used as a pre-processing step, since they perform fast and reasonably accurate
dimensionality reduction without losing much information.
2. Image denoising
o Autoencoders like the denoising autoencoder can be used for performing efficient and
highly accurate image denoising.
o Unlike traditional methods of denoising, autoencoders do not search for noise; they extract
the image from the noisy data fed to them by learning a representation of it.
The representation is then decompressed to form a noise-free image.
o Denoising autoencoders thus can denoise complex images that cannot be denoised via
traditional methods.
3. Generation of image and time series data
o Variational Autoencoders can be used to generate both image and time series data.
o VAEs can also be used to model time series data like music.
4. Anomaly detection
o For example, consider an autoencoder that has been trained on a specific dataset P. For
any image sampled from the training distribution, the autoencoder is bound to give a low
reconstruction loss and is expected to reconstruct the image as is. Conversely, an image that does
not come from this distribution is reconstructed poorly, so a high reconstruction loss can be used
to flag it as a possible anomaly (see the sketch below).
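A sketch of this use, assuming a trained autoencoder that returns the reconstruction (as in the earlier sketch) and a threshold picked from reconstruction errors on held-out normal data, for example a high percentile of those errors:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def reconstruction_error(model, x):
    # Per-example mean squared reconstruction error
    x_hat, _ = model(x)                       # assumes the model returns (reconstruction, code)
    return F.mse_loss(x_hat, x, reduction="none").mean(dim=1)

def is_anomaly(model, x, threshold):
    # threshold would be chosen from errors on normal (in-distribution) validation data,
    # e.g. a high percentile of those errors
    return reconstruction_error(model, x) > threshold
```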