
9. AUTOENCODERS
 An autoencoder is a type of artificial neural network used to learn data encodings in an
unsupervised manner.
 The aim of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.

Architecture of Autoencoders

Autoencoders consist of three parts:

1. Encoder: A module that compresses the input data (train, validation, and test sets) into an encoded
representation that is typically several orders of magnitude smaller than the input data.
2. Bottleneck: A module that contains the compressed knowledge representations and is therefore
the most important part of the network.
3. Decoder: A module that helps the network “decompress” the knowledge representation and
reconstruct the data from its encoded form. The output is then compared with the ground truth.

Types of autoencoders

Five popular autoencoders that we will discuss:


1. Undercomplete autoencoders
2. Sparse autoencoders
3. Contractive autoencoders
4. Denoising autoencoders
5. Variational Autoencoders (for generative modelling)

1. Undercomplete autoencoders

 An undercomplete autoencoder is one of the simplest types of autoencoders.

 The way it works is very straightforward: an undercomplete autoencoder takes in an image and
tries to predict the same image as its output, thus reconstructing the image from the compressed
bottleneck region.

 Undercomplete autoencoders are truly unsupervised as they do not take any form of label, the
target being the same as the input.

 The primary use of such autoencoders is the generation of the latent space, or bottleneck,
which forms a compressed substitute of the input data and can easily be decompressed back with
the help of the network when needed.

 This form of compression of the data can be modeled as a form of dimensionality reduction; a minimal training sketch follows.
2. Sparse autoencoders
 Sparse autoencoders are similar to the undercomplete autoencoders in that they use the same image
as input and ground truth.
 However, the means by which the encoding of information is regulated is significantly different.

 While undercomplete autoencoders are regulated and fine-tuned by restricting the size of the
bottleneck, the sparse autoencoder is regulated by limiting the number of nodes that can be active at each hidden
layer.

 Since it is not possible to design a neural network that has a flexible number of nodes at its hidden
layers, sparse autoencoders work by penalizing the activation of some neurons in hidden layers.

 In other words, the loss function has a term that calculates the number of neurons that have been
activated and provides a penalty that is directly proportional to that.
 This penalty, called the sparsity function, prevents the neural network from activating more
neurons and serves as a regularizer.

 While typical regularizers work by creating a penalty on the size of the weights at the nodes, the
sparsity regularizer works by creating a penalty on the number of nodes activated.

 There are two primary ways in which the sparsity regularizer term can be incorporated into the loss
function.

 L1 Loss: Here, we add the magnitude of the sparsity regularizer to the loss, as we do for general
regularizers:

L = L(x, x̂) + λ Σ_i |a_i|

where a_i are the activations of the hidden layer and λ controls the strength of the penalty (a sketch of both penalties follows this list).

 KL-Divergence:

 In this case, we consider the activations over a collection of samples at once rather than
summing them as in the L1 loss method. We constrain the average activation of each
neuron over this collection.
 Considering the ideal distribution to be a Bernoulli distribution, we include the KL divergence
within the loss to reduce the difference between the current distribution of the activations
and the ideal (Bernoulli) distribution:

Σ_j KL(ρ ‖ ρ̂_j) = Σ_j [ ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j)) ]

where ρ is the desired average activation and ρ̂_j is the average activation of hidden neuron j over the collection of samples.
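
A minimal sketch of both sparsity penalties is given below (assuming PyTorch and bottleneck activations lying in (0, 1), e.g. after a sigmoid; the names lam, beta, and the target rate rho are illustrative):

import torch

def l1_sparsity(activations, lam=1e-3):
    # Penalty proportional to the magnitude of the hidden activations
    return lam * activations.abs().sum()

def kl_sparsity(activations, rho=0.05, beta=1.0, eps=1e-8):
    # Average activation of each neuron over the batch (collection of samples)
    rho_hat = activations.mean(dim=0)
    # KL divergence between the target Bernoulli rate rho and the observed rate rho_hat
    kl = (rho * torch.log(rho / (rho_hat + eps))
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat + eps)))
    return beta * kl.sum()

# Either penalty is added to the usual reconstruction loss, e.g.:
# loss = mse_loss(x_hat, x) + l1_sparsity(z)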

3. Contractive autoencoders

 Contractive autoencoders perform the task of learning a representation of the image while passing it
through a bottleneck and reconstructing it in the decoder.

 The contractive autoencoder also has a regularization term to prevent the network from learning
the identity function and simply mapping the input to the output.

 Contractive autoencoders work on the basis that similar inputs should have similar encodings and a
similar latent space representation. It means that the latent space should not vary by a huge amount
for minor variations in the input.

 To train a model that works along with this constraint, we have to ensure that the derivatives of the
hidden layer activations are small with respect to the input data, i.e. the regularization term is the squared Frobenius norm of the Jacobian ‖∂h/∂x‖²_F,

where h represents the hidden layer and x represents the input.

The total loss function can be mathematically expressed as:

L = ‖x − x̂‖² + λ ‖∂h(x)/∂x‖²_F

The gradient is summed over all training samples, and a Frobenius norm of the same is taken.
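
For a one-layer sigmoid encoder h = sigmoid(Wx + b), this Jacobian penalty has a simple closed form, sketched below (W is the encoder weight matrix of shape (hidden, input); lam is an illustrative coefficient):

import torch

def contractive_loss(x, x_hat, h, W, lam=1e-4):
    recon = torch.nn.functional.mse_loss(x_hat, x)
    dh = h * (1 - h)                                 # derivative of the sigmoid activations
    # Squared Frobenius norm of dh/dx for each sample, summed over the batch
    frob = torch.sum(dh.pow(2) @ W.pow(2).sum(dim=1))
    return recon + lam * frob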

4. Denoising autoencoders

 Denoising autoencoders, as the name suggests, are autoencoders that remove noise from an
image.

 As opposed to the autoencoders we have already covered, this is the first of its kind that does not
have the input image as its ground truth.

 In denoising autoencoders, we feed a noisy version of the image, where noise has been added
via digital alterations.

 The noisy image is fed to the encoder-decoder architecture, and the output is compared with
the ground truth image.

 The denoising autoencoder gets rid of noise by learning a representation of the input where the
noise can be filtered out easily.

 While removing noise directly from the image seems difficult, the autoencoder performs this
by mapping the input data into a lower-dimensional manifold (like in undercomplete
autoencoders), where filtering of noise becomes much easier.

 Essentially, denoising autoencoders work with the help of non-linear dimensionality reduction.
The loss function generally used in these types of networks is L2 or L1 loss.
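
A hypothetical training step for a denoising autoencoder is sketched below (reusing the model, optimizer, and loader names from the earlier sketches; the Gaussian noise level is an arbitrary assumption). Noise is added to the input, but the loss is computed against the clean image:

import torch

noise_std = 0.2
for x in loader:                                     # x: clean images
    x_noisy = x + noise_std * torch.randn_like(x)    # digitally corrupted version
    x_hat = model(x_noisy)                           # reconstruct from the noisy input
    loss = torch.nn.functional.mse_loss(x_hat, x)    # compare with the clean ground truth
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()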

5. Variational Autoencoders

 The variational autoencoder or VAE is a directed model that uses learned
approximate inference and can be trained purely with gradient-based methods.

 To generate a sample from the model, the VAE first draws a sample z from the
code distribution p_model(z).

 The sample is then run through a differentiable generator network g(z). Finally,
x is sampled from a distribution p_model(x; g(z)) = p_model(x | z).

 However, during training, the approximate inference network (or encoder) q(z | x)
is used to obtain z, and p_model(x | z) is then viewed as a decoder network.

 The key insight behind variational autoencoders is that they may be trained
by maximizing the variational lower bound L(q) associated with data point x:

L(q) = E_{z∼q(z|x)} [log p_model(z, x)] + H(q(z | x))                      (1)

     = E_{z∼q(z|x)} [log p_model(x | z)] − D_KL(q(z | x) ‖ p_model(z))     (2)

     ≤ log p_model(x).                                                     (3)

 When q is chosen to be a Gaussian distribution, with noise added to a
predicted mean value, maximizing this entropy term encourages increasing
the standard deviation of this noise.

 More generally, this entropy term encourages the variational posterior to
place high probability mass on many z values that could have generated x.

 The second term tries to make the approximate posterior distribution q(z | x)
and the model prior p_model(z) approach each other.

 Traditional approaches to variational inference and learning infer q via an
optimization algorithm.

 These approaches are slow and often require the ability to compute
E_{z∼q} log p_model(z, x) in closed form.

 The main idea behind the variational autoencoder is to train a parametric
encoder (also sometimes called an inference network or recognition model)
that produces the parameters of q.

 So long as z is a continuous variable, we can then back-propagate through
samples of z drawn from q(z | x) = q(z; f(x; θ)) in order to obtain a
gradient with respect to θ.

 Learning then consists solely of maximizing L with respect to the
parameters of the encoder and decoder.
 All of the expectations in L may be approximated by Monte Carlo
sampling.
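
Putting the pieces together, a minimal VAE sketch is shown below (PyTorch; the layer sizes and names are illustrative assumptions). The encoder produces the parameters of q(z | x), one reparameterized sample of z gives a single-sample Monte Carlo estimate of the expectations in L, and the loss is the negative of the bound: a reconstruction term plus the closed-form Gaussian KL term.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(input_dim, 400)
        self.mu = nn.Linear(400, latent_dim)          # mean of q(z | x)
        self.logvar = nn.Linear(400, latent_dim)      # log-variance of q(z | x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 400), nn.ReLU(),
            nn.Linear(400, input_dim), nn.Sigmoid(),  # parameters of p_model(x | z)
        )

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterized sample
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # -E_q[log p_model(x | z)], single-sample estimate with a Bernoulli decoder
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    # D_KL(q(z | x) || N(0, I)), available in closed form for a Gaussian q
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl             # negative of the variational lower bound L(q)

Minimizing this loss with gradient descent is then equivalent to maximizing L with respect to the encoder and decoder parameters.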
 The variational autoencoder approach is elegant, theoretically
pleasing, and simple to implement.

 It also obtains excellent results and is among the state-of-the-art
approaches to generative modeling.

 Its main drawback is that samples from variational autoencoders trained
on images tend to be somewhat blurry.

 The causes of this phenomenon are not yet known. One possibility is
that the blurriness is an intrinsic effect of maximum likelihood, which
minimizes D_KL(p_data ‖ p_model).

VAE Framework

 The VAE framework is very straightforward to extend to a wide range of
model architectures.

 This is a key advantage over Boltzmann machines, which require
extremely careful model design to maintain tractability.

 VAEs work very well with a diverse family of differentiable operators.
One particularly sophisticated VAE is the deep recurrent attention writer,
or DRAW, model.

 DRAW uses a recurrent encoder and recurrent decoder combined with an
attention mechanism.

 The generation process for the DRAW model consists of sequentially
visiting different small image patches and drawing the values of the pixels
at those points.
 VAEs can also be extended to generate sequences by defining variational
RNNs, which use a recurrent encoder and decoder within the VAE
framework.

 Generating a sample from a traditional RNN involves only
non-deterministic operations at the output space.

 Variational RNNs also have random variability at the potentially more
abstract level captured by the VAE latent variables.

Applications of autoencoders

1. Dimensionality reduction
o Undercomplete autoencoders are the variant typically used for dimensionality reduction.

o These can be used as a pre-processing step, as they perform fast and accurate
dimensionality reduction without losing much information; a short sketch follows.
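
As a small illustration (assuming the trained Autoencoder sketch from earlier in these notes), the encoder alone serves as the dimensionality-reduction step:

import torch

x = torch.rand(64, 784)               # a batch of flattened images (placeholder data)
with torch.no_grad():                 # no gradients needed when extracting features
    features = model.encoder(x)       # (64, 784) -> (64, 32) compressed representation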

2. Image denoising

o Autoencoders like the denoising autoencoder can be used for performing efficient and
highly accurate image denoising.

o Unlike traditional methods of denoising, autoencoders do not search for noise; they extract
the image from the noisy data fed to them by learning a representation of it.
The representation is then decompressed to form a noise-free image.

o Denoising autoencoders thus can denoise complex images that cannot be denoised via
traditional methods.

3. Generation of image and time series data

o Variational Autoencoders can be used to generate both image and time series data.

o The parameterized distribution at the bottleneck of the autoencoder can be randomly
sampled to generate values for the latent attributes, which can then be forwarded to the
decoder, leading to the generation of image data; a short sampling sketch follows below.

o VAEs can also be used to model time series data like music.
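
A sketch of this sampling procedure, assuming a trained instance vae of the VAE sketch given earlier (latent dimension 20), is:

import torch

with torch.no_grad():
    z = torch.randn(16, 20)           # 16 samples drawn from the prior p_model(z) = N(0, I)
    generated = vae.dec(z)            # decoded into 16 synthetic images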

4. Anomaly detection

o Undercomplete autoencoders can also be used for anomaly detection.

o For example, consider an autoencoder that has been trained on a specific dataset P. For
any image sampled from the training dataset, the autoencoder is bound to give a low
reconstruction loss and is expected to reconstruct the image as is.

o Images that differ significantly from P, however, are reconstructed poorly, so their high
reconstruction loss flags them as anomalies.
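
A minimal sketch of this idea, assuming a trained autoencoder model as above (the threshold value is an illustrative assumption that would in practice be chosen from reconstruction errors on normal data):

import torch

def is_anomaly(model, x, threshold=0.05):
    with torch.no_grad():
        x_hat = model(x)
        # mean squared reconstruction error for each sample in the batch
        err = ((x_hat - x) ** 2).mean(dim=1)
    return err > threshold            # True where the reconstruction is unusually poor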

