
Lecture 8: Autoencoder

Topics of this lecture


• What is an autoencoder?
• Training of autoencoder
• Autoencoder for “internal representation”
• Training with l2-norm regularization
• Training with sparsity regularization
• Training of deep network

Machine Learning: Produced by Qiangfu Zhao (Since 2018), All rights reserved (C)
What is an autoencoder? (1/2)
• An autoencoder is a special MLP.
• There is one input layer, one output layer, and one or more hidden layers.
• The teacher signal equals the input.

[Figure: an MLP with input x at the bottom, hidden layer(s) in the middle, and output x̂ at the top.]

The main purpose of using an autoencoder is to find a new (internal or latent) representation for the given feature space, with the hope of obtaining the true factors that control the distribution of the input data.
What is an autoencoder? (2/2)
• Using principal component analysis (PCA), we can obtain a linear autoencoder.
• Each hidden unit corresponds to an eigenvector of the covariance matrix, and each datum is a linear combination of these eigenvectors.

[Figure: a network with an input layer, hidden layer(s), and an output layer.]

Using a non-linear activation function in the hidden layer, we can obtain a non-linear autoencoder. That is, each datum can be represented using a set of non-linear basis functions.
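
To make the "linear autoencoder" view concrete, here is a minimal Matlab sketch (not from the original slides) that reconstructs the iris data from its leading principal components; the choice Nh = 2 and the use of the Statistics Toolbox function pca are assumptions for illustration only.

X = iris_dataset;                         % 4x150 matrix, one column per sample
Nh = 2;                                   % assumed number of "hidden units" (principal components)
[coeff, score, ~, ~, ~, mu] = pca(X');    % pca expects rows = observations
% Each datum is a linear combination of the leading Nh eigenvectors (columns of coeff).
Xrec = (score(:, 1:Nh) * coeff(:, 1:Nh)' + mu)';   % linear "decoding" back to the input space
err  = mean(sum((X - Xrec).^2, 1))                 % mean squared reconstruction error per sample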

Training of autoencoder (1/4)
• Implementing an autoencoder using an MLP, we can find a more compact representation for the given problem space.
• Compared with a classification MLP, an autoencoder can be trained with unlabelled data.
• However, training an autoencoder is still supervised learning, because the input itself is the teacher signal.
• The BP algorithm can also be used for training.
Training of autoencoder (2/4)
• The n-th input is denoted by x_n, and the corresponding output is denoted by x̂_n.
• Normally, the objective (loss) function used for training an autoencoder is defined as follows:

E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2                                  (1)

• where w is a vector containing all weights and biases of the network.
• If the input data are binary, we can also define the objective function as the cross-entropy (an example form is given after this list).
• For a single-hidden-layer MLP, the hidden layer is called the encoder, and the output layer is called the decoder.
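
For completeness, one common form of the cross-entropy objective for binary inputs is shown below; this expression is not on the original slide and is given only as a sketch, with x_{nd} denoting the d-th component of x_n and x̂_{nd} the corresponding output component:

E(w) = -\sum_{n=1}^{N} \sum_{d=1}^{D} \left[ x_{nd} \log \hat{x}_{nd} + (1 - x_{nd}) \log (1 - \hat{x}_{nd}) \right]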

Training of autoencoder (3/4)
• Train an autoencoder for the well-known IRIS database using Matlab.
• The hidden layer size is 5.
• We can also specify other parameters. For details, visit the web page given below.

[Matlab manual] https://www.mathworks.com/help/nnet/ref/trainautoencoder.html

Training of autoencoder (4/4)
X = iris_dataset;               % 4x150 matrix: 4 features, 150 samples
Nh = 5;                         % hidden layer size
enc = trainAutoencoder(X, Nh);  % train the autoencoder
view(enc)                       % show the network structure
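
As a small follow-up (not on the original slide), the internal representation found by this autoencoder can be inspected with encode and predict from the same toolbox; the variable names are illustrative:

feat = encode(enc, X);    % 5x150 matrix: the 5-dimensional internal representation of the data
Xrec = predict(enc, X);   % 4x150 matrix: reconstruction of the input from that representation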

Internal representation of data (1/2)
X = digitTrainCellArrayData;    % cell array of 5,000 handwritten digit images
Nh = 36;                        % hidden layer size ('MaxEpochs' can be used to set the number of training iterations)
enc = trainAutoencoder(X, Nh);
plotWeights(enc);               % show the hidden-layer weights as images
• An autoencoder is trained for the dataset containing 5,000 handwritten digits.
• We used 500 iterations for training and a hidden layer size of 36.
• The right figure shows the weights of the hidden layer.
• These are similar to the Eigenfaces.

The “weight image” for each hidden neuron serves as a basis image. Each datum is mapped to these basis images, and the results are then combined to reconstruct the original image.
Internal representation of data (2/2)
XTrain=digitTrainCellArrayData;
Nh=36; % We can get better results using a larger Nh
enc=trainAutoencoder(XTrain,Nh);
XTest=digitTestCellArrayData;
xReconstructed=predict(enc,XTest);
\| X - \hat{X} \| = 0.0184
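
The reconstruction error reported above could be computed, for example, as the mean squared error over all pixels; this is only a sketch of one plausible computation, not the slide's own code:

err = 0; numPixels = 0;
for i = 1:numel(XTest)
    d = XTest{i}(:) - xReconstructed{i}(:);   % residual of the i-th test image
    err = err + sum(d.^2);
    numPixels = numPixels + numel(d);
end
err = err / numPixels                         % mean squared reconstruction error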

Training with l2-norm regularization (1/2)
• The main purpose of an autoencoder is to reconstruct the input space using a smaller number of basis functions.
• If we use the objective function given by (1), the results may not generalize well for test data.
• To improve the generalization ability, a common practice is to introduce a penalty into the objective function as follows:

E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2 + \lambda \| w \|^2                                  (2)

\| w \|^2 = \sum_{k=1}^{L} \sum_{j=1}^{N_k} \sum_{i} ( w_{ji}^{(k)} )^2                                  (3)

• where the sums in (3) run over the L layers, the N_k neurons of layer k, and the inputs i of each neuron.
• The effect of introducing this l2-norm is to make the solution more “smooth”. (A Matlab sketch is given below.)
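
In Matlab, the penalty weight λ of (2) corresponds to the 'L2WeightRegularization' option of trainAutoencoder; the following is a minimal sketch, where the value 0.004 is just an illustrative choice:

XTrain = digitTrainCellArrayData;
Nh = 36;
enc = trainAutoencoder(XTrain, Nh, ...
      'L2WeightRegularization', 0.004);       % lambda: weight of the l2-norm penalty in (2)
xReconstructed = predict(enc, digitTestCellArrayData);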

Training with l2-norm regularization (2/2)

\| X - \hat{X} \| = 0.0261

For this example, we cannot see the positive effect clearly. Generally speaking, however, if the inputs are noisy, regularization can yield better results.

Training with sparsity regularization (1/6)
• In nearest-neighbor-based approximation, each datum is approximated by one of the already observed data (i.e., the nearest one).
• In PCA, each datum is approximated by a point in a linear space spanned by the basis vectors (eigenvectors).
• Using an autoencoder, each datum is approximated by a linear combination of the hidden neuron outputs.
• Usually, the basis functions are global in the sense that ANY given datum can be approximated well by using the same set of basis functions.
• Usually, the number N_b of basis functions equals the rank r of the linear space. For PCA, N_b ≪ r because we use only the “principal” basis functions.
[Lv and ZHAO, 2007] https://www.emeraldinsight.com/doi/abs/10.1108/17427370710847327
Training with sparsity regularization (2/6)
• Using an autoencoder, however, we can make the number N_h of hidden neurons larger than the rank r.
• Instead, we can approximate each datum using a much smaller number of hidden neurons.
• This way, we can encode each data point using a small number of parameters as follows (see the sketch after this list):

x = \sum_{n=1}^{m} w_{k_n} y_{k_n}                                  (4)

• where m is the number of hidden neurons used for approximating x, y_{k_n} is the output of the k_n-th hidden neuron, and k_1 < k_2 < ... < k_m ∈ [1, N_h].
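
As an illustration (not from the slides), the indices k_1, ..., k_m of the hidden neurons that are actually active for a given datum could be read off from the encoder outputs; the activation threshold 0.1 is an arbitrary assumption:

z = encode(enc, XTest(1));     % hidden-neuron outputs for one test image (Nh x 1 vector)
activeIdx = find(z > 0.1);     % k_1 < k_2 < ... < k_m: the "active" hidden neurons
m = numel(activeIdx)           % how many hidden neurons this datum really uses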

Training with sparsity regularization (3/6)
• For sparse representation, we introduce another penalty in the objective function as follows:

E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2 + \lambda \| w \|^2 + \beta \cdot F_{sparsity}                                  (5)

• To define the sparsity term F_{sparsity}, we need the average output activation value of a neuron, given by

\hat{\rho}_j = \frac{1}{N} \sum_{n=1}^{N} g( u_j^{(1)}(x_n) )                                  (6)

• where N is the number of training data, and u_j^{(1)}(x_n) is the effective input of the j-th hidden neuron for the n-th training datum.
• A neuron is very “active” if \hat{\rho}_j is high. To obtain a sparse neural network, it is necessary to make the neurons less active. This way, we can reconstruct any given datum using a smaller number of hidden neurons (basis functions). (A sketch of how \hat{\rho} can be computed is given below.)
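
A minimal sketch of how the average activations in (6) could be computed with the toolbox function encode, assuming the trained enc and the training set XTrain from the earlier slides:

Z = encode(enc, XTrain);   % N_h x N matrix of hidden activations g(u_j^(1)(x_n))
rhoHat = mean(Z, 2);       % average output activation of each hidden neuron, as in (6)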

Training with sparsity regularization (4/6)
• Based on the average output activation value, we can define the sparsity term using the Kullback–Leibler divergence as follows (a sketch of the computation is given below):

F_{sparsity} = \sum_{i=1}^{N_h} KL(\rho \| \hat{\rho}_i)
             = \sum_{i=1}^{N_h} [ \rho \log(\rho / \hat{\rho}_i) + (1 - \rho) \log((1 - \rho) / (1 - \hat{\rho}_i)) ]                                  (7)

• where \rho is a sparsity parameter to be specified by the user. The smaller \rho is, the more sparse the representation.
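
Continuing the sketch above, the sparsity term (7) could be evaluated as follows; rho = 0.10 mirrors the SparsityProportion used on the next slide, and rhoHat is the vector computed from (6):

rho = 0.10;   % target sparsity proportion specified by the user
Fsparsity = sum(rho .* log(rho ./ rhoHat) + ...
                (1 - rho) .* log((1 - rho) ./ (1 - rhoHat)))   % KL-divergence sparsity term (7)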

Training with sparsity regularization (5/6)

enc = trainAutoencoder(XTrain, Nh, ...
      'L2WeightRegularization', 0.004, ...   % lambda in (5')
      'SparsityRegularization', 4, ...       % beta in (5')
      'SparsityProportion', 0.10);           % rho in (7)

E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2 + \lambda \| w \|^2 + \beta \cdot F_{sparsity}                                  (5')

• By specifying a small sparsity proportion \rho, we can get an autoencoder with fewer active hidden neurons.
• If we use a proper norm of w, we can also reduce the number of non-zero weights of the hidden neurons and make the network more sparse.
Training with sparsity regularization (6/6)

\| X - \hat{X} \| = 0.0268

In this example, the reconstructed images are not better. However, with fewer active hidden neurons, it is possible to “interpret” the neural network more conveniently (see the reference below).
[Furukawa and ZHAO, 2017] https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8328367
Training of deep network (1/5)
• Using autoencoders, we can extract useful features for representing the input data without using “labels”.
• The extracted features can be used by another MLP for making the final decision.
• Shown on the next page is a Matlab example. In this example,
– The first two hidden layers are found by autoencoder training;
– The last layer is a soft-max layer;
– The three layers are “stacked” to form a deep network, which can then be re-trained on the given data with the BP algorithm.

Training of deep network (2/5)

% Greedily train the first autoencoder on the raw inputs X.
autoenc1 = trainAutoencoder(X, Nh1, 'DecoderTransferFunction', 'purelin');
features1 = encode(autoenc1, X);
% Train the second autoencoder on the features of the first one.
autoenc2 = trainAutoencoder(features1, Nh2, 'DecoderTransferFunction', ...
    'purelin', 'ScaleData', false);
features2 = encode(autoenc2, features1);
% Train a soft-max output layer on the deepest features, with labels T.
softnet = trainSoftmaxLayer(features2, T, 'LossFunction', 'crossentropy');
% Stack the layers into one deep network and fine-tune it with BP.
deepnet = stack(autoenc1, autoenc2, softnet);
deepnet = train(deepnet, X, T);

Training of deep network (3/5)
• Shown here is an example using the wine dataset.
• The right figure is the confusion matrix of the deep network, and
• the figure at the bottom is the structure of the deep network.

Training of deep network (4/5)
• To summarize, we can design a deep neural network (with K layers, not including the input layer) as follows (a Matlab-style sketch is given after this list):
– Step 1: i=1; X(1)=X(0); % X(0) is the given data
– Step 2: Train an autoencoder A(i) based on X(i);
– Step 3: X(i+1)=encoder(X(i));
– Step 4: If i<K-1, set i=i+1 and return to Step 2;
– Step 5: Train a regression layer R using BP
• Training data: X(K)
• Teacher signal: Provided in the training set
– Step 6: Stack [A(1), A(2),…,A(K-1),R] to form a deep MLP.
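
A minimal Matlab-style sketch of these steps for K = 3 (two autoencoder layers plus a soft-max output layer); the wine data, the hidden sizes Nh, and the option values are illustrative assumptions:

[X0, T] = wine_dataset;                  % 13x178 features, 3x178 one-hot labels
K  = 3;
Nh = [10, 5];                            % assumed hidden layer sizes
A  = cell(1, K-1);
Xi = X0;                                 % X(1) = X(0)
for i = 1:K-1
    if i == 1
        A{i} = trainAutoencoder(Xi, Nh(i), 'DecoderTransferFunction', 'purelin');
    else
        A{i} = trainAutoencoder(Xi, Nh(i), 'DecoderTransferFunction', 'purelin', ...
                                'ScaleData', false);
    end
    Xi = encode(A{i}, Xi);               % X(i+1) = encoder output of A(i)
end
R = trainSoftmaxLayer(Xi, T, 'LossFunction', 'crossentropy');   % Step 5: output layer
deepnet = stack(A{1}, A{2}, R);          % Step 6: stack [A(1), A(2), R]
deepnet = train(deepnet, X0, T);         % fine-tune the whole network with BP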

Training of deep network (5/5)
• We can also train a deep autoencoder by modifying the algorithm slightly, as follows:
– Step 1: i=1; X(1)=X(0); % X(0) is the given data
– Step 2: Train an autoencoder A(i) based on X(i);
– Step 3: X(i+1)=encoder(X(i));
– Step 4: If i<K-1, set i=i+1 and return to Step 2;
• K is the specified number of layers (not including the input layer)
– Step 5: Train a regression layer R using BP
• Training data: X(K)
• Teacher signal: X(0)
– Step 6: Stack [A(1), A(2),…,A(K-1),R] to form a deep autoencoder.

Homework
• Try the Matlab program for digit reconstruction given on the following page:
https://www.mathworks.com/help/nnet/ref/trainautoencoder.html
• See what happens if we change the parameter for l2-norm regularization; and
• See what happens if we change the parameter for sparsity regularization. (A small parameter-sweep sketch is given below.)
• You may plot
– The weights of the hidden neurons (as images);
– The outputs of the hidden neurons; or
– The reconstructed data.
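
A minimal sketch for the parameter experiments; the regularization values below are arbitrary assumptions, and the per-pixel error mirrors the earlier sketch:

XTrain = digitTrainCellArrayData;
XTest  = digitTestCellArrayData;
Nh     = 36;
for lambda = [0.001 0.004 0.016]          % l2-norm regularization strengths to try
    enc  = trainAutoencoder(XTrain, Nh, 'L2WeightRegularization', lambda, ...
                            'SparsityRegularization', 4, 'SparsityProportion', 0.10);
    xRec = predict(enc, XTest);
    err  = mean(cellfun(@(a, b) mean((a(:) - b(:)).^2), XTest, xRec));
    fprintf('lambda = %.3f, reconstruction error = %.4f\n', lambda, err);
    plotWeights(enc);                     % weights of the hidden neurons as images
end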

