Autoencoder
Machine Learning: Produced by Qiangfu Zhao (Since 2018), All rights reserved (C)
What is an autoencoder? (1/2)
• An autoencoder is a special MLP with one or more hidden layers.
• The teacher signal is equal to the input, i.e., the network is trained to output x̂ ≈ x.
[Figure: an MLP mapping the input x through the hidden layer(s) to the output x̂]
The main purpose of using an autoencoder is to find a new
(internal or latent) representation for the given feature space,
with the hope of obtaining the true factors that control the
distribution of the input data.
What is an autoencoder? (2/2)
• Using principal component analysis (PCA), we can obtain a linear autoencoder.
• Each hidden unit corresponds to an eigenvector of the covariance matrix of the input data.
[Figure: network with input layer, hidden layer(s), and output layer]
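• A minimal MATLAB sketch of this idea, assuming a recent MATLAB, a toy data matrix X with one sample per row, and Nh hidden units (all names and data below are only for illustration):

% PCA as a linear autoencoder (illustrative sketch, assumed variable names):
% the encoder projects onto the top Nh eigenvectors of the covariance matrix,
% and the decoder projects back to the input space.
X  = rand(100, 4);            % toy data, one sample per row
Nh = 2;                       % number of hidden units (principal components)
mu = mean(X, 1);
[V, D] = eig(cov(X), 'vector');
[~, idx] = sort(D, 'descend');
W  = V(:, idx(1:Nh));         % top Nh eigenvectors = hidden units
H  = (X - mu) * W;            % hidden (latent) representation
Xhat = H * W' + mu;           % linear reconstruction of X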
Training of autoencoder (1/4)
• By implementing an autoencoder as an MLP, we can find a more compact representation of the given problem space.
• Compared with a classification MLP, an autoencoder can be trained with unlabelled data.
• However, training an autoencoder is still supervised learning, because the input itself serves as the teacher signal.
• The BP (back-propagation) algorithm can also be used for training.
Training of autoencoder (2/4)
• The n-th input is denoted by x_n, and the corresponding output is denoted by x̂_n.
• Normally, the objective (loss) function used for training an
autoencoder is defined as follows:
E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2    (1)
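• A minimal MATLAB sketch of Eq. (1), assuming a trained autoencoder enc and a data matrix X with one sample per column (the variable names are assumptions):

Xhat = predict(enc, X);            % reconstructions x̂_n, same size as X
E    = sum(sum((X - Xhat).^2));    % sum of squared reconstruction errors, Eq. (1)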
Training of autoencoder (3/4)
• Train an autoencoder for the well-known IRIS database using Matlab.
• The hidden layer size is 5.
• We can also specify other parameters. For details, visit the web page given below.
Training of autoencoder (4/4)
X = iris_dataset;
Nh = 5;
enc = trainAutoencoder(X, Nh);
view(enc)
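• A possible follow-up (a sketch, not part of the original code): reconstruct the data with the trained autoencoder and measure the error.

Xhat = predict(enc, X);                 % reconstructed IRIS samples
err  = mean(sum((X - Xhat).^2, 1));     % mean squared reconstruction error per sample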
Internal representation of data (1/2)
X = digitTrainCellArrayData;                        % 5,000 handwritten digit images
Nh = 36;
enc = trainAutoencoder(X, Nh, 'MaxEpochs', 500);    % 500 iterations, as stated below
plotWeights(enc);
• An autoencoder is trained for the dataset containing 5,000 handwritten digits.
• We used 500 iterations for training and a hidden layer size of 36.
• The figure on the right shows the weights of the hidden layer (one image per hidden neuron).
• These are similar to the eigenfaces.
Training with l2-norm regularization (1/2)
• The main purpose of an autoencoder is to reconstruct the input space using a smaller number of basis functions.
• If we use the objective function given by (1), the results may not generalize well to test data.
• To improve the generalization ability, a common practice is to introduce a penalty into the objective function as follows:
E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2 + \lambda \| w \|^2    (2)

\| w \|^2 = \sum_{k=1}^{L} \sum_{j=1}^{N_k} \sum_{i=1}^{N_{k-1}} \left( w_{ji}^{(k)} \right)^2    (3)

• where L is the number of layers, N_k is the number of neurons in layer k, and w_{ji}^{(k)} is the weight connecting neuron i of layer k-1 to neuron j of layer k.
• The effect of introducing this l2-norm penalty is to make the solution "smoother".
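• In MATLAB, λ in Eq. (2) corresponds to the 'L2WeightRegularization' option of trainAutoencoder. A minimal sketch (X, Nh, and the value 0.004 are example choices):

enc = trainAutoencoder(X, Nh, 'L2WeightRegularization', 0.004);   % lambda = 0.004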
Training with l2-norm regularization (2/2)
Reconstruction error: \| X - \hat{X} \| = 0.0261
For this example, we cannot see the positive effect clearly. Generally speaking, however, if the inputs are noisy, regularization can yield better results.
Training with sparsity regularization (1/6)
• In nearest neighbor-based approximation, each datum is approximated by one of the already observed data (i.e., the nearest one).
• In PCA, each datum is approximated by a point in a linear space spanned by the basis vectors (eigenvectors).
• Using an autoencoder, each datum is approximated by a linear combination of the hidden neuron outputs (see the sketch below).
• Usually, the basis functions are global in the sense that ANY given datum can be approximated well using the same set of basis functions.
• Usually, the number N_b of basis functions equals the rank r of the linear space. For PCA, N_b ≪ r because we use only the "principal" basis functions.
[Lv and ZHAO, 2007] https://www.emeraldinsight.com/doi/abs/10.1108/17427370710847327
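• A minimal MATLAB sketch of the autoencoder case (assuming a trained autoencoder enc and a single datum x; the variable names are assumptions): the reconstruction is built only from the datum's hidden representation, i.e., from the learned basis functions.

h    = encode(enc, x);     % latent code of one datum (one value per hidden neuron)
xhat = decode(enc, h);     % reconstruction built from the hidden-neuron "basis functions"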
Training with sparsity regularization (2/6)
Training with sparsity regularization (3/6)
• For sparse representation, we introduce another penalty in
the objective function as follows:
E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2 + \lambda \| w \|^2 + \beta \cdot F_{sparsity}    (5)
• To define the sparsity term F_{sparsity}, we need the average output activation value of each hidden neuron, given by

\hat{\rho}_j = \frac{1}{N} \sum_{n=1}^{N} g\left( u_j^{(1)}(x_n) \right)    (6)

• where N is the number of training data, and u_j^{(1)}(x_n) is the effective input of the j-th hidden neuron for the n-th training datum x_n.
• A neuron is very "active" if \hat{\rho}_j is high. To obtain a sparse neural network, it is necessary to make the neurons less active. This way, we can reconstruct any given datum using a smaller number of hidden neurons (basis functions).
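• A minimal MATLAB sketch of Eq. (6), assuming a trained autoencoder enc and a data matrix X with one sample per column (the names are assumptions):

H      = encode(enc, X);    % hidden activations, one column per training datum
rhoHat = mean(H, 2);        % average activation of each hidden neuron, Eq. (6)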
Training with sparsity regularization (4/6)
F_{sparsity} = \sum_{j=1}^{N_h} KL(\rho \parallel \hat{\rho}_j) = \sum_{j=1}^{N_h} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1-\rho) \log \frac{1-\rho}{1-\hat{\rho}_j} \right]    (7)
• where ρ is a sparsity parameter to be specified by the user: the smaller ρ is, the sparser the representation.
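• A minimal MATLAB sketch of Eq. (7) with example values (rho and rhoHat below are assumptions; rhoHat would normally come from Eq. (6)):

rho    = 0.10;                       % target sparsity proportion (user-specified)
rhoHat = [0.05; 0.20; 0.12; 0.08];   % example average activations, one per hidden neuron
Fsparsity = sum( rho*log(rho./rhoHat) + (1-rho)*log((1-rho)./(1-rhoHat)) );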
Training with sparsity regularization (5/6)
enc = trainAutoencoder(XTrain, Nh, ...
    'L2WeightRegularization', 0.004, ...   % lambda in Eq. (5')
    'SparsityRegularization', 4, ...       % beta in Eq. (5')
    'SparsityProportion', 0.10);           % rho in Eqs. (6)-(7)
E(w) = \sum_{n=1}^{N} \| x_n - \hat{x}_n \|^2 + \lambda \| w \|^2 + \beta \cdot F_{sparsity}    (5')
• By specifying a small sparsity proportion ρ, we can get an autoencoder with less-active hidden neurons.
• If we use a proper norm of w (e.g., the l1-norm), we can also reduce the number of non-zero weights of the hidden neurons and make the network even sparser.
Training with sparsity regularization (6/6)
Reconstruction error: \| X - \hat{X} \| = 0.0268
Training of deep network (2/5)
% Greedy layer-wise training: two autoencoders followed by a softmax output layer.
autoenc1 = trainAutoencoder(X, Nh1, 'DecoderTransferFunction', 'purelin');
features1 = encode(autoenc1, X);                 % first-level representation of X
autoenc2 = trainAutoencoder(features1, Nh2, ...
    'DecoderTransferFunction', 'purelin', 'ScaleData', false);
features2 = encode(autoenc2, features1);         % second-level representation
softnet = trainSoftmaxLayer(features2, T, 'LossFunction', 'crossentropy');
deepnet = stack(autoenc1, autoenc2, softnet);    % stack the encoders and the softmax layer
deepnet = train(deepnet, X, T);                  % fine-tune the whole network with BP
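• A possible follow-up (a sketch; the evaluation calls below are standard toolbox usage, not from the original slide):

Y = deepnet(X);          % class scores produced by the stacked network
plotconfusion(T, Y);     % confusion matrix of the fine-tuned deep network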
Training of deep network (3/5)
Training of deep network (4/5)
• To summarize, we can design a deep neural network (with K layers, not including the input layer) as follows (see the sketch after this list):
– Step 1: i = 1; X(1) = X(0); % X(0) is the given data
– Step 2: Train an autoencoder A(i) based on X(i);
– Step 3: X(i+1) = encode(A(i), X(i));
– Step 4: i = i + 1; if i < K, return to Step 2;
– Step 5: Train a regression (output) layer R using BP
• Training data: X(K)
• Teacher signal: Provided in the training set
– Step 6: Stack [A(1), A(2), …, A(K-1), R] to form a deep MLP.
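• A loop-based MATLAB sketch of these steps (assumptions: X0 is the given data, T the teacher signal, Nh a vector of hidden-layer sizes with numel(Nh) = K-1, and the output layer is a softmax layer as in the earlier example):

Xi = X0;                                  % X(1) = X(0), the given data
A  = cell(1, numel(Nh));
for i = 1:numel(Nh)                       % Steps 2-4: train A(i) and encode
    A{i} = trainAutoencoder(Xi, Nh(i));
    Xi   = encode(A{i}, Xi);              % X(i+1)
end
R = trainSoftmaxLayer(Xi, T, 'LossFunction', 'crossentropy');   % Step 5
deepnet = stack(A{:}, R);                 % Step 6: stack [A(1), ..., A(K-1), R]
deepnet = train(deepnet, X0, T);          % optional fine-tuning with BP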
Training of deep network (5/5)
• We can also train a deep autoencoder by modifying the algorithm slightly, as follows:
– Step 1: i = 1; X(1) = X(0); % X(0) is the given data
– Step 2: Train an autoencoder A(i) based on X(i);
– Step 3: X(i+1) = encode(A(i), X(i));
– Step 4: i = i + 1; if i < K, return to Step 2;
• K is the specified number of layers
– Step 5: Train a regression layer R using BP
• Training data: X(K)
• Teacher signal: X(0)
– Step 6: Stack [A(1), A(2), …, A(K-1), R] to form a deep autoencoder.
Homework
• Try the Matlab program for digit reconstruction given on the following web page:
https://www.mathworks.com/help/nnet/ref/trainautoencoder.html
• See what happens if we change the parameter for l2-norm regularization; and
• See what happens if we change the parameter for sparsity regularization.
• You may plot
– The weights of the hidden neurons (as images);
– The outputs of the hidden neurons; or
– The reconstructed data.
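• A possible starting point (a sketch; the parameter values and variable names below are only examples, not prescribed by the homework):

X = digitTrainCellArrayData;              % handwritten digit images
for lambda = [0.001 0.004 0.016]          % try several l2-regularization strengths
    enc = trainAutoencoder(X, 36, ...
        'L2WeightRegularization', lambda, ...
        'SparsityRegularization', 4, ...
        'SparsityProportion', 0.10);
    figure; plotWeights(enc);             % compare the learned hidden-layer weights
end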