Lecture_2 (1)

Lecture 2 covers Multilayer Perceptrons (MLPs) in deep learning, explaining the structure and functioning of neural networks inspired by biological systems. It details the perceptron model, learning algorithms such as backpropagation, and design considerations for MLPs, including activation functions and issues like overfitting and the vanishing gradient problem. The lecture emphasizes the importance of network architecture and training processes in achieving effective machine learning outcomes.

Uploaded by

Abdelrhman Adel


Lecture 2: Multilayer Perceptrons

CS460: Deep Learning

What is a neural network?
• The brain is a highly complex, nonlinear, and parallel computer. It has the capability to organize its neurons so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today.
Biological Neural Networks
[Figure: biological neurons, with the dendrites, soma, axon, and synapses labeled]
Modeling the single neuron
Learning in simple neurons
• Suppose we have two groups of objects: one group of several written A's and the other of B's. We may want our neuron to tell the A's from the B's, as in the figure.
• We want it to output a 1 when an A is presented and a 0 when it sees a B.
Biology analogy
Biological | Artificial
Soma | Node/neuron
Dendrites | Input
Axon | Output
Synapse | Weight
The perceptron
The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights. Each node computes the sum of the products of the weights and the inputs. If this value is above some threshold, the neuron fires and takes the activated value; otherwise it takes the deactivated value.
• Neurons with this kind of activation function are also called artificial neurons or linear threshold units.
• In the literature, the term perceptron often refers to networks consisting of just one of these units.
The perceptron (cont’d)
• The perceptron is an algorithm for learning a binary classifier called a threshold function: a function that maps its input x (a real-valued vector) to an output value f(x) (a single binary value):
$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}$$
• where w is a vector of real-valued weights, w · x is the dot product $\sum_{i=1}^{m} w_i x_i$, m is the number of inputs to the perceptron, and b is the bias.
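A minimal Python sketch of the threshold function, assuming plain lists for x and w. The name `perceptron_output` and the example weights are hypothetical:

```python
def perceptron_output(w, x, b):
    """Threshold unit: returns 1 if w . x + b > 0, else 0."""
    # Dot product w . x = sum of w_i * x_i over the m inputs, plus the bias b.
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


# Hand-picked (hypothetical) weights and bias.
print(perceptron_output([1.0, 1.0], [1, 0], -0.5))  # prints 1
print(perceptron_output([1.0, 1.0], [0, 0], -0.5))  # prints 0
```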
The perceptron (cont’d)
• Perceptrons can be trained by a simple learning algorithm that is usually called the delta rule. It calculates the error between the calculated output and the sample output data, and uses it to adjust the weights, thus implementing a form of gradient descent.
• Single-layer perceptrons are only capable of learning linearly separable patterns.
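The error-driven training idea can be sketched in Python as follows. This is a minimal sketch that assumes the classic error-correction form w ← w + l(t − y)x for a threshold unit. That form is often called the perceptron learning rule; the delta rule proper applies the same form to a continuous output. The function names, the zero initialization, and the AND data set are illustrative choices, not taken from the lecture.

```python
def predict(w, b, x):
    # Threshold unit: fire (1) only if the weighted sum plus bias is positive.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, n_inputs, lr=1.0, epochs=20):
    """Error-correction training: w <- w + lr*(t - y)*x, b <- b + lr*(t - y)."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in samples:
            error = t - predict(w, b, x)  # sample output minus calculated output
            if error != 0:
                errors += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if errors == 0:  # every sample classified correctly: stop early
            break
    return w, b


# AND is linearly separable, so the rule finds a separating line.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND, n_inputs=2)
print([predict(w, b, x) for x, _ in AND])  # prints [0, 0, 0, 1]
```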
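Linearly Separable
XOR Function
• It is impossible for a single-layer perceptron network to learn an XOR function: no single straight line separates the inputs whose output is 1, (0,1) and (1,0), from those whose output is 0, (0,0) and (1,1).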
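To see the limitation concretely, the same threshold unit and error-correction rule can be run on XOR. This is a minimal sketch; the names and the 100-epoch budget are arbitrary. Because XOR is not linearly separable, no choice of weights and bias classifies all four tuples, however long training runs.

```python
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, lr=1.0, epochs=100):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in samples:
            error = t - predict(w, b, x)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w, b = train_perceptron(XOR)
correct = sum(predict(w, b, x) == t for x, t in XOR)
print(correct, "of 4 correct")  # never 4: no line separates the two XOR classes
```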
Non-linear transformations
A single-layer neural network can compute a continuous output instead of a step function. A common choice is the so-called logistic function:
$$y = \frac{1}{1 + e^{-x}}$$
Non-linear transformations
• The logistic function is one of the family of functions called sigmoid functions. It has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated from its own output:
$$y' = \frac{dy}{dx} = y(1 - y)$$
Sigmoid function
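A short Python check of the identity y' = y(1 − y), comparing it against a central finite-difference estimate. The test point x = 0.7 and step h = 1e-6 are arbitrary choices.

```python
import math


def sigmoid(x):
    """Logistic function y = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative expressed through the output itself: y' = y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)


# Central finite-difference estimate of the slope at x.
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(sigmoid_derivative(x), numeric)  # the two values agree closely
print(sigmoid_derivative(0.0))         # prints 0.25, the maximum slope
```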
Multilayer Perceptron (MLP)
• An MLP is a class of feedforward (acyclic) artificial neural network (ANN).
• Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications, the units of these networks apply a sigmoid function as an activation function.
• MLPs are the most basic deep neural networks; they are composed of a series of fully connected layers.
• Each new layer is a set of nonlinear functions of a weighted sum of all outputs (fully connected) from the prior one.
• Multilayer feedforward networks, given enough hidden units and enough training samples, can closely approximate any continuous function.
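The Architecture
• MLP with one hidden layer
[Figure: inputs x1, x2, x3 enter the input layer and pass through a hidden layer to the output layer, which produces Y1. Each node is a processing element (PE) that computes a weighted sum (S) followed by a transfer function (f).]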
MLP processing
(a) Single neuron: inputs x1 and x2, with weights w1 and w2, feed one PE:
$$Y = X_1 W_1 + X_2 W_2$$
(b) Multiple neurons: inputs x1 and x2 feed three PEs through weights w11, w12, w21, w22, and w23:
$$Y_1 = X_1 W_{11} + X_2 W_{21}$$
$$Y_2 = X_1 W_{12} + X_2 W_{22}$$
$$Y_3 = X_2 W_{23}$$
PE: Processing Element (or neuron)
MLP processing (cont’d)
Inputs: X1 = 3, X2 = 1, X3 = 2; weights: W1 = 0.2, W2 = 0.4, W3 = 0.1
Summation function (processing element): $$Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2$$
Transfer function: $$Y_T = \frac{1}{1 + e^{-1.2}} = 0.77$$
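The same arithmetic in Python, as a quick check of the worked example. The weights W1 = 0.2 and W3 = 0.1 are read off the summation line; W2 = 0.4 is given explicitly.

```python
import math

x = [3, 1, 2]          # inputs X1, X2, X3
w = [0.2, 0.4, 0.1]    # weights W1, W2, W3

y = sum(xi * wi for xi, wi in zip(x, w))   # summation function
y_t = 1 / (1 + math.exp(-y))               # sigmoid transfer function

print(round(y, 2), round(y_t, 2))  # prints 1.2 0.77
```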
Designing the MLP
• Before training can begin, the user must decide on the network topology by specifying:
  - the number of units in the input layer,
  - the number of hidden layers (if more than one),
  - the number of units in each hidden layer, and
  - the number of units in the output layer.
• Normalizing the input values for each attribute measured in the training tuples (e.g., to between 0.0 and 1.0) will help speed up the learning phase and reduce the risk of exploding gradients.
• Discrete-valued attributes may be encoded such that there is one input unit per domain value.
• Choice of the transfer function
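A minimal sketch of the two input-preparation steps above. It assumes min-max scaling as the normalization method (one common choice among several). The function names and sample values are hypothetical.

```python
def min_max_normalize(column):
    """Rescale one attribute's values linearly into [0.0, 1.0]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def one_hot(value, domain):
    """One input unit per domain value: 1 for the matching unit, 0 elsewhere."""
    return [1 if value == d else 0 for d in domain]


ages = [20, 35, 50]
print(min_max_normalize(ages))               # prints [0.0, 0.5, 1.0]
print(one_hot("b", domain=["a", "b", "c"]))  # prints [0, 1, 0]
```

In practice, the minimum and maximum should come from the training tuples and then be reused unchanged for new inputs.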
Transformation (Transfer) Function
• Linear function
• Sigmoid (logistic activation) function: output range (0, 1)
• Hyperbolic tangent (tanh) function: output range (-1, 1)
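The three transfer functions side by side in Python, as a small sketch. The sample inputs are arbitrary. Note that tanh is a rescaled sigmoid: tanh(z) = 2·sigmoid(2z) − 1.

```python
import math


def linear(z):
    return z                           # unbounded output


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # output in (0, 1)


def tanh(z):
    return math.tanh(z)                # output in (-1, 1)


for z in (-5.0, 0.0, 5.0):
    print(z, linear(z), round(sigmoid(z), 4), round(tanh(z), 4))
```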
MLP: Design issues
• Neural networks can be used for both classification (to predict the class label of a given tuple) and numeric prediction (to predict a continuous-valued output).
• For classification, one output unit may be used to represent two classes (where the value 1 represents one class, and the value 0 represents the other).
• If there are more than two classes, then one output unit per class is used.
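One way to turn output-unit activations into class labels, sketched in Python. The 0.5 threshold for the single-unit case and the argmax rule for the one-unit-per-class case are common conventions, assumed here rather than taken from the lecture. The class names are hypothetical.

```python
def decode_binary(output, threshold=0.5):
    """Single output unit for two classes: 1 for one class, 0 for the other."""
    return 1 if output >= threshold else 0


def decode_multiclass(outputs, class_labels):
    """One output unit per class: choose the class whose unit is most active."""
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return class_labels[best]


print(decode_binary(0.83))                                         # prints 1
print(decode_multiclass([0.1, 0.7, 0.2], ["cat", "dog", "bird"]))  # prints dog
```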
MLP: Design issues
• There are no clear rules as to the “best” number of hidden layer units.
• Network design is a trial-and-error process and may affect the accuracy of the resulting trained network.
• The initial values of the weights may also affect the resulting accuracy.
• Once a network has been trained and its accuracy is not considered acceptable, it is common to repeat the training process with:
  - a different network topology, or
  - a different set of initial weights.
The XOR function revisited
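A minimal sketch of why one hidden layer is enough for XOR. The weights and thresholds below are hand-picked for illustration (hypothetical, not learned). Hidden unit 1 computes OR, hidden unit 2 computes AND, and the output fires for "OR but not AND", which is exactly XOR.

```python
def step(z):
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer (hand-picked weights): unit 1 acts as OR, unit 2 as AND.
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # Output unit: fires when h1 is on and h2 is off.
    return step(1.0 * h1 - 1.0 * h2 - 0.5)


print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # prints [0, 1, 1, 0]
```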
MLP Box Office prediction example
The Learning algorithm
• It adjusts the weights of the network in order to minimize the average squared error.
Learning in MLP
• The learning algorithm procedure:
  1. Initialize weights with random values and set other network parameters.
  2. Read in the inputs and the desired outputs.
  3. Compute the actual output (by working forward through the layers).
  4. Compute the error (difference between the actual and desired output).
  5. Change the weights by working backward through the hidden layers.
  6. Repeat steps 2-5 until the weights stabilize.
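The six steps can be sketched on the smallest possible "network", a single sigmoid unit, so that step 5 needs no hidden layers. This is a minimal sketch under several assumptions: squared error, a learning rate of 0.5, initial values in [-0.5, 0.5], a fixed seed, and OR as the training set. All names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_unit(samples, lr=0.5, max_epochs=10000, tol=1e-6, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the weights and bias with small random values.
    w = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        largest_change = 0.0
        for x, t in samples:              # Step 2: read an input and its desired output
            o = output(w, b, x)           # Step 3: compute the actual output
            err = (t - o) * o * (1 - o)   # Step 4: error, scaled by the sigmoid slope
            deltas = [lr * err * xi for xi in x] + [lr * err]
            w = [wi + d for wi, d in zip(w, deltas)]  # Step 5: change the weights
            b += deltas[-1]
            largest_change = max(largest_change, max(abs(d) for d in deltas))
        if largest_change < tol:          # Step 6: repeat until the weights stabilize
            break
    return w, b


OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_unit(OR)
print([round(output(w, b, x), 2) for x, _ in OR])
```

The `max_epochs` cap matters: with a sigmoid unit, the weights keep growing slowly, so "stabilize" may only be reached approximately.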
Learning in MLP (cont’d)
• Backpropagation learns by iteratively processing a data set of training tuples, comparing the network’s prediction for each tuple with the actual known target value.
• The target value may be the known class label of the training tuple (for classification problems) or a continuous value (for numeric prediction).
• For each training tuple, the weights are modified so as to minimize the mean-squared error between the network’s prediction and the actual target value.
Learning in MLP (cont’d)
• These modifications are made in the “backwards” direction (i.e., from the output layer) through each hidden layer down to the first hidden layer (hence the name backpropagation).
• Although it is not guaranteed, in general the weights will eventually converge, and the learning process stops.
MLPs Bottlenecks
1. Dimensionality issue
• Rule of thumb: the number of training samples should be at least 5 to 10 times the number of weights in the network.
• Otherwise, the network is prone to overfitting.
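The rule of thumb in numbers, as a small sketch. It assumes a fully connected MLP and counts only connection weights (biases excluded, since the rule mentions weights). The 4-5-1 network is hypothetical.

```python
def count_weights(layer_sizes):
    """Connection weights in a fully connected MLP (biases not counted)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def suggested_samples(layer_sizes, low=5, high=10):
    n = count_weights(layer_sizes)
    return low * n, high * n


# Hypothetical network: 4 inputs, 5 hidden units, 1 output.
print(count_weights([4, 5, 1]))      # prints 25
print(suggested_samples([4, 5, 1]))  # prints (125, 250)
```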
2. Overfitting
2. Overfitting (cont’d)
3. The black-box syndrome
• A common criticism of ANNs: the lack of transparency/explainability
• Answer: sensitivity analysis
  - Conducted on a trained ANN
  - The inputs are perturbed while the relative change in the output is measured/recorded
  - Results illustrate the relative importance of input variables
Sensitivity analysis
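A minimal one-at-a-time sensitivity analysis, sketched in Python. The "trained" model is a hypothetical stand-in with fixed weights, and the perturbation size of 0.01 is an assumption. Each input is nudged while the others are held fixed. The resulting output changes are normalized into relative importances.

```python
import math


def trained_model(x):
    # Stand-in for a trained network (hypothetical fixed weights).
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] + 0.5 * x[1] + 0.1 * x[2])))


def sensitivity(model, x, delta=0.01):
    base = model(x)
    changes = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] += delta                # perturb one input, hold the others fixed
        changes.append(abs(model(perturbed) - base))
    total = sum(changes)
    return [c / total for c in changes]      # relative importance, sums to 1


print(sensitivity(trained_model, [0.2, 0.4, 0.6]))  # input 0 dominates
```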
4. Vanishing gradient problem
• In machine learning, the vanishing gradient problem is encountered when training artificial neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training, each of the neural network's weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that in some cases the gradient will be vanishingly small, effectively preventing the weight from changing its value. In the worst case, this may completely stop the neural network from further training. As one example of the cause, traditional activation functions such as the hyperbolic tangent function have gradients in the range (0, 1], and backpropagation computes gradients by the chain rule. This has the effect of multiplying n of these small numbers to compute gradients of the early layers in an n-layer network, meaning that the gradient (error signal) decreases exponentially with n and the early layers train very slowly.
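A small numerical illustration of the chain-rule effect. To keep it simple, it assumes every connection weight is 1 and every unit sits at the same net input z. Under those assumptions, the gradient reaching the first layer is just a product of n tanh derivatives.

```python
import math


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2   # lies in (0, 1]; equals 1 only at z = 0


def chained_gradient(n_layers, z=1.0):
    """Product of n activation derivatives, as the chain rule forms for early layers."""
    g = 1.0
    for _ in range(n_layers):
        g *= tanh_grad(z)
    return g


for n in (1, 5, 10, 20):
    print(n, chained_gradient(n))    # shrinks geometrically as n grows
```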
Building Neural Networks
• The architecture of a neural network is driven by the task it is intended to address:
  - classification, regression, clustering, general optimization, association, …
• Most popular architecture: the feedforward multilayer perceptron with the backpropagation learning algorithm
  - Used for both classification and regression problems
• Others: recurrent networks, self-organizing feature maps, Hopfield networks, …
Development of NNs
Backpropagation
• Multilayer networks use a variety of learning techniques, the most popular being backpropagation.
• The output values are compared with the correct answer to compute the value of some predefined error function. By various techniques, the error is then fed back through the network.
• The algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount.
• After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small.
• In this case, one would say that the network has learned a certain target function. To adjust weights properly, one applies a general method for nonlinear optimization called gradient descent. For this, the network calculates the derivative of the error function with respect to the network weights and changes the weights such that the error decreases (thus going downhill on the surface of the error function).
• For this reason, backpropagation can only be applied to networks with differentiable activation functions.
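Gradient descent on a one-weight error surface, as a minimal sketch of "going downhill". The quadratic error E(w) = (w − 3)², the starting point, and the step size 0.1 are arbitrary illustrative choices.

```python
def error(w):
    return (w - 3.0) ** 2          # toy error surface with its minimum at w = 3


def d_error(w):
    return 2.0 * (w - 3.0)         # derivative of the error with respect to w


w, lr = 0.0, 0.1
for _ in range(100):
    w -= lr * d_error(w)           # step against the derivative, i.e., downhill
print(round(w, 6), error(w))       # w ends very close to 3
```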
The Steps of Backpropagation
Initialize the weights:
• The weights in the network are initialized to small random numbers (e.g., ranging from -1.0 to 1.0, or -0.5 to 0.5).
• Each unit has a bias associated with it, as explained later.
• The biases are similarly initialized to small random numbers.
• Each training tuple, X, is processed by the following steps.
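Initialization for one fully connected layer, sketched in Python. The [-0.5, 0.5] range follows the example above. The `weights[j][i]` layout (from unit i to unit j), the function name, and the seed are assumptions.

```python
import random


def init_layer(n_in, n_out, low=-0.5, high=0.5, seed=None):
    """Small random weights and biases for one fully connected layer."""
    rng = random.Random(seed)
    weights = [[rng.uniform(low, high) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(low, high) for _ in range(n_out)]
    return weights, biases


weights, biases = init_layer(n_in=3, n_out=2, seed=42)
print(weights, biases)
```

Passing a seed makes a run reproducible, which helps when comparing different topologies or initial weights.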
Propagate the inputs forward:
• First, the training tuple is fed to the network’s input layer.
• The inputs pass through the input units unchanged.
  - That is, for an input unit j, its output, O_j, is equal to its input value, I_j.
• Next, the net input and output of each unit in the hidden and output layers are computed.
• The net input to a unit in the hidden or output layers is computed as a linear combination of its inputs.
The Steps of Backpropagation
• Propagate the inputs forward:
  - Each hidden layer or output layer unit has a number of inputs to it that are, in fact, the outputs of the units connected to it in the previous layer.
Propagate the inputs forward
• To compute the net input to the unit, each input connected to the unit is multiplied by its corresponding weight, and the products are summed.
• Given a unit j in a hidden or output layer, the net input I_j to unit j is
$$I_j = \sum_i w_{ij} O_i + \theta_j$$
• where w_ij is the weight of the connection from unit i in the previous layer to unit j, O_i is the output of unit i from the previous layer, and θ_j is the bias of unit j.
• The bias acts as a threshold in that it serves to vary the activity of the unit.
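The net-input formula as a one-line Python function. This is a minimal sketch; the argument names and the sample values are arbitrary.

```python
def net_input(prev_outputs, weights_to_j, theta_j):
    """I_j = sum over i of w_ij * O_i, plus the bias theta_j."""
    return sum(w_ij * o_i for w_ij, o_i in zip(weights_to_j, prev_outputs)) + theta_j


# Arbitrary example: outputs O_i of three previous-layer units, their weights, and a bias.
print(net_input([1.0, 0.0, 1.0], [0.2, 0.4, -0.5], -0.4))  # approximately -0.7
```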
Propagate the inputs forward
• Each unit in the hidden and output layers takes its net input and then applies an activation function to it.
• The function symbolizes the activation of the neuron represented by the unit.
• Here the logistic, or sigmoid, function is used. Given the net input I_j to unit j, the output O_j of unit j is computed as
$$O_j = \frac{1}{1 + e^{-I_j}}$$
• The logistic function is nonlinear and differentiable, allowing the backpropagation algorithm to model classification problems that are linearly inseparable.
Propagate the inputs forward
• We compute the output values, O_j, for each hidden layer, up to and including the output layer, which gives the network’s prediction.
• In practice, it is a good idea to cache (i.e., save) the intermediate output values at each unit, as they are required again later when backpropagating the error.
• This trick can substantially reduce the amount of computation required.
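A forward pass that caches every layer's outputs, sketched in Python. It assumes sigmoid units and a hypothetical `weights[j][i]` layout (from unit i to unit j). The 3-2-1 weights are arbitrary. `outputs[0]` is the input layer, passed through unchanged.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Return the outputs of every layer; outputs[0] is the (unchanged) input layer."""
    outputs = [list(x)]
    for weights, thetas in layers:       # weights[j][i]: from unit i to unit j
        prev = outputs[-1]
        layer_out = [
            sigmoid(sum(w_ij * o_i for w_ij, o_i in zip(w_j, prev)) + theta_j)
            for w_j, theta_j in zip(weights, thetas)
        ]
        outputs.append(layer_out)        # cache for the backward pass
    return outputs


layers = [([[0.2, 0.4, -0.5], [-0.3, 0.1, 0.2]], [-0.4, 0.2]),  # hidden layer: 2 units
          ([[-0.3, -0.2]], [0.1])]                              # output layer: 1 unit
cache = forward([1.0, 0.0, 1.0], layers)
print(cache[-1])  # the network's prediction
```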
Backpropagate the error
• The error is propagated backward by updating the weights and biases to reflect the error of the network’s prediction. For a unit j in the output layer, the error Err_j is computed by
$$\mathit{Err}_j = O_j (1 - O_j)(T_j - O_j)$$
• where O_j is the actual output of unit j, and T_j is the known target value of the given training tuple.
• Note that O_j(1 - O_j) is the derivative of the logistic function.
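The output-unit error in Python, as a minimal sketch assuming a sigmoid output unit. The function name is hypothetical.

```python
def output_error(o_j, t_j):
    """Err_j = O_j * (1 - O_j) * (T_j - O_j) for a sigmoid output unit."""
    return o_j * (1 - o_j) * (t_j - o_j)


print(output_error(0.5, 1.0))  # prints 0.125: output too low, so the error is positive
print(output_error(0.8, 0.0))  # negative: output too high
```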
Backpropagate the error
• To compute the error of a hidden layer unit j, the weighted sum of the errors of the units connected to unit j in the next layer is considered.
• The error of a hidden layer unit j is
$$\mathit{Err}_j = O_j (1 - O_j) \sum_k \mathit{Err}_k \, w_{jk}$$
• where w_jk is the weight of the connection from unit j to a unit k in the next higher layer, and Err_k is the error of unit k.
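The hidden-unit error in Python. This is a minimal sketch; `weights_from_j[k]` stands for w_jk, a layout assumed here. The values in the example are arbitrary.

```python
def hidden_error(o_j, next_errors, weights_from_j):
    """Err_j = O_j * (1 - O_j) * sum over k of Err_k * w_jk."""
    back = sum(err_k * w_jk for err_k, w_jk in zip(next_errors, weights_from_j))
    return o_j * (1 - o_j) * back


# Unit j with output 0.5 feeds two next-layer units with errors 0.1 and 0.2.
print(hidden_error(0.5, [0.1, 0.2], [1.0, 0.5]))  # approximately 0.05
```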
Backpropagate the error
• The weights and biases are updated to reflect the propagated errors.
• Weights are updated by the following equations, where Δw_ij is the change in weight w_ij:
$$\Delta w_{ij} = (l)\, \mathit{Err}_j \, O_i$$
$$w_{ij} = w_{ij} + \Delta w_{ij}$$
• The variable l is the learning rate, a constant typically having a value between 0.0 and 1.0.
• The learning rate helps avoid getting stuck at a local minimum in decision space. If the learning rate is too small, learning will occur at a very slow pace. If the learning rate is too large, the weights may oscillate instead of settling.
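The weight update for the incoming connections of one unit j, sketched in Python. The default learning rate l = 0.9 is an arbitrary value inside the 0.0 to 1.0 range mentioned above. When updating a full network, the hidden errors are normally computed from the weights as they were before this update.

```python
def update_weights(weights_to_j, prev_outputs, err_j, l=0.9):
    """w_ij <- w_ij + l * Err_j * O_i for every incoming connection of unit j."""
    return [w_ij + l * err_j * o_i for w_ij, o_i in zip(weights_to_j, prev_outputs)]


print(update_weights([0.0, 1.0], [1.0, 0.0], err_j=0.5, l=0.5))  # prints [0.25, 1.0]
```

A connection whose source output O_i is 0 does not change, because it did not contribute to unit j's net input.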
Backpropagate the error
• Biases are updated by the following equations, where Δθ_j is the change in bias θ_j:
$$\Delta \theta_j = (l)\, \mathit{Err}_j$$
$$\theta_j = \theta_j + \Delta \theta_j$$
• Updating the weights and biases after the presentation of each tuple is referred to as case updating.
• Alternatively, the weight and bias increments could be accumulated in variables, so that the weights and biases are updated after all the tuples in the training set have been presented. This is called epoch updating.
• Batch/mini-batch updating: the weights and biases are updated after several samples.
• One iteration through the training set is an epoch.
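The three update schedules, sketched in Python on a deliberately tiny model. This is a minimal sketch that assumes a single linear unit y = w·x + θ with error Err = t − y, so the weight and bias increments have the same l·Err·O_i and l·Err shape as above without the sigmoid factor. The data set, l = 0.05, and the 2000 epochs are arbitrary. Mini-batch updating simply reuses epoch updating on slices of the data.

```python
def error(w, theta, x, t):
    return t - (w * x + theta)          # Err for a linear unit (assumed for simplicity)


def case_updating(w, theta, data, l):
    for x, t in data:                   # update after every tuple
        err = error(w, theta, x, t)
        w, theta = w + l * err * x, theta + l * err
    return w, theta


def epoch_updating(w, theta, data, l):
    dw = dtheta = 0.0
    for x, t in data:                   # accumulate increments over the whole pass
        err = error(w, theta, x, t)
        dw += l * err * x
        dtheta += l * err
    return w + dw, theta + dtheta       # apply once per pass


def minibatch_updating(w, theta, data, l, batch_size=2):
    for start in range(0, len(data), batch_size):
        w, theta = epoch_updating(w, theta, data[start:start + batch_size], l)
    return w, theta


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical tuples with t = 2x
for schedule in (case_updating, epoch_updating, minibatch_updating):
    w, theta = 0.0, 0.0
    for _ in range(2000):
        w, theta = schedule(w, theta, data, l=0.05)
    print(schedule.__name__, round(w, 4), round(theta, 4))  # all approach w = 2, theta = 0
```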
Terminating condition
• Training stops when:
  - all Δw_ij in the previous epoch are so small as to be below some specified threshold, or
  - the percentage of tuples misclassified in the previous epoch is below some threshold, or
  - a pre-specified number of epochs has expired.
• In practice, several hundreds of thousands of epochs may be required before the weights converge.
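Putting the steps together, here is a minimal end-to-end sketch of backpropagation with case updating, trained on XOR. Several settings are assumptions chosen for illustration:

- the network size (2-3-1) and learning rate l = 0.5,
- the seed and the [-0.5, 0.5] initialization range,
- the two terminating thresholds (largest weight change below 1e-5, or zero misclassified tuples), and
- a cap of 20000 epochs.

Backpropagation is not guaranteed to converge, and different seeds can take very different numbers of epochs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(data, n_hidden=3, l=0.5, max_epochs=20000, min_change=1e-5, seed=1):
    """Backpropagation with case updating: one hidden layer, one sigmoid output unit."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initialize weights and biases to small random numbers.
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]  # input i -> hidden j
    th_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden j -> output
    th_o = rng.uniform(-0.5, 0.5)

    def forward(x):
        # Propagate the inputs forward; return the cached hidden outputs and the output.
        h = [sigmoid(sum(w * xi for w, xi in zip(w_ih[j], x)) + th_h[j]) for j in range(n_hidden)]
        o = sigmoid(sum(w * hj for w, hj in zip(w_ho, h)) + th_o)
        return h, o

    epoch = 0
    for epoch in range(1, max_epochs + 1):
        largest = 0.0
        for x, t in data:
            h, o = forward(x)
            err_o = o * (1 - o) * (t - o)  # output-unit error
            # Hidden-unit errors use the output weights *before* this tuple's update.
            err_h = [h[j] * (1 - h[j]) * err_o * w_ho[j] for j in range(n_hidden)]
            for j in range(n_hidden):
                dw = l * err_o * h[j]
                w_ho[j] += dw
                largest = max(largest, abs(dw))
                for i in range(n_in):
                    dw = l * err_h[j] * x[i]
                    w_ih[j][i] += dw
                    largest = max(largest, abs(dw))
                th_h[j] += l * err_h[j]
            th_o += l * err_o
        # Terminating conditions: tiny weight changes, or no misclassified tuples.
        misclassified = sum((forward(x)[1] >= 0.5) != (t == 1) for x, t in data)
        if largest < min_change or misclassified == 0:
            break
    return forward, epoch


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
predict, epochs = train_backprop(XOR)
print(epochs, [round(predict(x)[1], 3) for x, _ in XOR])
```

The third terminating condition, a pre-specified number of epochs, is the `max_epochs` cap, which bounds the run even if neither threshold is reached.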

HCIA-AI V3.0 Training Material
100% (2)
HCIA-AI V3.0 Training Material
474 pages
Artificial Intelligence Artificial Neural Networks - : Introduction
No ratings yet
Artificial Intelligence Artificial Neural Networks - : Introduction
43 pages
CC511 Week 5 - 6 - NN - BP
No ratings yet
CC511 Week 5 - 6 - NN - BP
62 pages
ECSE484 Intro v2
No ratings yet
ECSE484 Intro v2
67 pages
UNit 6 Machine Learning
No ratings yet
UNit 6 Machine Learning
23 pages
CNN and Gan: Introduction To
No ratings yet
CNN and Gan: Introduction To
58 pages
Introduction To ANNs
No ratings yet
Introduction To ANNs
31 pages
Charotar University of Science and Technology Faculty of Technology and Engineering
No ratings yet
Charotar University of Science and Technology Faculty of Technology and Engineering
10 pages
Artificial Neural Networks
No ratings yet
Artificial Neural Networks
34 pages
ML 6
No ratings yet
ML 6
10 pages
Unit 03 - Neural Networks - MD
No ratings yet
Unit 03 - Neural Networks - MD
24 pages
Lecture 4
No ratings yet
Lecture 4
22 pages
Assign 1 Soft Comp
No ratings yet
Assign 1 Soft Comp
12 pages
Multi Layer Perceptron
No ratings yet
Multi Layer Perceptron
62 pages
Unit 2 Deep Learning and Neural Networks
No ratings yet
Unit 2 Deep Learning and Neural Networks
38 pages
Training Neural Networks With GA Hybrid Algorithms
No ratings yet
Training Neural Networks With GA Hybrid Algorithms
12 pages
ML Unit-2
No ratings yet
ML Unit-2
141 pages
cst414- Deep learning
No ratings yet
cst414- Deep learning
34 pages
New Microsoft Word Document
No ratings yet
New Microsoft Word Document
14 pages
Neural Networks
No ratings yet
Neural Networks
37 pages
ANN PG Module1
No ratings yet
ANN PG Module1
75 pages
Unit - II ML
No ratings yet
Unit - II ML
9 pages
DL Concepts 1 Overview
No ratings yet
DL Concepts 1 Overview
80 pages
Unit-5 (1)
No ratings yet
Unit-5 (1)
10 pages
CS217_2024_lec11
No ratings yet
CS217_2024_lec11
7 pages
9.a Handout-1-NN Arch + Representation Power + Layer Sizes
No ratings yet
9.a Handout-1-NN Arch + Representation Power + Layer Sizes
7 pages
Unit I
No ratings yet
Unit I
21 pages
Session 1
No ratings yet
Session 1
8 pages
Artificial Neural Networks
No ratings yet
Artificial Neural Networks
66 pages
DL Question Bank Answers
No ratings yet
DL Question Bank Answers
55 pages
ML unit 4
No ratings yet
ML unit 4
23 pages
Artificial Neural Network Notes
No ratings yet
Artificial Neural Network Notes
9 pages
Institute For Advanced Management Systems Research Department of Information Technologies Abo Akademi University
No ratings yet
Institute For Advanced Management Systems Research Department of Information Technologies Abo Akademi University
41 pages
ANN MODULE 1 Part2
No ratings yet
ANN MODULE 1 Part2
58 pages
MODULE 1 DL
No ratings yet
MODULE 1 DL
6 pages
Unit 12
No ratings yet
Unit 12
26 pages
backpropagation
No ratings yet
backpropagation
7 pages
Soft Computing
No ratings yet
Soft Computing
38 pages
Soft Computing Manual.-1
No ratings yet
Soft Computing Manual.-1
45 pages
ML Unit 2
No ratings yet
ML Unit 2
91 pages
7_Neural Networks (1)
No ratings yet
7_Neural Networks (1)
66 pages
13 - Chapter 5 PDF
No ratings yet
13 - Chapter 5 PDF
40 pages
FRJ Paper
No ratings yet
FRJ Paper
9 pages
Aditya Jain NN Assignment
No ratings yet
Aditya Jain NN Assignment
13 pages
Deep Learning (1)
No ratings yet
Deep Learning (1)
19 pages
Neural Networks - Basics Matlab PDF
No ratings yet
Neural Networks - Basics Matlab PDF
59 pages
Unit 2
No ratings yet
Unit 2
9 pages
ML UNIT 2
No ratings yet
ML UNIT 2
23 pages
NN unit_1
No ratings yet
NN unit_1
27 pages
Deep Leaning
No ratings yet
Deep Leaning
117 pages
Neural Networks (Review) : Piero P. Bonissone GE Corporate Research & Development
No ratings yet
Neural Networks (Review) : Piero P. Bonissone GE Corporate Research & Development
16 pages
Unit 1 NNDL
No ratings yet
Unit 1 NNDL
8 pages
Artifical Neural Network
No ratings yet
Artifical Neural Network
7 pages
Multilayer Perceptron: Fundamentals and Applications for Decoding Neural Networks
From Everand
Multilayer Perceptron: Fundamentals and Applications for Decoding Neural Networks
Fouad Sabry
No ratings yet
Neural Networks
From Everand
Neural Networks
Sasha Kurzweil
No ratings yet
Competitive Learning: Fundamentals and Applications for Reinforcement Learning through Competition
From Everand
Competitive Learning: Fundamentals and Applications for Reinforcement Learning through Competition
Fouad Sabry
No ratings yet
Feedforward Neural Networks: Fundamentals and Applications for The Architecture of Thinking Machines and Neural Webs
From Everand
Feedforward Neural Networks: Fundamentals and Applications for The Architecture of Thinking Machines and Neural Webs
Fouad Sabry
No ratings yet
Bio Inspired Computing: Fundamentals and Applications for Biological Inspiration in the Digital World
From Everand
Bio Inspired Computing: Fundamentals and Applications for Biological Inspiration in the Digital World
Fouad Sabry
No ratings yet
Convolutional Neural Networks: Fundamentals and Applications for Analyzing Visual Imagery
From Everand
Convolutional Neural Networks: Fundamentals and Applications for Analyzing Visual Imagery
Fouad Sabry
No ratings yet
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
From Everand
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
Fouad Sabry
No ratings yet
Perceptrons: Fundamentals and Applications for The Neural Building Block
From Everand
Perceptrons: Fundamentals and Applications for The Neural Building Block
Fouad Sabry
No ratings yet
Algorithms: Advances in Artificial Neural Networks - Methodological Development and Application
No ratings yet
Algorithms: Advances in Artificial Neural Networks - Methodological Development and Application
35 pages
Wang 2018
No ratings yet
Wang 2018
4 pages
English Discussion-Angeles Araith Fernandez Guillen
No ratings yet
English Discussion-Angeles Araith Fernandez Guillen
5 pages
Blue Print
No ratings yet
Blue Print
1 page
10-AI-Project-Cycle-MCQ
No ratings yet
10-AI-Project-Cycle-MCQ
10 pages
BE Computer Projects: 2021-22: Group No. Guide Project Title Domain Roll No Full Name
No ratings yet
BE Computer Projects: 2021-22: Group No. Guide Project Title Domain Roll No Full Name
4 pages
Machine Learning Basic
No ratings yet
Machine Learning Basic
20 pages
Seminar Report - Merged
No ratings yet
Seminar Report - Merged
27 pages
Computer Vision Pretrained Models: What Is Pre-Trained Model?
No ratings yet
Computer Vision Pretrained Models: What Is Pre-Trained Model?
10 pages
Age and Gender Detection
No ratings yet
Age and Gender Detection
13 pages
ETEG 425 Internal Exam Questions 2021
No ratings yet
ETEG 425 Internal Exam Questions 2021
2 pages
Ai
No ratings yet
Ai
3 pages
Deep Long-Tailed Learning A Survey
No ratings yet
Deep Long-Tailed Learning A Survey
20 pages
Ann Unit V
No ratings yet
Ann Unit V
30 pages
26 Weka
No ratings yet
26 Weka
5 pages
Mengistu Abebe
No ratings yet
Mengistu Abebe
137 pages
Visual Taxonomy Report
No ratings yet
Visual Taxonomy Report
10 pages
Advances in Machine Learning and Deep Learning Applications Towards Wafer Map Defect Recognition and Classification-A Review
No ratings yet
Advances in Machine Learning and Deep Learning Applications Towards Wafer Map Defect Recognition and Classification-A Review
33 pages
Artificial Intelligence Answer Key
No ratings yet
Artificial Intelligence Answer Key
1 page
Chapter #5 - Deep Learning
No ratings yet
Chapter #5 - Deep Learning
34 pages
Unit 2 (Second Order Methods)
No ratings yet
Unit 2 (Second Order Methods)
9 pages
DL Modules
No ratings yet
DL Modules
1 page
Lecture7 8 - Diffusion - Model 1 78 1 66
No ratings yet
Lecture7 8 - Diffusion - Model 1 78 1 66
66 pages
Start Here With Machine Learning
No ratings yet
Start Here With Machine Learning
25 pages
A Comprehensive Survey On Artificial Intelligence and Machine Learning Techniques
No ratings yet
A Comprehensive Survey On Artificial Intelligence and Machine Learning Techniques
7 pages
Esakov - Data Structures - An Advanced Approach Using C
100% (1)
Esakov - Data Structures - An Advanced Approach Using C
195 pages
Predicting_Football_Match_Result_Using_Fusion-based_Classification_Models
No ratings yet
Predicting_Football_Match_Result_Using_Fusion-based_Classification_Models
6 pages
(2110.06512) MedNet - Pre-Trained Convolutional Neural Network Model For The Medical Imaging Tasks
No ratings yet
(2110.06512) MedNet - Pre-Trained Convolutional Neural Network Model For The Medical Imaging Tasks
4 pages
DSI Guide - AI vs ML vs DL vs DS
No ratings yet
DSI Guide - AI vs ML vs DL vs DS
9 pages