Error Back Propagation Algorithm

The error backpropagation algorithm was developed in the 1980s to train multilayer neural networks. It is a gradient descent algorithm that minimizes the error between the network's output and the desired output by propagating error backwards from the output layer through the network. Weights are adjusted in proportion to the error signal and the input activation at each node. This allows hidden layers to be trained as well, enabling complex patterns to be learned. Proper initialization of weights and adjustment of the learning rate are important for ensuring convergence to an optimal solution.

Uploaded by

karan26121989


ERROR BACKPROPAGATION ALGORITHM

Why is the Error Back Propagation Algorithm required?

The lack of suitable training methods for multilayer perceptrons (MLPs) led to a waning of
interest in neural networks (NN) in the 1960s and 1970s. This changed with the reformulation of the
backpropagation training method for MLPs in the mid-1980s by Rumelhart et al.
Backpropagation was created by generalizing the Widrow-Hoff learning rule to multiple-layer
networks and nonlinear differentiable transfer functions. Standard
backpropagation is a gradient descent algorithm, as is the Widrow-Hoff learning rule, in
which the network weights are moved along the negative of the gradient of the
performance function. The term backpropagation refers to the manner in which the
gradient is computed for nonlinear multilayer networks.

As in the simple cases of delta learning rule training studied before, input patterns are
submitted sequentially during back-propagation training. If a pattern is submitted
and its classification or association is determined to be erroneous, the synaptic weights
as well as the thresholds are adjusted so that the current least mean square classification
error is reduced. The input/output mapping, comparison of target and actual values,
and adjustment, if needed, continue until all mapping examples from the training set are
learned within an acceptable overall error. Usually, the mapping error is cumulative and
computed over the full training set.

During the association or classification phase, the trained neural network itself operates
in a feedforward manner. However, the weight adjustments enforced by the learning
rules propagate exactly backward from the output layer through the so-called "hidden
layers" toward the input layer.

The input and output values of the network are denoted y_j and o_k, respectively. We thus
denote y_j, for j = 1, 2, ..., J, and o_k, for k = 1, 2, ..., K, as the signal values at the j'th
column of nodes and the k'th column of nodes, respectively. As before, the weight w_kj
connects the output of the j'th neuron with the input to the k'th neuron.

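The activation value net_k of the k'th neuron is expressed as

$$\mathrm{net}_k = \sum_{j=1}^{J} w_{kj}\, y_j \qquad \text{(Eqn 1)}$$

The error expression, generalized to include all squared errors at the outputs k = 1, 2, ..., K, is

$$E_p = \frac{1}{2} \sum_{k=1}^{K} (d_{pk} - o_{pk})^2 \qquad \text{(Eqn 2)}$$

where p is a specific pattern, p = 1, 2, ..., P, and d_pk is the desired value of output k for that pattern.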

The delta learning rule can be formally derived for a multiperceptron layer. The assumptions
made are:
1. A gradient descent search is performed to reduce the error E_p through adjustments
of the weights.
2. Threshold values are adjustable along with the other weights, and no distinction is made
between thresholds and weights during learning.
3. The fixed input associated with the threshold keeps the same value during both the training and recall phases.

Minimization of the error requires the weight changes to be in the negative gradient
direction. Individual weight adjustments are computed as follows:

$$\Delta w_{kj} = -\alpha \frac{\partial E}{\partial w_{kj}} \qquad \text{(Eqn 3)}$$

where α is the positive learning constant and the error E is defined in Eqn 2.

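Now, for each node in layer k, where k = 1, 2, ..., K,

$$\mathrm{net}_k = \sum_{j=1}^{J} w_{kj}\, y_j \qquad \text{(Eqn 4)}$$

and the corresponding neuron output is given by

$$o_k = f(\mathrm{net}_k) \qquad \text{(Eqn 5)}$$

The error signal term δ_ok produced by the k'th neuron is defined as

$$\delta_{ok} = -\frac{\partial E}{\partial \mathrm{net}_k} \qquad \text{(Eqn 6)}$$

By the chain rule, the gradient component is

$$\frac{\partial E}{\partial w_{kj}} = \frac{\partial E}{\partial \mathrm{net}_k} \cdot \frac{\partial \mathrm{net}_k}{\partial w_{kj}} \qquad \text{(Eqn 7)}$$

Since

$$\frac{\partial \mathrm{net}_k}{\partial w_{kj}} = y_j \qquad \text{(Eqn 8)}$$

substituting Eqn 8 and Eqn 6 in Eqn 7 we get

$$\frac{\partial E}{\partial w_{kj}} = -\delta_{ok}\, y_j \qquad \text{(Eqn 9)}$$

The weight adjustment formula of Eqn 3 can accordingly be rewritten, with α the learning constant, as

$$\Delta w_{kj} = \alpha\, \delta_{ok}\, y_j \qquad \text{(Eqn 10)}$$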

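Eqn 10 represents the general formula for delta training/learning weight adjustments for
a single-layer network. It also follows that the adjustment of weight w_kj is proportional
to the input activation y_j and to the error signal value δ_ok at the k'th neuron's output.
The delta value needs to be explicitly computed for specifically chosen activation
functions. Since E is a composite function of net_k,

$$E(\mathrm{net}_k) = E[o_k(\mathrm{net}_k)] \qquad \text{(Eqn 11)}$$

Thus we have from Eqn 6

$$\delta_{ok} = -\frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial \mathrm{net}_k} \qquad \text{(Eqn 12)}$$

Denoting the second term in the above equation as the derivative of the activation function,

$$f'_k(\mathrm{net}_k) = \frac{\partial o_k}{\partial \mathrm{net}_k} \qquad \text{(Eqn 13)}$$

and, from Eqn 2,

$$\frac{\partial E}{\partial o_k} = -(d_k - o_k) \qquad \text{(Eqn 14)}$$

and rewriting Eqn 12, we have

$$\delta_{ok} = (d_k - o_k)\, f'_k(\mathrm{net}_k) \qquad \text{(Eqn 15)}$$

Eqn 15 shows that the error signal term depicts the local error (d_k - o_k) at the output of the
k'th neuron scaled by the multiplicative factor f'_k(net_k).
The final formula for the weight adjustment of the single-layer network can be obtained
from Eqn 10 as

$$\Delta w_{kj} = \alpha\, (d_k - o_k)\, f'_k(\mathrm{net}_k)\, y_j \qquad \text{(Eqn 16)}$$

Eqn 16 is identical to the delta training rule. The updated weight values become

$$w'_{kj} = w_{kj} + \Delta w_{kj} \qquad \text{(Eqn 17)}$$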

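Delta training rule for the unipolar continuous activation function:

The unipolar continuous activation function is

$$f(\mathrm{net}) = \frac{1}{1 + \exp(-\mathrm{net})} \qquad \text{(Eqn 18)}$$

Its derivative is

$$f'(\mathrm{net}) = \frac{\exp(-\mathrm{net})}{[1 + \exp(-\mathrm{net})]^2} \qquad \text{(Eqn 19)}$$

or, writing o = f(net),

$$f'(\mathrm{net}) = \frac{1}{1 + \exp(-\mathrm{net})} \cdot \frac{1 + \exp(-\mathrm{net}) - 1}{1 + \exp(-\mathrm{net})} = o\,(1 - o) \qquad \text{(Eqn 20)}$$

Therefore the delta value for the unipolar activation function becomes

$$\delta_{ok} = (d_k - o_k)\, o_k\, (1 - o_k) \qquad \text{(Eqn 21)}$$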
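The shortcut f'(net) = o(1 - o) can be checked numerically. The sketch below assumes the logistic form f(net) = 1 / (1 + exp(-net)) with steepness 1; the function names and the sample net value are illustrative only and do not come from the original notes.

```python
import math

def unipolar(net):
    """Unipolar continuous (logistic) activation, steepness assumed to be 1."""
    return 1.0 / (1.0 + math.exp(-net))

def unipolar_delta(d, o):
    """Error signal (d - o) * o * (1 - o) of a unipolar output neuron."""
    return (d - o) * o * (1.0 - o)

net = 0.7                       # arbitrary illustrative net input
o = unipolar(net)

# Central finite difference as an independent estimate of f'(net).
h = 1e-6
numeric = (unipolar(net + h) - unipolar(net - h)) / (2.0 * h)
analytic = o * (1.0 - o)

print("numeric  f'(net):", numeric)
print("analytic o(1-o) :", analytic)
print("delta for d = 1 :", unipolar_delta(1.0, o))
```

Because the derivative is expressed through the output o alone, no extra exponential has to be evaluated during training.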

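Delta training rule for the bipolar continuous activation function:

The bipolar continuous activation function is given by

$$f(\mathrm{net}) = \frac{2}{1 + \exp(-\mathrm{net})} - 1$$

We obtain

$$f'(\mathrm{net}) = \frac{2 \exp(-\mathrm{net})}{[1 + \exp(-\mathrm{net})]^2}$$

A useful identity can be applied here:

$$\frac{2 \exp(-\mathrm{net})}{[1 + \exp(-\mathrm{net})]^2} = \frac{1}{2}(1 - o^2)$$

Verification of the identity: letting o = f(net) = (1 - exp(-net)) / (1 + exp(-net)),

$$\frac{1}{2}(1 - o^2) = \frac{1}{2}\left[1 - \left(\frac{1 - \exp(-\mathrm{net})}{1 + \exp(-\mathrm{net})}\right)^2\right] = \frac{2 \exp(-\mathrm{net})}{[1 + \exp(-\mathrm{net})]^2}$$

so LHS = RHS.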
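The same identity can also be confirmed numerically. This is a minimal sketch assuming the bipolar function f(net) = 2 / (1 + exp(-net)) - 1 with steepness 1; the helper names and the sampling grid are illustrative choices.

```python
import math

def bipolar(net):
    """Bipolar continuous activation with outputs in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def bipolar_derivative(net):
    """Direct derivative 2*exp(-net) / (1 + exp(-net))**2."""
    e = math.exp(-net)
    return 2.0 * e / (1.0 + e) ** 2

# Compare the direct derivative with (1 - o**2) / 2 on a grid of net values.
grid = [n / 10.0 for n in range(-50, 51)]
max_gap = max(abs(bipolar_derivative(n) - 0.5 * (1.0 - bipolar(n) ** 2))
              for n in grid)
print("largest difference on the grid:", max_gap)
```

The difference stays at the level of floating-point rounding, which is what the algebraic verification predicts.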
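The delta value for the bipolar continuous activation function is given by

$$\delta_{ok} = \frac{1}{2}(d_k - o_k)(1 - o_k^2)$$

Summarizing, the updated weights are given by

$$w'_{kj} = w_{kj} + \alpha\, (d_k - o_k)\, o_k (1 - o_k)\, y_j \quad \text{(unipolar)}$$

$$w'_{kj} = w_{kj} + \frac{1}{2}\alpha\, (d_k - o_k)(1 - o_k^2)\, y_j \quad \text{(bipolar)}$$

for k = 1, 2, ..., K and j = 1, 2, ..., J, where α is the learning constant.

The updated weights under the delta training rule for the single-layer network can be
expressed using vector notation as

$$W' = W + \alpha\, \delta_o\, y^T$$

where the error signal δ_o is defined as a column vector consisting of the individual error
signal terms:

$$\delta_o = [\delta_{o1}\ \delta_{o2}\ \cdots\ \delta_{oK}]^T$$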
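A single delta-rule update in this vector form can be sketched as follows. The example assumes the unipolar activation, a K x J weight matrix stored as a list of rows, and made-up numbers for the weights, input, targets and learning constant; none of these values come from the original notes.

```python
import math

def f(net):
    """Unipolar continuous activation (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-net))

def layer_output(W, y):
    """o_k = f(sum_j w_kj * y_j) for every output neuron k."""
    return [f(sum(w * yj for w, yj in zip(row, y))) for row in W]

def delta_rule_step(W, y, d, alpha):
    """Return W' = W + alpha * delta_o * y^T for one training pattern."""
    o = layer_output(W, y)
    delta_o = [(dk - ok) * ok * (1.0 - ok) for dk, ok in zip(d, o)]
    return [[w + alpha * dk * yj for w, yj in zip(row, y)]
            for row, dk in zip(W, delta_o)]

def squared_error(d, o):
    """E = 1/2 * sum_k (d_k - o_k)**2."""
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))

# Hypothetical 2-output, 3-input layer and one training pattern.
W = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
y = [1.0, 0.5, -1.0]
d = [1.0, 0.0]

error_before = squared_error(d, layer_output(W, y))
W_new = delta_rule_step(W, y, d, alpha=0.5)
error_after = squared_error(d, layer_output(W_new, y))
print(error_before, error_after)
```

Each row k of the matrix is shifted along the input vector y by an amount proportional to that neuron's error signal, which is exactly the outer product in the vector formula.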

Generalized Delta Learning Rule

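For the hidden neurons, the weight adjustments are again taken along the negative gradient of the error. The error signal of a hidden neuron is not directly available, so it is obtained by propagating the output-layer error signals δ_ok back through the weights w_kj.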
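One training step of a network with a single hidden layer can be sketched as below. The names V (hidden-layer weights), z (network input) and delta_y (hidden error signals) are assumptions introduced for this illustration, as are the unipolar activation and all numeric values; thresholds are omitted for brevity.

```python
import math

def f(net):
    """Unipolar continuous activation (assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-net))

def forward(V, W, z):
    """Return hidden outputs y and network outputs o."""
    y = [f(sum(v * zi for v, zi in zip(row, z))) for row in V]
    o = [f(sum(w * yj for w, yj in zip(row, y))) for row in W]
    return y, o

def backprop_step(V, W, z, d, alpha):
    """One generalized delta rule update for both weight layers."""
    y, o = forward(V, W, z)
    # Output error signals: (d_k - o_k) * f'(net_k).
    delta_o = [(dk - ok) * ok * (1.0 - ok) for dk, ok in zip(d, o)]
    # Hidden error signals: f'(net_j) * sum_k delta_ok * w_kj.
    delta_y = [yj * (1.0 - yj) * sum(delta_o[k] * W[k][j] for k in range(len(W)))
               for j, yj in enumerate(y)]
    W_new = [[w + alpha * delta_o[k] * y[j] for j, w in enumerate(row)]
             for k, row in enumerate(W)]
    V_new = [[v + alpha * delta_y[j] * z[i] for i, v in enumerate(row)]
             for j, row in enumerate(V)]
    return V_new, W_new

def error(V, W, z, d):
    _, o = forward(V, W, z)
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))

# Hypothetical 2-2-1 network and one training pattern.
V = [[0.5, -0.4], [0.3, 0.8]]
W = [[0.2, -0.6]]
z, d = [1.0, -1.0], [1.0]

error_before = error(V, W, z, d)
V, W = backprop_step(V, W, z, d, alpha=0.5)
error_after = error(V, W, z, d)
print(error_before, error_after)
```

Note that the hidden error signals are computed with the old output weights before either layer is updated, which matches taking a single gradient step on the whole weight set.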

There are two modes of updating the weights:
1. Batch mode
2. Incremental mode

When the weights are changed immediately after each training pattern is presented,
the approach is called incremental.
When the weights are changed only after all the training patterns have been presented, the
approach is called batch mode. This mode requires additional local storage for each connection to
accumulate the weight changes until they are applied.
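The difference between the two modes can be seen on a deliberately small example. The sketch assumes a single linear neuron with one weight (the Widrow-Hoff case), so that the arithmetic is easy to follow; the patterns and learning constant are made up for illustration.

```python
def incremental_epoch(w, patterns, alpha):
    """Update the weight immediately after every pattern."""
    for y, d in patterns:
        w = w + alpha * (d - w * y) * y
    return w

def batch_epoch(w, patterns, alpha):
    """Accumulate the changes over all patterns, then apply them once."""
    accumulated = 0.0               # the extra per-connection storage
    for y, d in patterns:
        accumulated += alpha * (d - w * y) * y
    return w + accumulated

patterns = [(1.0, 2.0), (2.0, 3.0)]   # (input y, desired output d)
w_incremental = incremental_epoch(0.0, patterns, alpha=0.1)
w_batch = batch_epoch(0.0, patterns, alpha=0.1)
print(w_incremental, w_batch)
```

In incremental mode the second pattern already sees the weight changed by the first one, so the two modes generally end an epoch at different weights.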

The BP learning algorithm is an example of an optimization problem. [Note: an
optimization problem is the problem of finding the best solution from all feasible
solutions.] The essence of the error back-propagation algorithm is the evaluation of the
contribution of each particular weight to the output error. Many difficulties
arise in the implementation of the algorithm. One of the problems is that the error
minimization procedure may produce only a local minimum of the error function.
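Learning is considered successful if the final error is well below the acceptable E_rms value. E_rms (root-mean-square
normalized error) is given by the following formula:

$$E_{rms} = \frac{1}{PK} \sqrt{\sum_{p=1}^{P} \sum_{k=1}^{K} (d_{pk} - o_{pk})^2}$$

where P is the number of training patterns and K is the number of outputs.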

The error surface in the figure has two such troughs, at w_l1 and w_l2. If learning commences at point 2, we
may therefore end up in a local minimum instead of the global minimum w_g. The trained network
will then be unable to produce the desired performance in terms of its acceptable terminal
error. To ensure convergence to a satisfactory minimum, the starting point should be
changed to point 1.
The problem of local minima can, however, often be avoided by introducing some form of
randomness into the training.

The convergence of the error back-propagation training algorithm (EBPTA) depends on various factors, including:

1. Learning rate
2. Selection of initial weights
3. Momentum
4. Number of training data
5. Number of hidden layer nodes

Selection of Initial weights

The weights of the network to be trained are typically initialized at small random values.
The initialization strongly affects the ultimate solution.
• If all weights start out with equal values, and the solution requires that
unequal weights be developed, the network may not train properly.
• The initial weights cannot be very large, because the sigmoidal activation function may
saturate from the very beginning, and the system may get stuck at a local
minimum or on a very flat plateau near the starting point.
• One method is to choose each weight w_ij in the range
$$\left( \frac{-3}{\sqrt{o_i}},\ \frac{3}{\sqrt{o_i}} \right)$$
where o_i is the number of processing elements j that feed forward to
processing element i.
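The range heuristic in the last bullet can be sketched as follows, assuming the bound is 3 divided by the square root of the number of feeding elements and that weights are drawn uniformly within it. The function name, the uniform distribution and the fixed seed are illustrative assumptions.

```python
import math
import random

def init_weights(fan_in, n_units, rng):
    """Draw weights uniformly from (-3/sqrt(fan_in), 3/sqrt(fan_in)).

    fan_in plays the role of o_i: the number of elements feeding each unit.
    """
    limit = 3.0 / math.sqrt(fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(n_units)]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
W = init_weights(fan_in=9, n_units=4, rng=rng)
limit = 3.0 / math.sqrt(9)
print("bound:", limit)
print("largest |w|:", max(abs(w) for row in W for w in row))
```

Scaling the range by the fan-in keeps the initial net inputs moderate, so the sigmoid does not start in saturation, and the random draw breaks the symmetry that equal starting weights would create.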

Steepness of activation function

λ is the steepness factor of the activation function. It was assumed to be 1 in the
computation of f'(net). Since f'(net) serves as a multiplying factor in the computation of the
error signals, the choice and shape of the activation function strongly
affect the speed of network learning.
The derivative of the bipolar continuous activation function can be computed as

$$f'(\mathrm{net}) = \frac{2 \lambda \exp(-\lambda\, \mathrm{net})}{[1 + \exp(-\lambda\, \mathrm{net})]^2}$$

and it reaches a maximum of λ/2 at net = 0.

Since the weights are adjusted in proportion to f'(net), the weights
connected to neurons responding in their midrange are changed the most. Since the error signals are
computed with f'(net) as a multiplier, the back-propagated errors are large only for
those neurons that are in the steep thresholding mode.
The other feature apparent from the graph is that, for a fixed learning constant,
all adjustments of the weights are in proportion to the steepness coefficient λ. This observation
leads to the conclusion that using activation functions with larger values of λ may yield
results similar to those obtained with a larger learning constant. It is therefore advisable to keep λ fixed at 1 and
control only the learning constant, rather than controlling both.
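The effect of the steepness factor on the derivative can be tabulated with a few lines of code. The sketch assumes the bipolar function f(net) = 2 / (1 + exp(-λ net)) - 1, for which the peak slope is λ/2; the chosen λ values and the sample point net = 3 are illustrative.

```python
import math

def bipolar_derivative(net, lam=1.0):
    """Slope of the bipolar activation with steepness factor lam."""
    e = math.exp(-lam * net)
    return 2.0 * lam * e / (1.0 + e) ** 2

for lam in (0.5, 1.0, 2.0):
    at_zero = bipolar_derivative(0.0, lam)   # peak slope, expected lam / 2
    in_tail = bipolar_derivative(3.0, lam)   # slope away from the midrange
    print(f"lambda={lam}: f'(0)={at_zero}, f'(3)={in_tail}")
```

The peak slope grows linearly with λ, so λ scales every weight change in the same way the learning constant does, while the slope away from net = 0 falls off quickly once the neuron approaches saturation.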

Effect of learning rate

The learning rate α affects the convergence of the back-propagation algorithm. A larger value of α speeds up the convergence but
might result in overshooting, while a smaller value of α avoids overshooting but slows
convergence. The learning constant should be chosen experimentally for each problem.
Learning constants ranging from 10^-3 to 10 have been reported throughout the
technical literature as successful for many computational back-propagation
experiments.
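The trade-off can be illustrated on a toy problem. The sketch below is not a neural network: it assumes the one-dimensional error E(w) = 0.5 * w**2, whose gradient is simply w, and the three learning constants are arbitrary examples.

```python
def descend(alpha, w0=1.0, steps=20):
    """Plain gradient descent on the toy error E(w) = 0.5 * w**2."""
    w = w0
    for _ in range(steps):
        w = w - alpha * w          # gradient of E at w is w
    return w

small = descend(0.01)      # safe but slow: w shrinks by 1% per step
moderate = descend(0.5)    # fast convergence toward the minimum at 0
too_large = descend(2.5)   # overshoots the minimum and diverges
print(small, moderate, too_large)
```

On this surface any learning constant above 2 makes every step overshoot by more than it corrects, which is the divergent behaviour a too-large α can cause in back-propagation as well.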
Based on the above observations, some heuristics for improving the rate of
convergence have been proposed.

Momentum Method

This method is used for accelerating the convergence of EBPTA. It
involves supplementing the current weight adjustment with a fraction of the most recent
weight adjustment. This is usually done according to the formula

$$\Delta w(t) = -\alpha\, \nabla E(t) + a\, \Delta w(t-1)$$

where t and t-1 represent the current and the most recent training step, respectively, α is the learning constant, and
a is a user-selected positive momentum constant. The second term is called the
momentum term. For N steps using the momentum method, the current weight change is
expressed as

$$\Delta w(t) = -\alpha \sum_{n=0}^{N} a^n\, \nabla E(t-n)$$

Typically a is chosen between 0.1 and 0.8.
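The momentum update can be sketched on the same kind of toy problem. The example assumes the one-dimensional error E(w) = 0.5 * w**2 rather than a real network, with α as the learning constant and a as the momentum constant; all numeric values are illustrative.

```python
def momentum_descent(grad, w0, alpha, a, steps):
    """Apply dw(t) = -alpha * grad(w) + a * dw(t-1) for a number of steps."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -alpha * grad(w) + a * dw   # current step plus momentum term
        w = w + dw
    return w

def grad(w):
    """Gradient of the toy error E(w) = 0.5 * w**2."""
    return w

plain = momentum_descent(grad, w0=1.0, alpha=0.01, a=0.0, steps=50)
with_momentum = momentum_descent(grad, w0=1.0, alpha=0.01, a=0.8, steps=50)
print(plain, with_momentum)
```

With a small learning constant the gradient keeps the same sign step after step, so the accumulated momentum term enlarges each step and the weight gets much closer to the minimum in the same number of iterations.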

What is the significance of this momentum term?

From the above figure it is seen that in the case of A' and A'' the signs are the same, so
combining the gradient components of adjacent steps results in a convergence
speed-up. In the case of B' and B'', however, the signs are different. This shows that if the
gradient component changes sign in two consecutive iterations, the learning rate
along this axis should be decreased.
This indicates that the momentum term typically helps to speed up convergence and
to achieve an efficient and more reliable learning profile.
The momentum technique can be recommended for problems where convergence
occurs too slowly or for cases when learning is difficult to achieve.
Network architecture versus data representation

Starting from the simple case of a single hidden layer, the number of input nodes is
determined by the dimension, or size, of the input vector to be classified, generalized or
associated with a certain output quantity.
In planar images, the size of the input vector is sometimes made equal to the total number of
pixels in the evaluated image.
The number of output neurons depends on the type of
neural processing. In the case of an auto-associator, which associates a distorted input
vector with the undistorted class prototype, we have I = K, where I is the number of inputs.
In the case of a classifier, the number of output neurons is equal to the number of
classes.

Necessary number of hidden neurons

The number of hidden neurons depends on the dimension n of the input vector and on
the number of separable regions in the n-dimensional input space.
