Understanding Backpropagation Algorithm
Learn the nuts and bolts of a neural network’s most important
ingredient
Simeon Kostadinov
Aug 8
In this article, I would like to go over the mathematical process of training and
optimizing a simple 4-layer neural network. I believe this would help the reader
understand how backpropagation works as well as realize its importance.
Input layer
The neurons, colored in purple, represent the input data. These can be as simple as
scalars or more complex like vectors or multidimensional matrices.
The first set of activations (a¹) is equal to the input values. NB: an “activation” is a neuron’s
value after an activation function has been applied; see the next section.
Hidden layers
The final values at the hidden neurons, colored in green, are computed using z^l, the weighted
inputs in layer l, and a^l, the activations in layer l. For layers 2 and 3 the
equations are:
l = 2:  z² = W¹x + b¹,  a² = f(z²)
l = 3:  z³ = W²a² + b²,  a³ = f(z³)
where f is the activation function, applied element-wise.
W¹ and W² are the weight matrices and b¹ and b² are the bias vectors used to compute layers 2
and 3.
Looking carefully, you can see that x, z², a², z³, a³, W¹, W², b¹ and b² are missing the subscripts
shown in the 4-layer network illustration above. The reason is that we have combined all
parameter values into matrices, grouped by layer. This is the standard way of working with
neural networks and one should be comfortable with the calculations. However, I will go over
the equations to clear out any confusion.
Let’s pick layer 2 and its parameters as an example. The same operations can be applied
to any layer in the network.
W¹ is a weight matrix of shape (n, m) where n is the number of output neurons
(neurons in the next layer) and m is the number of input neurons (neurons in the
previous layer). For us, n = 2 and m = 4.
Equation for W¹:
W¹ = [ (W_11)¹  (W_12)¹  (W_13)¹  (W_14)¹
       (W_21)¹  (W_22)¹  (W_23)¹  (W_24)¹ ]
NB: The first number in any weight’s subscript matches the index of the neuron in
the next layer (in our case this is the Hidden_2 layer) and the second number
matches the index of the neuron in the previous layer (in our case this is the Input
layer).
x is the input vector of shape (m, 1) where m is the number of input neurons. For
us, m = 4.
Equation for x:
x = [ x_1  x_2  x_3  x_4 ]ᵀ
b¹ is a bias vector of shape (n, 1), where n is the number of neurons in the current layer. For
us, n = 2.
Equation for b¹:
b¹ = [ (b_1)¹  (b_2)¹ ]ᵀ
Following the equation for z², we can use the above definitions of W¹, x and b¹ to derive
“Equation for z²”:
Equation for z²:
z² = W¹x + b¹ = [ (z_1)²  (z_2)² ]ᵀ, with
(z_1)² = (W_11)¹x_1 + (W_12)¹x_2 + (W_13)¹x_3 + (W_14)¹x_4 + (b_1)¹
(z_2)² = (W_21)¹x_1 + (W_22)¹x_2 + (W_23)¹x_3 + (W_24)¹x_4 + (b_2)¹
You will see that z² can be expressed using (z_1)² and (z_2)², where each (z_i)² is the sum of the
products of every input x_j with the corresponding weight (W_ij)¹, plus the bias (b_i)¹.
Expanding the matrix product this way leads back to the same “Equation for z²” and proves that
the matrix representations for z², a², z³ and a³ are correct.
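To make these shapes concrete, here is a minimal NumPy sketch of the forward pass through the
two hidden layers. The four inputs and two neurons in layer 2 come from the text; the sigmoid
activation, the assumption that layer 3 also has two neurons, and the random parameter values
are placeholder choices of mine, not something the article fixes.

```python
import numpy as np

def f(z):
    # Placeholder activation function (sigmoid); the article leaves f unspecified.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x  = rng.normal(size=(4, 1))   # input vector, shape (m, 1) with m = 4
W1 = rng.normal(size=(2, 4))   # W¹: shape (n, m) = (2, 4), feeds layer 2
b1 = rng.normal(size=(2, 1))   # b¹: shape (n, 1) = (2, 1)
W2 = rng.normal(size=(2, 2))   # W²: feeds layer 3 (assumed to also have 2 neurons)
b2 = rng.normal(size=(2, 1))   # b²

a1 = x                 # input activations
z2 = W1 @ a1 + b1      # weighted inputs of layer 2
a2 = f(z2)             # activations of layer 2
z3 = W2 @ a2 + b2      # weighted inputs of layer 3
a3 = f(z3)             # activations of layer 3

print(z2.shape, a3.shape)   # (2, 1) (2, 1)
```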
Output layer
The final part of a neural network is the output layer, which produces the predicted value s. In
our simple example it is presented as a single neuron, colored in blue, and computed from the
activations a³ of the second hidden layer using the remaining weight matrix W³.
Again, we are using the matrix representation to keep the computation compact. One can use the
above techniques to understand the underlying logic. Please leave any comments
below if you find yourself lost in the equations — I would love to help!
The final step in a forward pass is to evaluate the predicted output s against an
expected output y.
The output y is part of the training dataset (x, y) where x is the input (as we saw in the
previous section).
Evaluation between s and y happens through a cost function. This can be as simple as
MSE (mean squared error) or more complex like cross-entropy.
C = cost(s, y)
where cost can be MSE, cross-entropy or any other cost function.
Based on C’s value, the model “knows” how much to adjust its parameters in order to
get closer to the expected output y. This happens using the backpropagation algorithm.
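As a concrete illustration, here is what this evaluation step could look like with MSE as the
cost function, one of the options mentioned above. The 0.5 factor and the example values of s
and y are my own choices for the sketch.

```python
import numpy as np

def mse_cost(s, y):
    # Mean squared error between predicted output s and expected output y.
    # The 0.5 factor is a common convention that simplifies the derivative.
    return 0.5 * np.mean((s - y) ** 2)

s = np.array([[0.8]])   # predicted output from the forward pass
y = np.array([[1.0]])   # expected output from the training pair (x, y)

C = mse_cost(s, y)
print(C)   # 0.02
```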
Backpropagation and computing gradients
“…the ability to create useful new features distinguishes back-propagation from earlier,
simpler methods…”
The gradient ∂C/∂x shows how much a parameter x needs to change (in the positive or negative
direction) to minimize C. In our network those parameters are the weights and the biases, so
backpropagation computes ∂C/∂w and ∂C/∂b for every weight and bias using the chain rule.
The common part in the weight and bias gradients is often called the “local gradient” and is
expressed as follows:
(δ_j)^l = ∂C/∂(z_j)^l
The “local gradient” can easily be determined using the chain rule. I won’t go over the
process now but if you have any questions, please comment below.
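To make the role of the local gradients explicit, here is a hedged sketch of a full backward pass
for the network above. It assumes a sigmoid activation, a linear output s = W³a³ and an MSE
cost; none of these choices are prescribed by the article, they simply let the chain rule be
written out in code.

```python
import numpy as np

def f(z):                      # assumed sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):                # derivative of the sigmoid
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(4, 1)), np.array([[1.0]])
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W3 = rng.normal(size=(1, 2))                    # assumed linear output layer

# Forward pass (same equations as in the hidden-layer sketch).
a1 = x
z2 = W1 @ a1 + b1; a2 = f(z2)
z3 = W2 @ a2 + b2; a3 = f(z3)
s  = W3 @ a3                                    # predicted output
C  = 0.5 * np.sum((s - y) ** 2)                 # MSE cost

# Backward pass: local gradients delta = dC/dz, then parameter gradients.
dC_ds  = s - y                                  # dC/ds for the 0.5*(s - y)^2 cost
delta3 = (W3.T @ dC_ds) * f_prime(z3)           # dC/dz³
delta2 = (W2.T @ delta3) * f_prime(z2)          # dC/dz²

dC_dW3 = dC_ds  @ a3.T                          # dC/dW³
dC_dW2 = delta3 @ a2.T                          # dC/dW²  (dC/db² = delta3)
dC_dW1 = delta2 @ a1.T                          # dC/dW¹  (dC/db¹ = delta2)

print(dC_dW1.shape, dC_dW2.shape, dC_dW3.shape)   # (2, 4) (2, 2) (1, 2)
```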
. . .
Algorithm for optimizing weights and biases (also called “Gradient descent”)
. . .
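Below is a minimal sketch of that optimization loop, i.e. plain batch gradient descent. The
forward_and_gradients callable is hypothetical; it stands in for the forward and backward
passes sketched earlier and returns the cost together with the gradient of every parameter. The
learning rate and step count are arbitrary assumptions.

```python
import numpy as np

def gradient_descent(params, forward_and_gradients, learning_rate=0.1, steps=1000):
    """Repeatedly nudge every parameter against its gradient to reduce the cost C.

    params is a dict of NumPy arrays (weights and biases); forward_and_gradients is
    any callable returning (C, grads), where grads mirrors the keys of params.
    """
    for _ in range(steps):
        C, grads = forward_and_gradients(params)
        for name in params:
            params[name] -= learning_rate * grads[name]   # w <- w - eps * dC/dw
    return params

# Toy usage: minimize C = 0.5 * ||w||^2, whose gradient is simply w.
toy = {"w": np.array([[3.0], [-2.0]])}
toy = gradient_descent(toy, lambda p: (0.5 * np.sum(p["w"] ** 2), {"w": p["w"]}))
print(toy["w"])   # close to zero after enough steps
```

In practice this loop is usually run on mini-batches and paired with more elaborate optimizers,
but the parameter update itself stays the same.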
I would like to dedicate the final part of this section to a simple example in which we
will calculate the gradient of C with respect to a single weight (w_22)².
Weight (w_22)² connects (a_2)² and (z_2)³, so computing the gradient requires applying the
chain rule through (z_2)³ and (a_2)³:
∂C/∂(w_22)² = ∂C/∂(a_2)³ · ∂(a_2)³/∂(z_2)³ · ∂(z_2)³/∂(w_22)² = ∂C/∂(a_2)³ · ∂(a_2)³/∂(z_2)³ · (a_2)²
The last factor is simply (a_2)², because (z_2)³ = (w_21)²(a_1)² + (w_22)²(a_2)² + (b_2)².
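A quick way to convince yourself that this chain-rule expression is correct is to compare it with
a numerical (finite-difference) estimate. The sketch below reuses the assumed sigmoid
activation, linear output and MSE cost from the earlier code blocks; it is only a sanity check,
not part of the derivation.

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x, y = rng.normal(size=(4, 1)), np.array([[1.0]])
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
W3 = rng.normal(size=(1, 2))

def cost(W2_):
    a2 = f(W1 @ x + b1)
    a3 = f(W2_ @ a2 + b2)
    s = W3 @ a3
    return 0.5 * np.sum((s - y) ** 2)

# Analytic gradient for the single weight (w_22)², following the chain rule above.
a2 = f(W1 @ x + b1)
z3 = W2 @ a2 + b2
a3 = f(z3)
s = W3 @ a3
dC_da3 = W3.T @ (s - y)                    # dC/da³
dC_dz3 = dC_da3 * a3 * (1 - a3)            # multiply by f'(z³) for the sigmoid
analytic = dC_dz3[1, 0] * a2[1, 0]         # dC/d(w_22)² = dC/d(z_2)³ · (a_2)²

# Numerical estimate: perturb (w_22)², i.e. W2[1, 1] with 0-based indexing.
eps = 1e-6
W2_plus = W2.copy();  W2_plus[1, 1] += eps
W2_minus = W2.copy(); W2_minus[1, 1] -= eps
numerical = (cost(W2_plus) - cost(W2_minus)) / (2 * eps)

print(analytic, numerical)   # the two values should agree closely
```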
I hope this example manages to throw some light on the mathematics behind
computing gradients. To further enhance your skills, I strongly recommend watching
Stanford’s NLP series where Richard Socher gives 4 great explanations of
backpropagation.
Final remarks
In this article, I went through a detailed explanation of how backpropagation works under the
hood, using mathematical techniques such as computing gradients and the chain rule. Knowing
the nuts and bolts of this algorithm will fortify your neural network knowledge and make you
feel comfortable taking on more complex models. Enjoy your deep learning journey!
Thank you for reading. I hope you enjoyed the article 🤩 and I wish you a great day!