
#LifeKoKaroLift

Post-Graduate Diploma in ML/AI

12-07-2020
Course: Machine Learning
Lecture On: Neural Network - Intro

Session - Agenda
➢ Introduction, Industry Use-cases
➢ Perceptron
➢ Feed Forward
➢ Backpropagation
➢ Assignment - Problem Statement
➢ Doubt Resolution

Poll

In which of the following applications can we use deep learning to solve the problem?

A) Protein structure prediction
B) Prediction of chemical reactions
C) Detection of exotic particles
D) All of these

Solution: D. We can use a neural network to approximate any function, so it can theoretically be used to solve any of these problems.

Neural Networks in Business

• Voice Assistants
• Recommendation Engines
• Google: Search, Translate, Maps
• Facial Recognition
• Self-driving cars
• Image Understanding
• Email filtering
• Customer Support Queries (and Chatbots)
• Catching Fraud in Banking
• Video Surveillance

https://www.analyticsvidhya.com/blog/2018/05/24-ultimate-data-science-projects-to-boost-your-knowledge-and-skills/
Intuition behind the Perceptron

The human brain consists of neurons, or nerve cells, which transmit and process the information received from our senses. Many such nerve cells are arranged together in our brain to form a network of nerves. These nerves pass electrical impulses, i.e. the excitation, from one neuron to the other.

The dendrites receive the impulse from the terminal buttons of neighbouring neurons and carry the impulse to the nucleus. Here the electrical impulse is processed and then passed on along the axon.

Construction of an Artificial Neuron

In this case, the neurons are created artificially on a computer. Connecting many such artificial neurons creates an artificial neural network.

The data in the network flows through each neuron via a connection. Every connection has a specific weight by which the flow of data is regulated.

Poll 1

Assume a simple MLP model with 3 input neurons and inputs = 1, 2, 3. The weights on the input neurons are 4, 5 and 6 respectively. Assume the activation function is linear with a constant factor of 3. What will be the output?

A) 32
B) 643
C) 96
D) 48

Solution: C. The output is calculated as 3 × (1×4 + 2×5 + 3×6) = 96.
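As a quick sanity check, this computation can be reproduced in a few lines of Python/NumPy (the factor of 3 stands in for the poll's "linear constant" activation):

```python
import numpy as np

x = np.array([1, 2, 3])   # inputs
w = np.array([4, 5, 6])   # weights on the input neurons
scale = 3                 # linear activation: f(z) = 3 * z

output = scale * np.dot(x, w)   # 3 * (1*4 + 2*5 + 3*6)
print(output)                   # 96
```
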
But how does it work?

The perceptron works in three simple steps:

1. All the inputs x are multiplied with their weights w. Let's call each product k.
2. Add all the multiplied values and call the result the Weighted Sum.
3. Apply that Weighted Sum to the correct Activation Function.
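Putting the three steps together, here is a minimal perceptron sketch in Python/NumPy; the step activation and the AND-gate weights below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def step(z):
    # Step activation: output 1 if the weighted sum clears the threshold
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    weighted_sum = np.dot(x, w) + b   # steps 1-2: multiply and sum
    return step(weighted_sum)         # step 3: apply the activation

# Illustrative example: a perceptron computing logical AND
w = np.array([0.5, 0.5])
b = -0.7
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(np.array(x), w, b))   # fires only for [1, 1]
```
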
Why do we need Weights and Bias?

Weights show the strength of a particular node.

A bias value allows you to shift the activation function curve up or down.

Poll 2

Statement 1: It is possible to train a network well by initializing all the weights as 0.
Statement 2: It is possible to train a network well by initializing all the biases as 0.
Which of the statements given above is true?

A) Statement 1 is true while Statement 2 is false
B) Statement 2 is true while Statement 1 is false
C) Both statements are true
D) Both statements are false

Solution: B. Even if all the biases are zero, there is a chance that the neural network may learn. On the other hand, if all the weights are zero, the neural network may never learn to perform the task.
Why do we use Activation Functions?

Activation functions are really important for an artificial neural network to learn and make sense of non-linear, complex functional mappings between the inputs and the response variable.

They introduce non-linear properties to our network. Their main purpose is to convert the input signal of a node in an ANN to an output signal. That output signal is then used as an input in the next layer of the stack.

https://ai.stackexchange.com/questions/5493/what-is-the-purpose-of-an-activation-function-in-neural-networks
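For reference, a minimal NumPy sketch of the activation functions discussed in this session (these are the standard textbook definitions):

```python
import numpy as np

def sigmoid(z):
    # Squashes input to (0, 1); saturates for large |z| (vanishing gradients)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes input to (-1, 1); zero-centred, but also saturates
    return np.tanh(z)

def relu(z):
    # max(0, z); the usual default for hidden layers
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small negative slope avoids "dead" neurons
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```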

Why can't we do it without activating the input signal?

A linear equation is easy to solve, but it is limited in its complexity and has less power to learn complex functional mappings from data. A neural network without an activation function would simply be a linear regression model, which has limited power and does not perform well most of the time.

That is why we use artificial neural network techniques such as deep learning to make sense of complicated, high-dimensional, non-linear big datasets, where the model has lots and lots of hidden layers in between and a very complicated architecture that helps us make sense of and extract knowledge from such complicated big datasets.

https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

Which one is better to use?

Nowadays we should use ReLU, which should only be applied to the hidden layers. And if your model suffers from dead neurons during training, we should use the Leaky ReLU or Maxout function.

Sigmoid and Tanh should not be used nowadays, due to the Vanishing Gradient Problem, which causes a lot of problems during training and degrades the accuracy and performance of a deep neural network model.

Poll 3

Which of the following activation functions can't be used at the output layer to classify an image?

A) Sigmoid
B) Tanh
C) ReLU
D) If(x > 5, 1, 0)
E) None of the above

Solution: C. ReLU gives a continuous output in the range 0 to infinity, but in the output layer we want a finite range of values, so option C is correct.
Multiple Neurons

▪ The average human brain has 100 billion neurons

▪ Allowing the brain to make extremely complicated decisions

▪ Each neuron is connected to about 10,000 other neurons

▪ Which together create a complicated network of about 1000 trillion connections
Let's Add a Bit of Complexity Now

There are 3 types of layers in a neural network:

• Input layer: used to pass in our input (an image, text or any suitable type of data for the NN).

• Hidden layers: the layers in between the input and output layers. These layers are responsible for learning the mapping between input and output (e.g. in a dog-vs-cat classifier, the hidden layers are the ones responsible for learning that the dog picture is linked to the name "dog", and they do this through a series of matrix multiplications and mathematical transformations).

• Output layer: responsible for giving us the output of the NN given our inputs.
Feedforward

The output from one layer is used as input to the next layer. Such networks are called feedforward neural networks.
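A minimal sketch of one feedforward pass, assuming an illustrative 4-5-3 architecture with random weights and the ReLU activation from earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Illustrative sizes: 4 inputs -> 5 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)

def feedforward(x):
    h = relu(x @ W1 + b1)   # hidden layer: its output feeds the next layer
    return h @ W2 + b2      # output layer (raw scores, i.e. logits)

x = rng.normal(size=4)
print(feedforward(x))
```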

Example

To create a classifier, we want the output to look like a set of probabilities.

By feeding the input through the network we get a set of scores Y, called "logits", but these cannot be used as probabilities directly:

• They are not within the range [0, 1]
• They do not add up to 1

Example

To convert logits into probabilities we can use the softmax function.

This guarantees all values are within [0, 1] and they add up to 1.
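A minimal sketch of softmax in NumPy; subtracting the maximum logit first is a standard trick for numerical stability and does not change the result:

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # stability: avoid exp overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # every value lies in [0, 1]
print(probs.sum())  # 1.0
```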

Poll 4

The number of nodes in each of the input and output layers is 10, and there are 3 hidden layers with 5 nodes each. What is the total number of network parameters?

A) 150
B) 170
C) 220
D) It is an arbitrary value

Solution: B. (50 + 25 + 25 + 50) + 20 = 170.
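As a sketch, the weight count comes from multiplying the sizes of consecutive layers, which gives the 150 connections in the solution above; the solution then adds a further 20 to reach 170:

```python
layers = [10, 5, 5, 5, 10]   # input, three hidden layers, output

# One weight per connection between consecutive layers
weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
print(weights)   # 10*5 + 5*5 + 5*5 + 5*10 = 150
```
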
Cost Function

A Cost Function / Loss Function evaluates the performance of our machine learning algorithm. The loss function computes the error for a single training example, while the cost function is the average of the loss functions over all the training examples. The goal of any learning algorithm is to minimize the cost function.

A lower error between the actual and the predicted values signifies that the algorithm has done a good job of learning.

A common measure of the discrepancy between the two values is the "cross-entropy".
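A minimal sketch of the cross-entropy loss for one example, and the cost as its average over a small batch (the labels and predicted probabilities below are made up for illustration):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Loss for a single example: -sum over classes of y * log(p)
    return -np.sum(y_true * np.log(y_pred + eps))

# Illustrative one-hot labels and predicted probabilities
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])

losses = [cross_entropy(t, p) for t, p in zip(y_true, y_pred)]
print(np.mean(losses))   # the cost: average loss over all training examples
```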

Backward Propagation

We know that backpropagation is used to calculate the gradient of the loss function with respect to the parameters.

Backward Propagation

dZ: the gradient of the cost with respect to the linear output Z (of the current layer l).
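A sketch of the backward step for one dense layer, using the slide's dZ naming; the row-major layout (examples in rows) and the averaging convention are assumptions:

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Backward step for the linear part Z = A_prev @ W + b of layer l.

    dZ     -- gradient of the cost w.r.t. Z, shape (m, n_l)
    A_prev -- activations of the previous layer, shape (m, n_prev)
    W      -- weight matrix, shape (n_prev, n_l)
    """
    m = A_prev.shape[0]
    dW = A_prev.T @ dZ / m    # gradient w.r.t. weights, averaged over examples
    db = dZ.sum(axis=0) / m   # gradient w.r.t. biases
    dA_prev = dZ @ W.T        # gradient propagated to the previous layer
    return dW, db, dA_prev
```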

Training

Training refers to the task of finding the optimal combination of weights and biases to minimize the total loss. The optimization is done using the familiar gradient descent algorithm. In gradient descent, the parameter being optimized is iterated in the direction of reducing cost according to the following rule:

w := w − α · ∂L/∂w

where α is the learning rate.

Gradient Descent

Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient.

Starting at the top of the mountain, we take our first step downhill in the direction specified by the negative gradient. Next we recalculate the negative gradient (passing in the coordinates of our new point) and take another step in the direction it specifies. We continue this process iteratively until we get to the bottom of our graph, or to a point where we can no longer move downhill: a local minimum.
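A minimal sketch of this loop on a toy one-dimensional cost f(w) = w², whose gradient is 2w; the starting point and learning rate are illustrative:

```python
def grad(w):
    return 2 * w          # gradient of the toy cost f(w) = w**2

w = 5.0                   # starting point: "top of the mountain"
alpha = 0.1               # learning rate

for _ in range(50):
    w -= alpha * grad(w)  # step in the direction of the negative gradient

print(w)                  # close to 0.0, the minimum of f
```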

#LifeKoKaroLift

Thank You!

Reinforcement Learning: An Introduction by Sutton and Barto
https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf