
#LifeKoKaroLift

Post-Graduate Diploma in ML/AI

12-07-2020
Course: Machine Learning
Lecture On: Neural Network - Intro

Session - Agenda
➢ Introduction, Industry Use-cases
➢ Perceptron
➢ Feed Forward
➢ Backpropagation
➢ Assignment - Problem Statement
➢ Doubt Resolution

Poll

In which of the following applications can we use deep learning to solve the problem?

A) Protein structure prediction
B) Prediction of chemical reactions
C) Detection of exotic particles
D) All of these

Solution: D. We can use a neural network to approximate any function, so it can theoretically be used to solve any of these problems.

Neural Networks in Business

• Voice Assistants
• Recommendation Engines
• Google: Search, Translate, Maps
• Facial Recognition
• Self-driving cars
• Image Understanding
• Email filtering
• Customer Support Queries (and Chatbots)
• Catching Fraud in Banking
• Video Surveillance

https://www.analyticsvidhya.com/blog/2018/05/24-ultimate-data-science-projects-to-boost-your-knowledge-and-skills/
Intuition behind the Perceptron

The human brain consists of neurons, or nerve cells, which transmit and process the information received from our senses. Many such nerve cells are arranged together in our brain to form a network of nerves. These nerves pass electrical impulses, i.e. the excitation, from one neuron to the other.

The dendrites receive the impulse from the terminal buttons of neighbouring neurons and carry the impulse to the nucleus. Here the electrical impulse is processed and then passed on along the axon.

Construction of an Artificial Neuron

In this case, the neurons are created artificially on a computer. Connecting many such artificial neurons creates an artificial neural network.

The data in the network flows through each neuron via a connection. Every connection has a specific weight by which the flow of data is regulated.

Poll 1

Assume a simple MLP model with 3 input neurons and inputs = 1, 2, 3. The weights on the input neurons are 4, 5 and 6 respectively. Assume the activation function is linear with a constant factor of 3. What will be the output?

A) 32
B) 643
C) 96
D) 48

Solution: C. The output is calculated as 3 × (1×4 + 2×5 + 3×6) = 96.
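As a quick sanity check, this computation can be reproduced in a few lines of Python/NumPy (the factor of 3 stands in for the poll's "linear constant" activation):

```python
import numpy as np

x = np.array([1, 2, 3])   # inputs
w = np.array([4, 5, 6])   # weights on the input neurons
scale = 3                 # linear activation: f(z) = 3 * z

output = scale * np.dot(x, w)   # 3 * (1*4 + 2*5 + 3*6)
print(output)                   # 96
```
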
But how does it work?

The perceptron works in three simple steps:

1. All the inputs x are multiplied with their weights w. Let's call each product k.
2. Add all the multiplied values and call the result the Weighted Sum.
3. Apply that Weighted Sum to the correct Activation Function.
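Putting the three steps together, here is a minimal perceptron sketch in Python/NumPy; the step activation and the AND-gate weights below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def step(z):
    # Step activation: output 1 if the weighted sum clears the threshold
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    weighted_sum = np.dot(x, w) + b   # steps 1-2: multiply and sum
    return step(weighted_sum)         # step 3: apply the activation

# Illustrative example: a perceptron computing logical AND
w = np.array([0.5, 0.5])
b = -0.7
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(np.array(x), w, b))   # fires only for [1, 1]
```
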
Why do we need Weights and Bias?

Weights show the strength of a particular node.

A bias value allows you to shift the activation function curve up or down.

Poll 2

Statement 1: It is possible to train a network well by initializing all the weights as 0.
Statement 2: It is possible to train a network well by initializing all the biases as 0.
Which of the statements given above is true?

A) Statement 1 is true while Statement 2 is false
B) Statement 2 is true while Statement 1 is false
C) Both statements are true
D) Both statements are false

Solution: B. Even if all the biases are zero, there is a chance that the neural network may learn. On the other hand, if all the weights are zero, the neural network may never learn to perform the task.
Why do we use Activation Functions?

Activation functions are really important for an artificial neural network to learn and make sense of non-linear, complex functional mappings between the inputs and the response variable.

They introduce non-linear properties to our network. Their main purpose is to convert the input signal of a node in an ANN to an output signal. That output signal is then used as an input in the next layer of the stack.

https://ai.stackexchange.com/questions/5493/what-is-the-purpose-of-an-activation-function-in-neural-networks
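For reference, a minimal NumPy sketch of the activation functions discussed in this session (these are the standard textbook definitions):

```python
import numpy as np

def sigmoid(z):
    # Squashes input to (0, 1); saturates for large |z| (vanishing gradients)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes input to (-1, 1); zero-centred, but also saturates
    return np.tanh(z)

def relu(z):
    # max(0, z); the usual default for hidden layers
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small negative slope avoids "dead" neurons
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```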

Why can't we do it without activating the input signal?

A linear equation is easy to solve, but it is limited in its complexity and has less power to learn complex functional mappings from data. A neural network without an activation function would simply be a linear regression model, which has limited power and does not perform well most of the time.

That is why we use artificial neural network techniques such as deep learning to make sense of complicated, high-dimensional, non-linear big datasets, where the model has lots and lots of hidden layers in between and a very complicated architecture that helps us make sense of and extract knowledge from such complicated big datasets.

https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

Which one is better to use?

Nowadays we should use ReLU, which should only be applied to the hidden layers. And if your model suffers from dead neurons during training, we should use the Leaky ReLU or Maxout function.

Sigmoid and Tanh should not be used nowadays, due to the Vanishing Gradient Problem, which causes a lot of problems during training and degrades the accuracy and performance of a deep neural network model.

Poll 3

Which of the following activation functions can't be used at the output layer to classify an image?

A) Sigmoid
B) Tanh
C) ReLU
D) If(x > 5, 1, 0)
E) None of the above

Solution: C. ReLU gives a continuous output in the range 0 to infinity, but in the output layer we want a finite range of values, so option C is correct.
Multiple Neurons

▪ The average human brain has 100 billion neurons

▪ Allowing the brain to make extremely complicated decisions

▪ Each neuron is connected to about 10,000 other neurons

▪ Which together create a complicated network of about 1000 trillion connections
Let's Add a Bit of Complexity Now

There are 3 types of layers in a neural network:

• Input layer: used to pass in our input (an image, text or any suitable type of data for the NN).

• Hidden layers: the layers in between the input and output layers. These layers are responsible for learning the mapping between input and output (e.g. in a dog-vs-cat classifier, the hidden layers are the ones responsible for learning that the dog picture is linked to the name "dog", and they do this through a series of matrix multiplications and mathematical transformations).

• Output layer: responsible for giving us the output of the NN given our inputs.
Feedforward

The output from one layer is used as input to the next layer. Such networks are called feedforward neural networks.
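A minimal sketch of one feedforward pass, assuming an illustrative 4-5-3 architecture with random weights and the ReLU activation from earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Illustrative sizes: 4 inputs -> 5 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)

def feedforward(x):
    h = relu(x @ W1 + b1)   # hidden layer: its output feeds the next layer
    return h @ W2 + b2      # output layer (raw scores, i.e. logits)

x = rng.normal(size=4)
print(feedforward(x))
```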

Example

To create a classifier, we want the output to look like a set of probabilities.

By feeding the input through the network we get a set of scores Y, called "logits", but these cannot be used as probabilities directly:

• They are not within the range [0, 1]
• They do not add up to 1

Example

To convert logits into probabilities we can use the softmax function.

This guarantees all values are within [0, 1] and they add up to 1.
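A minimal sketch of softmax in NumPy; subtracting the maximum logit first is a standard trick for numerical stability and does not change the result:

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # stability: avoid exp overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # every value lies in [0, 1]
print(probs.sum())  # 1.0
```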

Poll 4

The number of nodes in each of the input and output layers is 10, and there are 3 hidden layers with 5 nodes each. What is the total number of network parameters?

A) 150
B) 170
C) 220
D) It is an arbitrary value

Solution: B. (50 + 25 + 25 + 50) + 20 = 170.
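As a sketch, the weight count comes from multiplying the sizes of consecutive layers, which gives the 150 connections in the solution above; the solution then adds a further 20 to reach 170:

```python
layers = [10, 5, 5, 5, 10]   # input, three hidden layers, output

# One weight per connection between consecutive layers
weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
print(weights)   # 10*5 + 5*5 + 5*5 + 5*10 = 150
```
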
Cost Function

A Cost Function / Loss Function evaluates the performance of our machine learning algorithm. The loss function computes the error for a single training example, while the cost function is the average of the loss functions over all the training examples. The goal of any learning algorithm is to minimize the cost function.

A lower error between the actual and the predicted values signifies that the algorithm has done a good job of learning.

A common measure of the discrepancy between the two values is the "cross-entropy".
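A minimal sketch of the cross-entropy loss for one example, and the cost as its average over a small batch (the labels and predicted probabilities below are made up for illustration):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Loss for a single example: -sum over classes of y * log(p)
    return -np.sum(y_true * np.log(y_pred + eps))

# Illustrative one-hot labels and predicted probabilities
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])

losses = [cross_entropy(t, p) for t, p in zip(y_true, y_pred)]
print(np.mean(losses))   # the cost: average loss over all training examples
```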

Backward Propagation

We know that backpropagation is used to calculate the gradient of the loss function with respect to the parameters.

Backward Propagation

dZ: the gradient of the cost with respect to the linear output Z (of the current layer l).
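A sketch of the backward step for one dense layer, using the slide's dZ naming; the row-major layout (examples in rows) and the averaging convention are assumptions:

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Backward step for the linear part Z = A_prev @ W + b of layer l.

    dZ     -- gradient of the cost w.r.t. Z, shape (m, n_l)
    A_prev -- activations of the previous layer, shape (m, n_prev)
    W      -- weight matrix, shape (n_prev, n_l)
    """
    m = A_prev.shape[0]
    dW = A_prev.T @ dZ / m    # gradient w.r.t. weights, averaged over examples
    db = dZ.sum(axis=0) / m   # gradient w.r.t. biases
    dA_prev = dZ @ W.T        # gradient propagated to the previous layer
    return dW, db, dA_prev
```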

Training

Training refers to the task of finding the optimal combination of weights and biases to minimize the total loss. The optimization is done using the familiar gradient descent algorithm. In gradient descent, the parameter being optimized is iterated in the direction of reducing cost according to the following rule:

w := w − α · ∂L/∂w

where α is the learning rate.

Gradient Descent

Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient.

Starting at the top of the mountain, we take our first step downhill in the direction specified by the negative gradient. Next we recalculate the negative gradient (passing in the coordinates of our new point) and take another step in the direction it specifies. We continue this process iteratively until we get to the bottom of our graph, or to a point where we can no longer move downhill: a local minimum.
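A minimal sketch of this loop on a toy one-dimensional cost f(w) = w², whose gradient is 2w; the starting point and learning rate are illustrative:

```python
def grad(w):
    return 2 * w          # gradient of the toy cost f(w) = w**2

w = 5.0                   # starting point: "top of the mountain"
alpha = 0.1               # learning rate

for _ in range(50):
    w -= alpha * grad(w)  # step in the direction of the negative gradient

print(w)                  # close to 0.0, the minimum of f
```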

#LifeKoKaroLift

Thank You!

Reinforcement Learning: An Introduction by Sutton and Barto
https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf