
Lecture 9: Artificial Neural Network (ANN)

Biological Inspirations
• Humans perform complex tasks such as vision, motor control, and language understanding very well.
• One way to build intelligent machines is to imitate the organizational principles of the human brain.
Human Brain
Biological Neuron
A biological neuron may have as many as 10,000 different inputs, and may send its
output (the presence or absence of a short-duration spike) to many other neurons.
Neurons are wired up in a 3-dimensional pattern.
Dendrites: Nerve fibres that carry electrical signals into the cell.
Cell Body: Computes a non-linear function of its inputs.
Axon: A single long fibre that carries the electrical signal from the cell body to other neurons.
Synapse: The point of contact between the axon of one cell and the dendrite of another; it forms a chemical connection whose strength affects the input to the receiving cell.
Neural Network
What is Artificial Neural Network?
Properties of Artificial Neural Network
Model Of A Neuron
[Figure: a single artificial neuron. Inputs X1, X2, X3 are multiplied by connection weights Wa, Wb, Wc, summed into S, and passed through an activation function f(S) to produce the output Y. Analogy with the biological neuron: input units (dendrites), connection weights (synapses), summing function (soma), computation/output (axon).]
Modelling a Neuron

▪ $a_j$ : activation value of unit j
▪ $W_{j,i}$ : weight on the link from unit j to unit i
▪ $in_i = \sum_j W_{j,i}\, a_j$ : weighted sum of inputs to unit i
▪ $a_i = g(in_i)$ : activation value of unit i
▪ $g$ : activation function
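A minimal sketch of this formula for a single unit (the variable names and the choice of tanh as g are illustrative, not taken from the slides):

```python
import numpy as np

def unit_activation(a, w, g=np.tanh):
    """Activation of unit i from incoming activations a_j and link weights w_{j,i}:
    in_i = sum_j w_{j,i} * a_j,  a_i = g(in_i)."""
    in_i = np.dot(w, a)   # weighted sum of inputs
    return g(in_i)        # apply the activation function

# Example: three incoming units
a = np.array([0.5, -0.2, 0.8])
w = np.array([0.1, 0.4, -0.3])
print(unit_activation(a, w))
```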
Processing of ANN
1) Network Topology
• Single-layer network (perceptron)
• Multi-layer network
2) Adjustment of Weights (Learning)
• e.g., reinforcement learning
3) Activation Functions
• An activation function is the extra transformation applied to a neuron's input so that the network produces the desired output. In an ANN, activation functions are applied to the weighted input of each neuron to obtain its output.
Why do we need Activation Functions?
• A neural network without activation functions is simply a linear regression model.
• A linear equation is a polynomial of degree one.
• We want a neural network to learn and compute not just a linear function but something more complicated than that.
• This matters for complicated kinds of data such as images, videos, audio, and speech (see the sketch below).
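A minimal sketch (illustrative weights, NumPy) of why stacking layers without an activation function stays linear: two linear layers compose into a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer" weights (illustrative)
W2 = rng.normal(size=(2, 4))   # second "layer" weights
x = rng.normal(size=3)

# Two linear layers with no activation ...
h = W1 @ x
y = W2 @ h

# ... are exactly one linear layer with weights W2 @ W1.
y_single = (W2 @ W1) @ x
print(np.allclose(y, y_single))  # True: no extra expressive power gained
```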
Activation Function types

• ReLU
• Softplus
• Sigmoid/Logistic
• Tanh
• Binary step
• Signum
• Softmax
Activation Function types
1- Binary Step
• The binary step function depends on a threshold value that decides whether a neuron should be activated or not.
• If the input is greater than the threshold, the neuron is activated; otherwise it is deactivated, meaning its output is not passed on to the next hidden layer.

Limitations
• It cannot provide multi-valued outputs, so it cannot be used for multi-class classification problems.
• The gradient of the step function is zero, which is a hindrance in the backpropagation process.
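A minimal sketch of the binary step unit (the threshold value is chosen arbitrarily for illustration):

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """Output 1 if the input exceeds the threshold, else 0."""
    return np.where(x > threshold, 1, 0)

print(binary_step(np.array([-2.0, 0.5, 3.0])))  # [0 1 1]
# Its gradient is 0 everywhere (undefined at the threshold),
# which is why it cannot drive backpropagation.
```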
Activation Function types
2- Sigmoid / Logistic Activation Function
• This function takes any real value as input and outputs values in the range 0 to 1.
• It is commonly used for models where we have to predict a probability as the output.
• The function is differentiable at every point and provides a smooth gradient, preventing jumps in output values. Its graph has an S-shape.
Activation Function types
2- Sigmoid / Logistic Activation Function
Limitations
• As the gradient value approaches zero, the network stops learning and suffers from the vanishing gradient problem.
• The outputs are not zero-centred: the output of this activation function always lies between 0 and 1, i.e. it is always positive. As a result, the network takes substantially longer to converge, whereas a zero-centred function helps with fast convergence.
• It saturates and kills gradients. Refer to the figure of the derivative of the sigmoid: at both the positive and negative ends, the value of the gradient saturates at 0. For those values the gradient is 0 or close to 0, which simply means no learning in backpropagation (see the sketch below).
• It is computationally expensive because of the exponential term in it.
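A minimal sketch of the sigmoid and its derivative, showing the saturation described above (the input values are just illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative: sigma(x) * (1 - sigma(x))

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(x, round(sigmoid(x), 4), round(sigmoid_grad(x), 4))
# At x = +-10 the gradient is ~0: the unit has saturated and learns nothing.
```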
Activation Function types
3- Tanh (Hyperbolic Tangent)
• The tanh function is very similar to the sigmoid/logistic activation function and has the same S-shape, but with an output range of -1 to 1. It is a mathematically shifted and scaled version of the sigmoid.
• It has advantages similar to the sigmoid but is better because it is zero-centred: the output of tanh lies between -1 and 1, which solves one of the issues with the sigmoid.
Limitations
• It also has the vanishing gradient problem, but its derivatives are steeper than those of the sigmoid, making the gradients stronger for tanh than for sigmoid.
• Like the sigmoid, tanh is computationally expensive because of the exponential term in it.
• As with the sigmoid, the gradients saturate.
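A small sketch comparing the two gradients at the same inputs (illustrative values), showing tanh's steeper derivative and the relation tanh(x) = 2*sigmoid(2x) - 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [-2.0, 0.0, 2.0]:
    sig_grad = sigmoid(x) * (1 - sigmoid(x))   # peaks at 0.25 when x = 0
    tanh_grad = 1 - np.tanh(x) ** 2            # peaks at 1.0 when x = 0
    print(x, round(sig_grad, 3), round(tanh_grad, 3))

# tanh is a shifted and scaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
x = 1.3
print(np.isclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```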
Activation Function types
4- ReLU (Rectified Linear Unit)
• It is computationally efficient, as it involves simpler mathematical operations than sigmoid and tanh.
• Although it looks like a linear function, it adds non-linearity to the network, making it able to learn complex patterns.
• It does not suffer from the vanishing gradient problem.
• It is unbounded on the positive side, which removes the problem of gradient saturation.

Limitations
• It suffers from the dying ReLU problem: ReLU always discards negative values (the deactivations) by setting them to 0, and because of this the gradient of those units also becomes 0.
• It is non-differentiable at 0.
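A minimal sketch of ReLU and its gradient, illustrating the dying-ReLU behaviour on negative inputs:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 for negative inputs
    # (non-differentiable at exactly 0; the subgradient 0 is used here)
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]  -> negative inputs get zero gradient
```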
Activation Function types
5- Softmax
• The Softmax function can be described as a combination of multiple sigmoids.
• It calculates relative probabilities: similar to the sigmoid/logistic activation function, the Softmax function returns the probability of each class.
• It is most commonly used as the activation function of the last layer of a neural network in multi-class classification.
• It is able to handle multiple classes. It normalizes the output for each class between 0 and 1 by dividing by their sum, hence forming a probability distribution and giving a clear probability of the input belonging to each particular class.
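A minimal sketch of the softmax computation (the max-subtraction is a standard numerical-stability trick, not something the slides mention):

```python
import numpy as np

def softmax(z):
    """Map a vector of class scores to a probability distribution."""
    z = z - np.max(z)        # stability shift; does not change the result
    e = np.exp(z)
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())            # probabilities in (0, 1) that sum to 1
```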
How to choose the Activation Function
• You need to match the activation function of your output layer to the type of prediction problem you are solving, specifically the type of the predicted variable.
• You can begin with the ReLU activation function and then move to other activation functions if ReLU does not provide optimum results.

➢ Here are a few other guidelines to help you out.

1. The ReLU activation function should only be used in the hidden layers.

2. Sigmoid/Logistic and Tanh functions should not be used in hidden layers, as they make the model more susceptible (sensitive) to problems during training (due to vanishing gradients).
How to choose the Activation Function
A few rules for choosing the activation function for your output layer,
based on the type of prediction problem that you are solving:

➢ Regression - Linear Activation Function

➢ Binary Classification - Sigmoid/Logistic Activation Function

➢ Multiclass Classification - Softmax

➢ Convolutional Neural Network (CNN): ReLU activation function.

➢ Recurrent Neural Network: Tanh and/or Sigmoid activation function.


The First Neural Networks

AND function
[Figure: a threshold unit Y with inputs X1 and X2, each connected with weight 1.]

X1  X2  Y
1   1   1
1   0   0
0   1   0
0   0   0

Threshold(Y) = 2
The First Neural Networks

OR function
[Figure: a threshold unit Y with inputs X1 and X2, each connected with weight 2.]

X1  X2  Y
1   1   1
1   0   1
0   1   1
0   0   0

Threshold(Y) = 2
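A minimal sketch of these threshold units, checking the AND (weights 1, 1) and OR (weights 2, 2) truth tables with threshold 2:

```python
import numpy as np

def threshold_unit(x, w, threshold=2):
    """Fire (output 1) if the weighted sum of inputs reaches the threshold."""
    return int(np.dot(x, w) >= threshold)

inputs = [(1, 1), (1, 0), (0, 1), (0, 0)]
for x in inputs:
    and_out = threshold_unit(np.array(x), np.array([1, 1]))  # AND unit
    or_out = threshold_unit(np.array(x), np.array([2, 2]))   # OR unit
    print(x, "AND:", and_out, "OR:", or_out)
```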
Architecture of a typical artificial neural network

XOR network: input layer (x1, x2), one hidden layer (units A and B), output layer (1/0).

Weight labels used in the figures below, and the corresponding layer-indexed notation:
W1 = $W^{1}_{11}$, W2 = $W^{1}_{21}$, W3 = $W^{1}_{12}$, W4 = $W^{1}_{22}$, W5 = $W^{2}_{11}$, W6 = $W^{2}_{12}$

[Figure: x1 connects to A with W1 and to B with W2; x2 connects to A with W3 and to B with W4; A connects to the output with W5 and B with W6. Weights = Wi.]
Activation function input

[Figure: the same XOR network; the quantity computed at each unit is the sum of products of its inputs and weights.]

The activation function input is the sum of products (SOP) of the inputs and their weights:

$S = \sum_{i=1}^{m} x_i w_i$
Activation function input

[Figure: hidden unit A receives x1 via W1 and x2 via W3.]

SOP at hidden unit A: $s_1 = x_1 w_1 + x_2 w_3$
Activation function input

[Figure: hidden unit B receives x1 via W2 and x2 via W4.]

SOP at hidden unit B: $s_2 = x_1 w_2 + x_2 w_4$
Activation function input

[Figure: the output unit receives the hidden outputs s1 and s2 via W5 and W6.]

SOP at the output unit: $s_3 = s_1 w_5 + s_2 w_6$
Activation function output

[Figure: the output unit applies the activation function to its SOP: output = F(s).]
Activation Function types

• ReLU
• Softplus
• Sigmoid/Logistic
• Tanh
• Binary step
• Signum
• Softmax
Activation function output

[Figure: in this example a binary step activation is applied at the output: Y = bin(s).]
Bias
Hidden and output layer neurons also receive a bias term.

[Figure: the same network with biases b1 and b2 at hidden units A and B, and b3 at the output unit.]

$s_1 = b_1 + x_1 w_1 + x_2 w_3$
$s_2 = b_2 + x_1 w_2 + x_2 w_4$
$s_3 = b_3 + s_1 w_5 + s_2 w_6$
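A minimal sketch of this forward pass with biases. The weight and bias values are placeholders, and (as in the worked example later in the lecture) the hidden SOPs are passed through a sigmoid before feeding the output unit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder parameters (not from the slides)
w1, w2, w3, w4, w5, w6 = 0.5, 0.1, 0.3, -0.2, 0.4, 0.7
b1, b2, b3 = 0.1, 0.2, 0.3
x1, x2 = 1.0, 0.0

s1 = b1 + x1 * w1 + x2 * w3          # SOP at hidden unit A
s2 = b2 + x1 * w2 + x2 * w4          # SOP at hidden unit B
a1, a2 = sigmoid(s1), sigmoid(s2)    # hidden activations
s3 = b3 + a1 * w5 + a2 * w6          # SOP at the output unit
y = sigmoid(s3)                      # network output
print(round(y, 3))
```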
Learning Rate: $0 \le \alpha \le 1$

Step: $n = 0, 1, 2, \ldots$

[Figure: the same network with biases b1, b2, b3, shown while introducing the learning rate and training step.]
Desired Output $d_j$

[Figure: the same network; for each training example the target (desired) output is $d_j$.]

$s_1 = b_1 + x_1 w_1 + x_2 w_3$
$s_2 = b_2 + x_1 w_2 + x_2 w_4$
$s_3 = b_3 + s_1 w_5 + s_2 w_6$
Element of a Neural Network

[Figure: a deep feed-forward network. The input layer takes $x_1, \ldots, x_N$; hidden layers 1, 2, ..., L follow, each a column of neurons; the output layer produces $y_1, \ldots, y_M$.]

"Deep" means many hidden layers.
Neural Network training steps
1. Weight initialization
2. Apply the inputs
3. Sum of the input-weight products (SOP)
4. Apply the activation functions
5. Adapt the weights
6. Go back to step 2 (a complete loop is sketched below)
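A minimal sketch of these six steps for a single sigmoid neuron trained with gradient descent (the data, learning rate, and iteration count are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0.1, 0.3], [0.8, 0.2], [0.5, 0.9]])   # toy inputs
d = np.array([0.0, 1.0, 1.0])                         # desired outputs
w = rng.normal(size=2)                                # 1. weight initialization
b = 0.0
alpha = 0.5                                           # learning rate

for step in range(1000):                              # 6. repeat from step 2
    for x, target in zip(X, d):                       # 2. apply the inputs
        s = np.dot(x, w) + b                          # 3. sum of products (SOP)
        y = sigmoid(s)                                # 4. activation function
        grad_s = (y - target) * y * (1 - y)           # 5. weight adaptation ...
        w -= alpha * grad_s * x                       #    ... via gradient descent
        b -= alpha * grad_s

print(np.round([sigmoid(np.dot(x, w) + b) for x in X], 2))
```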
Regarding 5th step: Weights Adaptation
First method:

Learning rate $\alpha$, with $0 \le \alpha \le 1$
Regarding 5th step: Weights Adaptation
Second method: Backpropagation
▪ Forward vs. Backward passes
The backpropagation algorithm is a sensible approach for dividing up the contribution of each weight to the prediction error.

Forward pass:  inputs, weights → SOP → prediction → error
Backward pass: error → prediction → SOP → inputs, weights
Regarding 5th step: Weights Adaptation
Second method: Backpropagation
▪ Backward pass
What is the change in the prediction error (E) given a change in a weight (W)?
Get the partial derivative of E with respect to W: $\frac{\partial E}{\partial W}$

$E = \frac{1}{2}(d - y)^2$      $y = f(s) = \frac{1}{1 + e^{-s}}$      $s = \sum_{j} x_j w_j + b$      (weights $w_1, w_2$)

where d (the desired output) is a constant, s is the sum of products (SOP), and y is the predicted output.

Substituting, the error as a function of the weights is

$E = \frac{1}{2}\left(d - \frac{1}{1 + e^{-\left(\sum_j x_j w_j + b\right)}}\right)^2$
Regarding 5th step: Weights Adaptation
Second method: Backpropagation (Chain Rule)
▪ Weight derivative

$E = \frac{1}{2}(d - y)^2$      $y = f(s) = \frac{1}{1 + e^{-s}}$      $s = x_1 w_1 + x_2 w_2 + b$      (weights $w_1, w_2$)

By the chain rule:

$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial y} \times \frac{\partial y}{\partial s} \times \frac{\partial s}{\partial w_1}$
$\frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial y} \times \frac{\partial y}{\partial s} \times \frac{\partial s}{\partial w_2}$
Regarding 5th step: Weights Adaptation
Second method: Backpropagation
▪ Weight derivative

$\frac{\partial E}{\partial y} = \frac{\partial}{\partial y}\,\frac{1}{2}(d - y)^2 = y - d$

$\frac{\partial y}{\partial s} = \frac{\partial}{\partial s}\,\frac{1}{1 + e^{-s}} = \frac{1}{1 + e^{-s}}\left(1 - \frac{1}{1 + e^{-s}}\right)$

$\frac{\partial s}{\partial w_1} = \frac{\partial}{\partial w_1}(x_1 w_1 + x_2 w_2 + b) = x_1$      $\frac{\partial s}{\partial w_2} = \frac{\partial}{\partial w_2}(x_1 w_1 + x_2 w_2 + b) = x_2$

Putting the pieces together:

$\frac{\partial E}{\partial w_i} = (y - d)\,\frac{1}{1 + e^{-s}}\left(1 - \frac{1}{1 + e^{-s}}\right) x_i$
Regarding 5th step: Weights Adaptation
Second method: Backpropagation
▪ Interpreting the derivatives $\nabla W$

$\frac{\partial E}{\partial w_i} = (y - d)\,\frac{\partial f(s)}{\partial s}\, x_i$

Sign of the derivative:
• Positive: directly proportional (E increases as the weight increases)
• Negative: opposite (E decreases as the weight increases)
The magnitude of the derivative indicates how strongly E responds to a change in the weight.
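A minimal sketch of this gradient for a single sigmoid neuron (x, w, b, and d are placeholder values, not from the slides):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([0.4, 0.7])     # inputs x1, x2
w = np.array([0.2, -0.5])    # weights w1, w2
b = 0.1
d = 1.0                      # desired output

s = np.dot(x, w) + b         # SOP
y = sigmoid(s)               # predicted output

# dE/dw_i = (y - d) * y * (1 - y) * x_i
grad_w = (y - d) * y * (1 - y) * x
print(np.round(grad_w, 4))
```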
Regarding 5th step: Weights Adaptation
Second method: Backpropagation
▪ Update the Weights
In order to update the weights, use gradient descent:

$W_{new} = W_{old} - \alpha\,\frac{\partial E}{\partial W}$

[Figure: f(w) plotted against w. On a negative slope, $W_{new} = W_{old} - (-ve)$ increases the weight; on a positive slope, $W_{new} = W_{old} - (+ve)$ decreases it. Either way the update moves toward the minimum.]
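A minimal sketch of the update rule, continuing the single-neuron gradient sketch above (alpha and the gradient values are illustrative):

```python
import numpy as np

alpha = 0.1                                # learning rate (illustrative)
w = np.array([0.2, -0.5])                  # current weights
grad_w = np.array([-0.03, -0.05])          # dE/dw from the backward pass

w_new = w - alpha * grad_w                 # W_new = W_old - alpha * dE/dW
print(w_new)
# A negative gradient increases the weight; a positive gradient decreases it,
# so each update moves the error toward a (local) minimum.
```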
Simple Neural Network Training Example
(Backpropagation)
Neural Networks training example
Neural Networks training : Steps
Forward
1- Activation function input (SOP)

2- Activation function output
• In this example, the sigmoid activation function is used.
• Based on the SOP calculated previously, the output is as follows.
Neural Networks training : Steps
Forward
3- Prediction Error
• The squared error function is used.
• Based on the predicted output, the prediction error is:
Neural Networks training : Steps
Backward

$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial y} \times \frac{\partial y}{\partial s} \times \frac{\partial s}{\partial w_{1,2}}$

1- Partial derivative of the error w.r.t. the predicted output:

$\frac{\partial E}{\partial y} = \frac{\partial}{\partial y}\,\frac{1}{2}(d - y)^2 = y - d$
Neural Networks training : Steps
Backward

$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial y} \times \frac{\partial y}{\partial s} \times \frac{\partial s}{\partial w_{1,2}}$

2- Partial derivative of the predicted output w.r.t. the SOP:

$\frac{\partial y}{\partial s} = \frac{\partial}{\partial s}\,\frac{1}{1 + e^{-s}} = \frac{1}{1 + e^{-s}}\left(1 - \frac{1}{1 + e^{-s}}\right)$
Neural Networks training : Steps
Backward

$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial y} \times \frac{\partial y}{\partial s} \times \frac{\partial s}{\partial w_{1,2}}$

3- Partial derivative of the SOP w.r.t. W1:

$\frac{\partial s}{\partial w_1} = \frac{\partial}{\partial w_1}(x_1 w_1 + x_2 w_2 + b) = x_1$
Neural Networks training : Steps
Backward

$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial y} \times \frac{\partial y}{\partial s} \times \frac{\partial s}{\partial w_{1,2}}$

4- Partial derivative of the SOP w.r.t. W2:

$\frac{\partial s}{\partial w_2} = \frac{\partial}{\partial w_2}(x_1 w_1 + x_2 w_2 + b) = x_2$
Neural Network Training Example
(Backpropagation)
Example

$in_{h1} = \begin{bmatrix} 0.5 & 0.1 & 0.4 \\ 0.62 & 0.2 & -0.1 \end{bmatrix} \begin{bmatrix} 0.1 \\ 0.3 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.48 \\ 0.022 \end{bmatrix}$

$\sigma(in_{h1}) = out_{h1} = \begin{bmatrix} \frac{1}{1 + e^{-0.48}} \\ \frac{1}{1 + e^{-0.022}} \end{bmatrix} = \begin{bmatrix} 0.618 \\ 0.506 \end{bmatrix}$

$in_{out} = \begin{bmatrix} -0.2 & 0.3 & 1.83 \end{bmatrix} \begin{bmatrix} 0.618 \\ 0.506 \\ 1 \end{bmatrix} = 1.858$

$\sigma(in_{out}) = out_{out} = \frac{1}{1 + e^{-1.858}} = 0.865$
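A minimal sketch reproducing this forward pass in NumPy. The inputs [0.1, 0.3] plus a constant input of 1, and the weight matrices, are taken directly from the example; the constant 1 input makes the last weight in each row act as a bias:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.1, 0.3, 1.0])                  # inputs plus the constant bias input 1
W_hidden = np.array([[0.5, 0.1, 0.4],
                     [0.62, 0.2, -0.1]])       # hidden-layer weights from the example
W_out = np.array([-0.2, 0.3, 1.83])            # output-layer weights from the example

in_h = W_hidden @ x                            # [0.48, 0.022]
out_h = sigmoid(in_h)                          # [0.618, 0.506]
in_out = W_out @ np.append(out_h, 1.0)         # 1.858
out_out = sigmoid(in_out)                      # 0.865
print(np.round(in_h, 3), np.round(out_h, 3), round(in_out, 3), round(out_out, 3))
```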