AIML-Module-3-part 2
2
Biological Motivation
The basic computational unit of the brain is a neuron.
3
Neurons
4
Biological Motivation
5
Biological Motivation
6
Module 3- Outline
Artificial Neural Network
1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary
7
Neuron
8
Typical NN
9
Example: Autonomous Driving
10
Example: Autonomous Driving
11
Autonomous Driving
12
Autonomous Driving
13
Autonomous Driving
14
Autonomous Driving
15
Example 2: Bank Credit Score
16
Example 2: Bank Credit Score
To make things clearer, let's understand ANNs using a simple
example: a bank wants to decide whether to approve a customer's
loan application, so it wants to predict whether the customer is
likely to default on the loan. It has data like the following:
17
Example 2: Bank Credit Score
18
Example 2: Bank Credit Score
19
Example 2: Bank Credit Score
20
Some Applications
Autonomous Driving
Speech Phoneme Recognition
Image Classification
Financial Prediction
21
Properties of NNs
22
Module 3- Outline
Artificial Neural Network
1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary
23
Appropriate problems for ANN
24
Module 3- Outline
Artificial Neural Network
1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary
25
A Perceptron
26
Perceptron
27
Popular Activation functions
Sigmoid
Step
Sign
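A minimal Python sketch of these three activation functions; the vectorized NumPy form is an illustrative choice, not taken from the slides:

```python
import numpy as np

def sigmoid(x):
    # Smooth, differentiable squashing function mapping any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def step(x):
    # Binary threshold: 1 if the net input is >= 0, else 0
    return np.where(x >= 0, 1, 0)

def sign(x):
    # Bipolar threshold: +1 if the net input is >= 0, else -1
    return np.where(x >= 0, 1, -1)
```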
28
29
Architectures
30
Single Layer
31
Multi Layer Perceptrons
32
Recurrent
33
Mesh
34
Representation power
35
36
37
38
39
Neural Representation of AND, OR, NOT Logic Gates (Perceptron Algorithm)
40
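A single step-activation perceptron with suitably chosen weights can realise AND, OR, and NOT; the weights and biases in this sketch are illustrative assumptions, not values taken from the slides:

```python
import numpy as np

def perceptron(x, w, b):
    # Step-activation perceptron: fires 1 when w . x + b >= 0, else 0
    return int(np.dot(w, x) + b >= 0)

# Illustrative weights and biases (assumptions, not taken from the slides)
AND = lambda x1, x2: perceptron([x1, x2], w=[1, 1], b=-1.5)
OR  = lambda x1, x2: perceptron([x1, x2], w=[1, 1], b=-0.5)
NOT = lambda x1:     perceptron([x1],     w=[-1],   b=0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```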
AND gate
42
AND - Another representation
43
OR gate
44
NOT gate
45
NAND gate
46
XOR gate
47
XNOR
48
Key Terms
Input Nodes (input layer)
• Just pass the information to the next layer
• A block of nodes is also called a layer.
Hidden nodes (hidden layer)
• Hidden layers are where intermediate processing or computation is done;
• they perform computations and then pass the weighted signals (information) from the input layer on to the following layers.
• It is also possible to have a neural network without any hidden layer.
Output Nodes (output layer)
• Here we finally use an activation function that maps to the desired
output format (e.g. softmax for classification).
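As an example of such an output-layer mapping, a minimal softmax sketch in Python (an illustration, not part of the slides):

```python
import numpy as np

def softmax(z):
    # Maps raw output scores to a probability distribution over the classes
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]
```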
49
Key Terms
Connections and weights
• The network consists of connections, each connection
transferring the output of a neuron i to the input of a
neuron j.
• In this sense i is the predecessor of j and j is the successor of i. Each connection is assigned a weight Wij.
Activation function
• the activation function of a node defines the output of
that node given an input or set of inputs.
• A standard computer chip circuit can be seen as a digital
network of activation functions that can be “ON” (1) or
“OFF” (0), depending on input.
• In artificial neural networks this function is also called the
transfer function.
50
Key Terms
Learning rule
• The learning rule is a rule or an algorithm which modifies
the parameters of the neural network, in order for a given
input to the network to produce a favored output.
• This learning process typically amounts to modifying the
weights and thresholds.
51
Perceptron Training Rule
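In its standard form, the perceptron training rule adjusts each weight in proportion to the error on the current training example:

\[ w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(t - o)\,x_i \]

where \(t\) is the target output, \(o\) is the perceptron's output, \(x_i\) is the i-th input, and \(\eta\) is the learning rate.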
52
Illustration – Perceptron learning
53
Illustration – Perceptron learning
54
Illustration – Perceptron learning
We do only ONE epoch/iteration.
Consider training example 1.
55
Illustration – Perceptron learning
Consider training example 2.
56
Illustration – Perceptron learning
Consider training example 3.
57
Limitation
The perceptron rule fails to converge if the data is not linearly separable.
58
Delta Rule
The Delta Rule employs
• the error function for what is known as Gradient Descent
learning,
• which involves the ‘modification of weights along the most
direct path in weight-space to minimize error’
• so the change applied to a given weight is proportional to the negative of the derivative of the error with respect to that weight, as written out below.
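For a linear unit with output \(o_d = \vec{w} \cdot \vec{x}_d\), the training error and the resulting delta-rule (gradient-descent) update can be written as:

\[ E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2, \qquad \Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \sum_{d \in D} (t_d - o_d)\,x_{id} \]

where \(D\) is the set of training examples, \(t_d\) and \(o_d\) are the target and computed outputs for example \(d\), and \(x_{id}\) is the i-th input of example \(d\).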
59
Delta Rule
60
Error Surface
61
Gradient Descent
62
Gradient Descent
64
Derivation of Gradient Descent
65
Gradient Descent
66
Gradient Descent Algorithm
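A minimal Python sketch of batch gradient descent for a single linear unit, following the update rule above; the toy data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def train_linear_unit(X, t, eta=0.05, epochs=200):
    # X: (n_examples, n_features) inputs; t: (n_examples,) target outputs
    w = np.zeros(X.shape[1])            # start from zero weights
    for _ in range(epochs):
        o = X @ w                       # linear-unit outputs for all examples
        grad_E = -(X.T @ (t - o))       # gradient of E = 0.5 * sum((t - o)^2)
        w -= eta * grad_E               # step in the direction of steepest descent
    return w

# Toy usage (assumed data): recover y = 2*x1 - x2
X = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 1.]])
t = np.array([2., -1., 1., 3.])
print(train_linear_unit(X, t))          # approaches [2., -1.]
```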
67
Perceptron vs Delta rule
There are mainly two differences between the perceptron training rule and the delta rule.
68
Perceptron vs Delta rule
69
Module 3- Outline
Artificial Neural Network
1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary
70
Multilayer Networks (ANN)
71
BACKPROPAGATION
72
One major difference in the case of multilayer networks is that the error surface can have multiple local minima.
73
BACKPROPAGATION Algorithm
74
This algorithm applies to feedforward networks
• containing two layers of sigmoid units,
• with units at each layer connected to all units from the
preceding layer.
This is the incremental, or stochastic, gradient descent version of Backpropagation.
The notation used here is the same as that used in earlier
sections, with the following extensions:
75
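A minimal Python sketch of this stochastic version for a network with one hidden layer of sigmoid units and sigmoid output units; the toy XOR data, learning rate, initialization range, and epoch count are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_stochastic(examples, n_in, n_hidden, n_out, eta=0.3, epochs=5000, seed=0):
    # Small random initial weights; column 0 of each matrix holds the bias weight
    rng = np.random.default_rng(seed)
    W_h = rng.uniform(-0.05, 0.05, size=(n_hidden, n_in + 1))   # input -> hidden
    W_o = rng.uniform(-0.05, 0.05, size=(n_out, n_hidden + 1))  # hidden -> output
    for _ in range(epochs):
        for x, t in examples:                  # one example at a time (stochastic)
            x_b = np.concatenate(([1.0], x))   # prepend the bias input x0 = 1
            h = sigmoid(W_h @ x_b)             # hidden-unit outputs
            h_b = np.concatenate(([1.0], h))
            o = sigmoid(W_o @ h_b)             # network outputs
            # Error terms: delta_o for output units, delta_h for hidden units
            delta_o = o * (1 - o) * (t - o)
            delta_h = h * (1 - h) * (W_o[:, 1:].T @ delta_o)
            # Gradient-descent weight updates
            W_o += eta * np.outer(delta_o, h_b)
            W_h += eta * np.outer(delta_h, x_b)
    return W_h, W_o

# Toy usage on XOR (assumed data; may need a different seed or more epochs to converge)
data = [(np.array([0., 0.]), np.array([0.])),
        (np.array([0., 1.]), np.array([1.])),
        (np.array([1., 0.]), np.array([1.])),
        (np.array([1., 1.]), np.array([0.]))]
W_h, W_o = backprop_stochastic(data, n_in=2, n_hidden=2, n_out=1)
```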
Termination Conditions
76
Adding Momentum
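The usual way to add momentum is to make the weight update at iteration n depend partly on the update made at iteration n-1:

\[ \Delta w_{ji}(n) = \eta\,\delta_j\,x_{ji} + \alpha\,\Delta w_{ji}(n-1), \qquad 0 \le \alpha < 1 \]

where \(\alpha\) is the momentum constant; the second term tends to keep the search moving in the same direction and can help it roll through small local minima or flat regions of the error surface.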
77
δ computation
δk is simply the familiar (tk - ok) from the delta rule, multiplied by the
factor ok(1 - ok), which is the derivative of the sigmoid squashing function.
However, since training examples provide target values tk only for network
outputs,
• no target values are directly available to indicate the error of hidden unit values.
Instead, the error term for hidden unit h is calculated by
• summing the error terms δk for each output unit influenced by h,
• weighting each of the δk's by wkh, the weight from hidden unit h to
output unit k.
• This weight characterizes the degree to which hidden unit h is
"responsible for" the error in output unit k.
78
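In symbols, the two error terms described above are:

\[ \delta_k = o_k\,(1 - o_k)\,(t_k - o_k) \quad \text{for each output unit } k \]
\[ \delta_h = o_h\,(1 - o_h) \sum_{k \in \text{outputs}} w_{kh}\,\delta_k \quad \text{for each hidden unit } h \]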
Derivation of Backpropagation rule
79
Derivation of Backpropagation rule
80
81
82
Case 2: Training Rule for Hidden Unit Weights.
83
BACKPROPAGATION Algorithm
84
Learning Algorithm:
Backpropagation
The following slides describe the training process of a multilayer neural network using the backpropagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output, shown in the picture below, is used:
Learning Algorithm:
Backpropagation
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron transfer (activation) function. Signal e is the output of the adder, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
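In symbols, each neuron first forms the weighted sum of its inputs and then applies the transfer function:

\[ e = \sum_{i} w_i\,x_i, \qquad y = f(e) \]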
Learning Algorithm:
Backpropagation
• To train the neural network we need a training data set. The training data set consists of input signals (x1 and x2) paired with the corresponding target (desired output) z.
• Each training step starts with presenting both input signals from the training set. After this stage we can determine the output signal values for each neuron in each network layer.
Learning Algorithm:
Backpropagation
The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbol yn represents the output signal of neuron n.
Learning Algorithm:
Backpropagation
Learning Algorithm:
Backpropagation
Learning Algorithm:
Backpropagation
Learning Algorithm:
Backpropagation
Learning Algorithm:
Backpropagation
The weight coefficients wmn used to propagate errors backwards are equal to those used when computing the output value. Only the direction of data flow is changed (signals are propagated from the outputs to the inputs, layer by layer). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:
Learning Algorithm:
Backpropagation
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.
Learning Algorithm:
Backpropagation
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.
Illustration
We will build a neural network with three layers:
• Input layer with two input neurons
• One hidden layer with two neurons
• Output layer with a single neuron
101
Dataset
102
Our initial weights will be as follows: w1 = 0.11, w2 = 0.21, w3 = 0.12, w4 = 0.08, w5 = 0.14, and w6 = 0.15.
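The next few slides walk through the forward pass, the error calculation, and the weight update for these weights. A compact Python sketch of those steps is given below; the example inputs, target, learning rate, and the exact mapping of w1–w6 onto connections are illustrative assumptions (the actual values are on the Dataset slide):

```python
# Assumed example input, target and learning rate (placeholders, not from the slides)
x1, x2, target = 2.0, 3.0, 1.0
eta = 0.05

# Assumed wiring: w1..w4 connect the inputs to the two hidden neurons,
# w5, w6 connect the hidden neurons to the single output neuron.
w1, w2, w3, w4, w5, w6 = 0.11, 0.21, 0.12, 0.08, 0.14, 0.15

# Forward pass (linear/identity activation assumed for simplicity)
h1 = w1 * x1 + w2 * x2
h2 = w3 * x1 + w4 * x2
out = w5 * h1 + w6 * h2

# Squared error between prediction and target
error = 0.5 * (target - out) ** 2

# Backpropagation of the error to the output-layer weights
delta = out - target            # d(error)/d(out)
w5 -= eta * delta * h1
w6 -= eta * delta * h2
# The hidden-layer weights w1..w4 would be updated the same way via the chain rule.
print(out, error, w5, w6)
```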
103
Forward Pass
104
Calculating Error
105
Reducing Error
106
Backpropagation (of errors)
107
Now, using the new weights, we will repeat the forward pass with another training example.
108
Representation Power
109
Inductive Bias
110
Illustration: Face Recognition
111
Illustration: Design Choice
112
Illustration: Design Choice
113
Module 3- Outline
Artificial Neural Network
1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary
114
Convergence and Local Minima
Convergence to a global minimum is not guaranteed.
Still, backpropagation is a highly effective function approximation method
in practice.
Despite these concerns, gradient descent over the complex error surfaces represented by ANNs is still poorly understood:
• no methods are known to predict with certainty when local minima
will cause difficulties.
Common heuristics to attempt to alleviate the problem of local minima
include:
• Add a momentum term to the weight-update rule
• Use stochastic gradient descent rather than true gradient descent.
• Train multiple networks on the same data, initializing each network with different random weights.
115
Representational Power of Feedforward Networks
117
Hidden layer representation
One intriguing property of Backpropagation is its ability to discover useful intermediate representations at the hidden unit layers inside the network.
Backpropagation can define hidden-layer features that are not explicit in the input representation,
• but which capture properties of the input instances that
are most relevant to learning the target function.
118
Generalization, Overfitting, and
Stopping Criterion
The stopping criterion is left unspecified in the algorithm.
One obvious choice is to continue training until the error E on
the training examples falls below some predetermined
threshold.
• This is a poor strategy because Backpropagation is
susceptible to overfitting the training examples at the cost
of decreasing generalization accuracy over other unseen
examples.
119
Module 3- Outline
Artificial Neural Network
1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary
120
Summary
121
Advanced Topics
122
Thank You
123