
AIML-Module-3-part 2


Module 3- Outline

Artificial Neural Network


1. Biological Motivation
2. Neural Network Representation
3. Appropriate Problems for NN learning
4. Perceptrons
5. Multilayer Networks and Backpropagation Algorithm
6. Remarks on Backpropagation Algorithm
7. Summary

2
Biological Motivation
 The basic computational unit of the brain is a neuron.

3
Neurons

6
Neuron

8
Typical NN

9
Example: Autonomous Driving

15
Example 2: Bank Credit Score
 To make things clearer, let's look at ANNs through a simple
example: a bank wants to decide whether to approve a customer's loan
application, so it wants to predict whether the
customer is likely to default on the loan. It has data like the
following:

20
Some Applications
 Autonomous Driving
 Speech Phoneme Recognition
 Image Classification
 Financial Prediction

21
Properties of NNs

22
Appropriate problems for ANN

24
A Perceptron

26
Perceptron

27
Popular Activation functions
 Sigmoid

 Step

 Sign
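The three activation functions named above can be sketched in a few lines of Python. This is only an illustration: the threshold of 0 for the step unit and the convention that sign(0) maps to -1 are assumptions, since conventions differ between texts.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x: float, threshold: float = 0.0) -> int:
    """Step unit: 1 if x reaches the (assumed) threshold, else 0."""
    return 1 if x >= threshold else 0

def sign(x: float) -> int:
    """Sign unit: +1 for positive input, else -1 (assumed convention at 0)."""
    return 1 if x > 0 else -1

for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), step(x), sign(x))
```

The sigmoid is the only one of the three that is differentiable, which is why it is the one used later with gradient-based training.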

28
Architectures

30
Single Layer

31
Multi Layer Perceptrons

32
Recurrent

33
Mesh

34
Representation power

35
Neural Representation of AND, OR, NOT Logic Gates (Perceptron Algorithm)

40
AND gate

42
AND - Another representation

43
OR gate

44
NOT gate

45
NAND gate

46
XOR gate

47
XNOR
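As a rough sketch of how the gates listed above can be realised with threshold units: the weights and biases below are hand-picked assumptions (many other values work), and the helper names are hypothetical. AND, OR, NOT and NAND each need a single perceptron, whereas XOR and XNOR are built here by combining units in a second layer.

```python
def perceptron(weights, bias, inputs):
    """Threshold unit: outputs 1 if bias + sum(w_i * x_i) > 0, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

# Hand-picked (assumed) weights; they are not unique.
def and_gate(a, b):
    return perceptron([1, 1], -1.5, [a, b])

def or_gate(a, b):
    return perceptron([1, 1], -0.5, [a, b])

def not_gate(a):
    return perceptron([-1], 0.5, [a])

def nand_gate(a, b):
    return perceptron([-1, -1], 1.5, [a, b])

# XOR is not linearly separable, so it is composed from two layers of units.
def xor_gate(a, b):
    return and_gate(or_gate(a, b), nand_gate(a, b))

def xnor_gate(a, b):
    return not_gate(xor_gate(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b), nand_gate(a, b),
              xor_gate(a, b), xnor_gate(a, b))
```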

48
Key Terms
 Input Nodes (input layer)
• Just pass the information to the next layer
• A block of nodes is also called a layer.
 Hidden nodes (hidden layer)
• Hidden layers are where intermediate processing or computation is
done.
• They perform computations and then pass the resulting signals
(information) on to the following layers.
• It is possible to have a neural network without a hidden layer also.
 Output Nodes (output layer)
• Here we finally use an activation function that maps to the desired
output format (e.g. softmax for classification).

49
Key Terms
 Connections and weights
• The network consists of connections, each connection
transferring the output of a neuron i to the input of a
neuron j.
• In this sense i is the predecessor of j and j is the successor
of i. Each connection is assigned a weight Wij.
 Activation function
• The activation function of a node defines the output of
that node given an input or set of inputs.
• A standard computer chip circuit can be seen as a digital
network of activation functions that can be “ON” (1) or
“OFF” (0), depending on input.
• In artificial neural networks this function is also called the
transfer function.

50
Key Terms
 Learning rule
• The learning rule is a rule or an algorithm which modifies
the parameters of the neural network, in order for a given
input to the network to produce a favored output.
• This learning process typically amounts to modifying the
weights and thresholds.
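One concrete learning rule is the perceptron training rule, wi ← wi + η (t − o) xi. The sketch below applies it to the AND function; the learning rate, the zero initial weights, the 0/1 output coding and the function names are all assumptions made for illustration.

```python
def predict(weights, x):
    """Threshold unit; weights[0] is the bias weight for a constant input of 1."""
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else 0

def train_perceptron(examples, eta=0.5, max_epochs=100):
    """Apply w_i <- w_i + eta * (t - o) * x_i until an epoch has no mistakes."""
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            o = predict(weights, x)
            if o != t:
                mistakes += 1
                weights[0] += eta * (t - o)  # the bias input is always 1
                for i, xi in enumerate(x):
                    weights[i + 1] += eta * (t - o) * xi
        if mistakes == 0:
            break
    return weights

and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_examples)
print(w, [predict(w, x) for x, _ in and_examples])
```

Because the weights change only when the output is wrong, training stops as soon as one full pass over the data produces no mistakes.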

51
Perceptron Training Rule

52
Illustration – Perceptron learning

53
Illustration – Perceptron learning

54
Illustration – Perceptron learning

 We do only ONE epoch/iteration
 Consider training example-1

55
Illustration – Perceptron learning

 Consider training example-2

56
Illustration – Perceptron learning

 Consider training example-3

57
Limitation
 The perceptron training rule fails to converge if the training data is not linearly separable
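A small experiment illustrates this limitation. XOR is not linearly separable, so however long a single threshold unit is trained it still misclassifies at least one of the four examples. The learning rate, epoch count and function names below are assumptions.

```python
def predict(weights, x):
    """Threshold unit; weights[0] is the bias weight."""
    total = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if total > 0 else 0

def mistakes_after_training(examples, eta=0.5, epochs=1000):
    """Train with the perceptron rule, then count misclassified examples."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, t in examples:
            o = predict(weights, x)
            weights[0] += eta * (t - o)
            for i, xi in enumerate(x):
                weights[i + 1] += eta * (t - o) * xi
    return sum(predict(weights, x) != t for x, t in examples)

xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(mistakes_after_training(xor_examples))  # never reaches 0
```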

58
Delta Rule
 The Delta Rule employs
• the error function for what is known as Gradient Descent
learning,
• which involves the ‘modification of weights along the most
direct path in weight-space to minimize error’
• so change applied to a given weight is proportional to the
negative of the derivative of the error with respect to that
weight

59
Delta Rule

60
Error Surface

61
Gradient Descent

62
Gradient Descent

Gradient descent is an iterative optimization algorithm for finding the
minimum of a function; in our case we want to minimize the error
function.

To find a local minimum of a function using gradient descent, one takes
steps proportional to the negative of the gradient of the function at the
current point.
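A minimal sketch of this idea for a single linear unit o = w · x, minimizing the squared error E(w) = ½ Σ (t − o)² over all training examples. The toy data, learning rate and number of epochs are assumptions chosen so that the run converges.

```python
def gradient_descent(examples, eta=0.05, epochs=500):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2, with o = w . x."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                grad[i] += -(t - o) * x[i]  # contribution to dE/dw_i
        # Step in the direction of the negative gradient.
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w

# Toy data generated from t = 1 + 2*x; the first input is a constant 1 (bias).
data = [((1.0, 0.0), 1.0), ((1.0, 1.0), 3.0), ((1.0, 2.0), 5.0), ((1.0, 3.0), 7.0)]
print([round(wi, 3) for wi in gradient_descent(data)])
```

If the learning rate is made too large the steps overshoot the minimum and the weights diverge, which is why η is usually kept small.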
63
Derivation of Gradient Descent

64
Derivation of Gradient Descent

Therefore the weight update rule for gradient descent is
Δwi = η Σd∈D (td − od) xid, where xid is the i-th input of training example d.

65
Gradient Descent

66
Gradient Descent Algorithm

67
Perceptron vs Delta rule
 There are two main differences between the perceptron rule and
the delta rule.
1. The perceptron rule updates the weights based on the error in the
thresholded (step function) output, whereas the delta rule uses the
error in the unthresholded linear combination of inputs.
2. The perceptron rule is guaranteed to converge to a consistent
hypothesis in a finite number of steps, assuming the data is linearly separable.
The delta rule converges only asymptotically toward the minimum-error
hypothesis, but it does not require linearly separable data.

68
Perceptron vs Delta rule

69
Multilayer Networks (ANN)

σ'(y) = σ(y) (1- σ(y))
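The identity σ'(y) = σ(y)(1 − σ(y)) can be checked numerically. The sketch below compares it with a central finite difference; the step size h is an assumption.

```python
import math

def sigmoid(y):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_prime(y):
    """Derivative via the identity sigma'(y) = sigma(y) * (1 - sigma(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)

def numerical_derivative(f, y, h=1e-6):
    """Central finite-difference approximation of f'(y)."""
    return (f(y + h) - f(y - h)) / (2 * h)

for y in (-2.0, 0.0, 1.5):
    print(y, sigmoid_prime(y), numerical_derivative(sigmoid, y))
```

This cheap form of the derivative is what makes the sigmoid convenient for Backpropagation: it is computed from the unit's output alone.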

71
BACKPROPAGATION

72
One major difference in the case of multilayer networks is that the error
surface can have multiple local minima.

Despite this obstacle, in practice the Backpropagation algorithm has been found to
produce excellent results in many real-world applications.

73
BACKPROPAGATION Algorithm

74
 This algorithm applies to feedforward networks
• containing two layers of sigmoid units,
• with units at each layer connected to all units from the
preceding layer.
 This is the incremental, or stochastic, gradient descent
version of Backpropagation.
 The notation used here is the same as that used in earlier
sections, with the following extensions:

75
Termination Conditions

76
Adding Momentum

77
δ computation
 δk is simply the familiar (tk - ok) from the delta rule, multiplied by the
factor ok(1 - ok), which is the derivative of the sigmoid squashing function.
 However, since training examples provide target values tk only for network
outputs,
• no target values are directly available to indicate the error of the hidden
units' values.
 Instead, the error term for hidden unit h is calculated by
• summing the error terms δk for each output unit influenced by h,
• weighting each of the δk's by wkh, the weight from hidden unit h to
output unit k.
• This weight characterizes the degree to which hidden unit h is
"responsible for" the error in output unit k.

78
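The error terms δk and δh described above can be sketched as two small functions for sigmoid units. The numeric values in the usage lines are hypothetical and only show how the pieces fit together.

```python
def output_delta(t_k, o_k):
    """delta_k = o_k * (1 - o_k) * (t_k - o_k) for output unit k."""
    return o_k * (1.0 - o_k) * (t_k - o_k)

def hidden_delta(o_h, weights_from_h, output_deltas):
    """delta_h = o_h * (1 - o_h) * sum_k w_kh * delta_k for hidden unit h.

    weights_from_h[k] is w_kh, the weight from hidden unit h to output unit k.
    """
    back = sum(w_kh * d_k for w_kh, d_k in zip(weights_from_h, output_deltas))
    return o_h * (1.0 - o_h) * back

# Hypothetical numbers: one hidden unit feeding two output units.
deltas_out = [output_delta(1.0, 0.8), output_delta(0.0, 0.3)]
delta_h = hidden_delta(0.6, [0.5, -0.4], deltas_out)
print(deltas_out, delta_h)
```

Each weight is then changed by η times the δ of the unit it feeds into, times the input carried by that connection.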
Derivation of Backpropagation rule

 We have Δwji = −η ∂Ed/∂wji, where Ed = ½ Σk∈outputs (tk − ok)² is the error on training example d

79
Derivation of Backpropagation rule

 Let us derive the expression for the stochastic gradient descent rule.

 Case 1: Training Rule for Output Unit Weights.

80
Case 2: Training Rule for Hidden Unit Weights.

83
BACKPROPAGATION Algorithm

84
Learning Algorithm:
Backpropagation
The following slides describe the training process of a multi-layer neural network
using the backpropagation algorithm. To illustrate this process, a three-layer neural
network with two inputs and one output, shown in the picture below, is used:

Each neuron is composed of two units. The first unit adds the products of the weight coefficients and
the input signals. The second unit realises a nonlinear function, called the neuron transfer (activation)
function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element.
Signal y is also the output signal of the neuron.
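The two-unit view of a neuron maps directly onto code: an adder that computes e, followed by a nonlinear element that computes y = f(e). The sigmoid used as the default f, and the example weights and inputs, are assumptions.

```python
import math

def sigmoid(e):
    """One possible choice for the activation function f (an assumption)."""
    return 1.0 / (1.0 + math.exp(-e))

def neuron(weights, inputs, f=sigmoid):
    """Return (e, y): the adder output and the neuron output y = f(e)."""
    e = sum(w * x for w, x in zip(weights, inputs))  # first unit: weighted sum
    y = f(e)                                         # second unit: nonlinearity
    return e, y

e, y = neuron([0.5, -0.25], [1.0, 2.0])
print(e, y)
```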

• To train the neural network we need a training data set. The training data set
consists of input signals (x1 and x2) paired with the corresponding target
(desired output) z.
• Network training is an iterative process. In each iteration the weight
coefficients of the nodes are modified using new data from the training data set.
The modification is calculated using the algorithm described below.
• Each training step starts by applying both input signals from the training set.
After this stage we can determine the output signal values for each neuron in
each network layer.

The pictures below illustrate how the signal propagates through the network.
Symbols w(xm)n represent the weights of connections between network input xm and
neuron n in the input layer. Symbols yn represent the output signal of neuron n.

Propagation of signals through the output layer.


In the next step of the algorithm, the output signal of the network y is
compared with the desired output value (the target), which is found in the
training data set. The difference is called the error signal d of the output layer
neuron.

The idea is to propagate the error signal d (computed in a single training step)
back to all neurons whose output signals were inputs to the neuron under
discussion.

The weight coefficients wmn used to propagate errors back are the same as
those used when computing the output value. Only the direction of data flow
is changed (signals are propagated from outputs to inputs, one after the
other). This technique is used for all network layers. If the propagated errors
come from several neurons, they are added. The illustration is below:

When the error signal for each neuron has been computed, the weight
coefficients of each neuron's input connections may be modified. In the formulas
below, df(e)/de represents the derivative of the activation function of the neuron
whose weights are being modified.
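The two steps described here, propagating errors back through the same weights and then modifying each weight, can be sketched as follows. The update form w' = w + η · δ · df(e)/de · x, the sigmoid activation, and all numeric values are assumptions made for illustration, since the original formulas are not reproduced in the text.

```python
import math

def f(e):
    """Assumed activation function (sigmoid)."""
    return 1.0 / (1.0 + math.exp(-e))

def df(e):
    """Derivative df(e)/de of the assumed activation function."""
    return f(e) * (1.0 - f(e))

def backpropagated_error(weights_to_next, next_errors):
    """Error of a neuron: next-layer errors weighted by the forward weights, summed."""
    return sum(w * d for w, d in zip(weights_to_next, next_errors))

def updated_weight(w, eta, delta, e, x):
    """w' = w + eta * delta * df(e)/de * x for one input connection."""
    return w + eta * delta * df(e) * x

# Hypothetical values: a neuron feeding two next-layer neurons.
delta = backpropagated_error([0.5, 2.0], [0.2, 0.1])
print(delta, updated_weight(0.4, 0.1, delta, 0.0, 2.0))
```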
Illustration
 We will build a neural network with three layers:
• Input layer with two input neurons
• One hidden layer with two neurons
• Output layer with a single neuron

101
Dataset

102
 Our initial weights will be as follows: w1 = 0.11, w2 = 0.21,
w3 = 0.12, w4 = 0.08, w5 = 0.14 and w6 = 0.15
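To make the following steps concrete, here is a sketch of one forward pass and one weight update starting from these initial weights. Several things are assumptions, because the dataset and diagrams are not reproduced in the text: the wiring (w1, w2 into the first hidden neuron, w3, w4 into the second, w5, w6 into the output), the training example (inputs 2 and 3, target 1), the learning rate 0.05, linear (identity) activations, and the error ½(out − target)².

```python
# Initial weights from the slide; the wiring below is an assumption.
w1, w2, w3, w4, w5, w6 = 0.11, 0.21, 0.12, 0.08, 0.14, 0.15

# Hypothetical training example and learning rate.
x1, x2, target = 2.0, 3.0, 1.0
eta = 0.05

def forward(x1, x2, weights):
    """Forward pass with identity activations (assumed)."""
    a1, a2, a3, a4, a5, a6 = weights
    h1 = x1 * a1 + x2 * a2
    h2 = x1 * a3 + x2 * a4
    return h1, h2, h1 * a5 + h2 * a6

h1, h2, out = forward(x1, x2, (w1, w2, w3, w4, w5, w6))
error = 0.5 * (out - target) ** 2
delta = out - target  # dE/d(out)

# Gradient descent update; hidden-layer gradients use the old w5 and w6.
new_weights = (
    w1 - eta * delta * w5 * x1,
    w2 - eta * delta * w5 * x2,
    w3 - eta * delta * w6 * x1,
    w4 - eta * delta * w6 * x2,
    w5 - eta * delta * h1,
    w6 - eta * delta * h2,
)
_, _, new_out = forward(x1, x2, new_weights)
print(round(out, 3), round(error, 4), round(new_out, 3))
```

Under these assumptions the prediction moves closer to the target after a single update, which is the behaviour the forward pass, error calculation and backpropagation steps are meant to produce.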

103
Forward Pass

104
Calculating Error

105
Reducing Error

The question now is how to change (update) the weight values
so that the error is reduced.

The answer is Backpropagation!

106
Backpropagation (of errors)

107
 Now, using the new weights we will repeat the forward pass
through another training example

108
Representation Power

109
Inductive Bias

110
Illustration: Face Recognition

111
Illustration: Design Choice

112
Illustration: Design Choice

113
Convergence and Local Minima
 Convergence to the global minimum is not guaranteed; gradient descent may
settle in a local minimum.
 Still, backpropagation is a highly effective function approximation method
in practice.
 Despite the above comments, gradient descent over the complex error surfaces
represented by ANNs is still poorly understood,
• no methods are known to predict with certainty when local minima
will cause difficulties.
 Common heuristics to attempt to alleviate the problem of local minima
include:
• Add a momentum term to the weight-update rule
• Use stochastic gradient descent rather than true gradient descent.
• Train multiple networks using the same data, but initializing each
network with different random weights.
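The first heuristic, a momentum term, makes each weight change depend partly on the previous change: Δw(n) = −η · gradient + α · Δw(n − 1). The sketch below applies it to a toy one-dimensional error surface; the surface, η, α and the function name are assumptions.

```python
def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """One update with momentum: delta = -eta * grad + alpha * prev_delta."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# Toy error surface E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3).
w, prev = 0.0, 0.0
for _ in range(300):
    w, prev = momentum_step(w, 2.0 * (w - 3.0), prev)
print(round(w, 4))
```

The accumulated velocity keeps the weight moving through flat regions and small dips in the error surface, at the cost of some oscillation around the minimum.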

115
Representational Power of Feedforward Networks

 What set of functions can be represented by feedforward networks?
 Although much is still unknown about which function classes can be
described by which types of networks, three quite general results are
known:
• Boolean functions: every Boolean function can be represented exactly by
some network with two layers of units.
  • The number of hidden units required grows exponentially in the
  worst case with the number of network inputs.
• Continuous functions: every bounded continuous function can be
approximated with arbitrarily small error by a network with two layers of units.
  • Networks that use sigmoid units at the hidden layer and
  (unthresholded) linear units at the output layer achieve this.
  • The number of hidden units required depends on the function to
  be approximated.
• Arbitrary functions: any function can be approximated to arbitrary
accuracy by a network with three layers of units.
116
Hypothesis Space Search and Inductive Bias

 The hypothesis space is the n-dimensional Euclidean space of the
n network weights.
 The hypothesis space is continuous.
 It is difficult to characterize precisely the inductive bias of
backpropagation, but it can be roughly characterized as
• Smooth interpolation between the data points

117
Hidden layer representation
 One intriguing property of Backpropagation is its ability to
discover useful intermediate representations at the hidden
unit layers inside the network
 Backpropagation can define new hidden layer features that are not
explicit in the input representation,
• but which capture properties of the input instances that
are most relevant to learning the target function.

118
Generalization, Overfitting, and
Stopping Criterion
 The termination condition is left unspecified in the algorithm
 One obvious choice is to continue training until the error E on
the training examples falls below some predetermined
threshold.
• This is a poor strategy because Backpropagation is
susceptible to overfitting the training examples at the cost
of decreasing generalization accuracy over other unseen
examples.
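One common alternative, not spelled out in the text above, is to monitor the error on a separate validation set and keep the weights from the epoch where that error was lowest. The sketch below only shows the stopping logic; the function name, the patience parameter and the error values are made up for illustration.

```python
def best_epoch(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before the error
    has failed to improve for `patience` consecutive epochs."""
    best_idx, best_err = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_err:
            best_idx, best_err = epoch, err
        elif epoch - best_idx >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_idx

# Made-up validation errors: they fall, then rise as the network overfits.
val_errors = [0.9, 0.6, 0.45, 0.40, 0.42, 0.47, 0.55, 0.30]
print(best_epoch(val_errors))
```

In this made-up run training stops before the last epoch is ever reached, which shows the trade-off of a small patience value: it saves time but can miss a later improvement.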

119
Summary

121
Advanced Topics

122
Thank You
