

ADALINE, MADALINE and the Widrow-Hoff Rule

Adaptive Linear Combiner

ADALINE

MADALINE

Minimal Disturbance Principle


Adjust the weights to reduce the error on the current pattern with minimal disturbance to the patterns already learnt.
In other words, make the change to the weight vector in the same direction as the input vector (as sketched below).
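As a concrete illustration, the α-LMS form of the Widrow-Hoff update (discussed later in the deck) makes this explicit; the notation here (α, ε_k, X_k for the learning constant, error, and input pattern) is assumed rather than taken from the slides:

\[
\Delta W_k = W_{k+1} - W_k = \alpha\,\frac{\varepsilon_k X_k}{|X_k|^{2}}
\]

The correction is collinear with the current input X_k, so it fixes the response to the present pattern while disturbing the stored responses to dissimilar (nearly orthogonal) patterns as little as possible.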

Learning Rules

Error Correction - Single Element Network

Perceptron Convergence Rule


Non-linear

Weight update:

Quantizer error:
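The referenced equations are missing from the extracted text; a common statement of the rule, in the usual notation (d_k the desired bipolar response, y_k = sgn(s_k) the quantized output, α the adaptation constant), is:

\[
\tilde{\varepsilon}_k = d_k - y_k ,
\qquad
W_{k+1} = W_k + \frac{\alpha}{2}\,\tilde{\varepsilon}_k X_k
\]

The factor of 1/2 is a convention: since the quantizer error takes values in {−2, 0, +2}, each adaptation changes the weights by ±αX_k, and only when the output is wrong.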

Geometric Visualization of the Perceptron Convergence Rule


Least Mean Square (LMS)


Linear

Weight Update equation:

Error for the kth input pattern

Change in error for the kth input pattern after the weights have been updated:

Condition for convergence and stability
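The equations themselves did not survive extraction; assuming this slide presents the α-LMS (Widrow-Hoff) error-correction form, the standard expressions, with d_k the desired response, X_k the input pattern and W_k the weight vector, are:

\[
\varepsilon_k = d_k - W_k^{T} X_k ,
\qquad
W_{k+1} = W_k + \alpha\,\frac{\varepsilon_k X_k}{|X_k|^{2}}
\]

\[
\Delta\varepsilon_k = -\alpha\,\varepsilon_k ,
\qquad
0 < \alpha < 2
\]

Each update reduces the error on the current pattern by the factor α, and stability of the error-correction process requires 0 < α < 2.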

Error Correction Rules for Multi-Layer Networks

Madaline Rule I
Non-Linear

Steps:
If the output matches the desired response: no adaptation.
If the output is different:
- Find the adaline whose linear sum is closest to 0.
- Adapt its weights in the LMS direction, far enough to reverse its output.
- LOAD SHARING: repeat until the desired response is obtained (see the sketch below).
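A minimal Python sketch of the MR-I selection-and-flip step, assuming a single layer of adalines feeding a fixed majority-vote output element; the function name mr1_step and the array shapes are illustrative, not taken from the slides.

import numpy as np

def mr1_step(W, x, d):
    """One MADALINE Rule I adaptation step for a single training pattern.

    W : (n_adalines, n_inputs) adaline weight matrix
    x : (n_inputs,) bipolar input pattern (including a bias component)
    d : desired output (+1 or -1) of the fixed majority-vote element
    """
    s = W @ x                       # linear sums of all adalines
    y = np.sign(s)                  # quantized adaline outputs
    out = np.sign(y.sum())          # fixed MAJ (majority-vote) element

    # If the output already matches the desired response: no adaptation.
    while out != d:
        # Minimal disturbance: among the adalines whose output disagrees
        # with d, pick the one whose linear sum is closest to zero.
        wrong = np.where(y != d)[0]
        j = wrong[np.argmin(np.abs(s[wrong]))]

        # Adapt its weights in the LMS direction, far enough to reverse
        # its output (this correction sets its linear sum exactly to d).
        W[j] += (d - s[j]) * x / (x @ x)

        s = W @ x                   # load sharing: re-evaluate and repeat
        y = np.sign(s)
        out = np.sign(y.sum())
    return W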

Madaline Rule II
Non-Linear

Steps (for one training pattern):

Similar to MR I.
Trial adaptation: add a small perturbation of suitable amplitude and polarity to an adaline's linear sum.
If the output Hamming error is reduced, change the weights of that adaline in a direction collinear with its input, so the trial reversal becomes permanent; otherwise, no adaptation.
Repeat this for all adalines with sufficiently small linear output magnitude.
Finally, the last layer is adapted using α-LMS. (A sketch of the trial-adaptation loop follows.)
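A minimal Python sketch of the MR-II trial-adaptation loop for the first layer, under the same illustrative assumptions as the MR-I sketch (one hidden layer of adalines, bipolar signals); the names hamming_error and mr2_first_layer are assumptions, and the trial perturbation is implemented directly as a tentative weight change that is kept only if the output Hamming error decreases.

import numpy as np

def hamming_error(y_out, d_out):
    """Number of output adalines whose response disagrees with the target."""
    return int(np.sum(y_out != d_out))

def mr2_first_layer(W1, W2, x, d_out):
    """One MADALINE Rule II pass over the first-layer adalines (one pattern).

    W1 : (n_hidden, n_inputs)  first-layer adaline weights (adapted here)
    W2 : (n_out, n_hidden)     output-layer weights (adapted separately
                               with alpha-LMS, as on the slide)
    x  : (n_inputs,)           bipolar input pattern (with bias component)
    d_out : (n_out,)           desired bipolar outputs
    """
    def forward(W):
        y1 = np.sign(W @ x)
        return y1, np.sign(W2 @ y1)

    _, y_out = forward(W1)
    err = hamming_error(y_out, d_out)

    # Visit first-layer adalines in order of increasing |linear sum|:
    # those are the easiest to flip (minimal disturbance principle).
    for j in np.argsort(np.abs(W1 @ x)):
        if err == 0:
            break
        # Trial adaptation: tentatively reverse this adaline's output by a
        # weight change collinear with the input, and keep the change only
        # if the output Hamming error is reduced.
        W_trial = W1.copy()
        s_j = W_trial[j] @ x
        W_trial[j] += (-np.sign(s_j) - s_j) * x / (x @ x)   # flip its output

        _, y_out_trial = forward(W_trial)
        trial_err = hamming_error(y_out_trial, d_out)
        if trial_err < err:
            W1, err = W_trial, trial_err
    return W1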

Steepest Descent - Single Element Network

Error Surface of a Linear Combiner

The Optimal Wiener-Hopf Weight


The squared error can be written as:

Taking expectation of the above expression yields:

So the MSE surface equation is:

With the global optimal weight solution as:
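The expressions referred to above are missing from the extracted text; in the standard notation (R = E[X_k X_k^T] the input correlation matrix, P = E[d_k X_k] the cross-correlation vector) they are:

\[
\varepsilon_k^{2} = d_k^{2} - 2\,d_k X_k^{T} W + W^{T} X_k X_k^{T} W
\]

\[
\xi(W) = E[\varepsilon_k^{2}] = E[d_k^{2}] - 2\,P^{T} W + W^{T} R\,W
\]

\[
\nabla\xi = -2P + 2RW = 0 \;\Rightarrow\; W^{*} = R^{-1} P
\]

The MSE surface ξ(W) is a quadratic bowl in the weights, and its unique minimum is the Wiener-Hopf solution W*.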

Gradient Descent Algorithm


The aim of gradient descent is to make weight updates in the direction of the negative gradient, scaled by a factor μ, which controls the stability and convergence of the algorithm; ∇_k is the gradient at the point on the MSE surface corresponding to W = W_k:

\[
W_{k+1} = W_k + \mu\,(-\nabla_k)
\]

μ-LMS
Linear

It uses the instantaneous gradient, i.e. the gradient of the squared error of the current training sample, as an approximation of the true gradient.

The instantaneous gradient can be calculated easily from the current sample alone, rather than by averaging instantaneous gradients over all patterns in the training set.
For stability and convergence we need:
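The missing expressions can be filled in with the standard μ-LMS forms (assuming the usual notation, with R = E[X_k X_k^T] the input correlation matrix): the instantaneous gradient and weight update are

\[
\hat{\nabla}_k = \frac{\partial \varepsilon_k^{2}}{\partial W_k} = -2\,\varepsilon_k X_k ,
\qquad
W_{k+1} = W_k - \mu\,\hat{\nabla}_k = W_k + 2\mu\,\varepsilon_k X_k ,
\]

and convergence of the mean weight vector requires 0 < μ < 1/λ_max; since tr[R] ≥ λ_max, the commonly used sufficient condition is

\[
0 < \mu < \frac{1}{\operatorname{tr}[R]} .
\]

A minimal Python sketch of the resulting training loop; the function name mu_lms and the epoch loop are illustrative assumptions.

import numpy as np

def mu_lms(X, d, mu=0.01, epochs=10):
    """mu-LMS: adapt a linear combiner with the instantaneous gradient.

    X : (n_samples, n_inputs) input patterns (include a bias column)
    d : (n_samples,) desired responses
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):
            eps = d_k - w @ x_k        # error on the current sample
            w += 2 * mu * eps * x_k    # W_{k+1} = W_k + 2 mu eps_k X_k
    return w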

Madaline III
Non-Linear

Steps:
A small perturbation is added to the adaline's linear sum (the input to the nonlinearity).
The change in error and the change in output caused by this perturbation are measured.
From this change in the output error with respect to the perturbation, the instantaneous gradient can be calculated.
The procedure is mathematically equivalent to backpropagation when the perturbation is small.

Approximate Gradient:

And therefore:

Since:

So for small perturbation:

So weight update equation is thus:

Alternatively:

So weight update equation is thus:
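The equations these labels refer to did not survive extraction; they are reconstructed below in the usual MR-III notation (Δs the added perturbation, ε_k the error on the current pattern, X_k the input), following the standard derivation.

Approximate gradient:
\[
\hat{\nabla}_k = \frac{\partial \varepsilon_k^{2}}{\partial W_k} \approx \frac{\Delta(\varepsilon_k^{2})}{\Delta s}\,X_k
\]
and therefore:
\[
W_{k+1} = W_k - \mu\,\frac{\Delta(\varepsilon_k^{2})}{\Delta s}\,X_k .
\]
Since \(\Delta(\varepsilon_k^{2}) = 2\,\varepsilon_k\,\Delta\varepsilon_k + (\Delta\varepsilon_k)^{2}\), for a small perturbation \(\Delta(\varepsilon_k^{2}) \approx 2\,\varepsilon_k\,\Delta\varepsilon_k\), so the weight update can alternatively be written as:
\[
W_{k+1} = W_k - 2\mu\,\varepsilon_k\,\frac{\Delta\varepsilon_k}{\Delta s}\,X_k .
\]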

No need to know a priori the nature of the activation function.
Robust to drifts in analog hardware.

Steepest Descent - Multi-Layer Networks

Madaline III

Steps:
Same as for a single element, except that the change due to the perturbation is measured at the output of multiple layers:
- Add a perturbation to the linear sum of an adaline.
- Measure the change in the sum of squared output errors caused by this perturbation.
- Obtain the instantaneous gradient of the MSE with respect to the weight vector of the perturbed adaline.
A sketch of this procedure follows below.
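A minimal Python sketch of this perturbation-based update for the first-layer adalines of a two-layer network with tanh nonlinearities; the function name mr3_update, the layer shapes, and the values of mu and delta_s are illustrative assumptions, not taken from the slides.

import numpy as np

def mr3_update(W1, W2, x, d, mu=0.05, delta_s=1e-3):
    """MADALINE Rule III update of the first-layer adalines for one pattern.

    W1 : (n_hidden, n_inputs)  first-layer weights (adapted here)
    W2 : (n_out, n_hidden)     output-layer weights (held fixed here)
    x  : (n_inputs,)           input pattern (with bias component)
    d  : (n_out,)              desired outputs
    """
    def sse(perturb_idx=None):
        """Sum of squared output errors, optionally perturbing one linear sum."""
        s1 = W1 @ x
        if perturb_idx is not None:
            s1[perturb_idx] += delta_s      # perturb this adaline's linear sum
        y1 = np.tanh(s1)                    # sigmoidal first layer
        y2 = np.tanh(W2 @ y1)               # sigmoidal output layer
        return float(np.sum((d - y2) ** 2))

    base = sse()                            # unperturbed error
    grads = np.zeros_like(W1)
    for j in range(W1.shape[0]):
        # The measured change in squared error per unit perturbation of the
        # linear sum s_j approximates d(SSE)/d(s_j); since s_j = w_j . x,
        # the instantaneous gradient wrt w_j is that quantity times x.
        grads[j] = (sse(j) - base) / delta_s * x
    return W1 - mu * grads                  # steepest-descent step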

Relevance to Present-Day Work


α-LMS and μ-LMS are still used today.
MR-III and MR-II can be applied to complicated architectures.
MR-III can be used even with an arbitrary, unknown activation function, since it never requires the activation function (or its derivative) to be known explicitly.
