Artificial Neural Network: Adaline and Madaline
Training Rule

• The activation function used is
  y = 1 if y_in ≥ 0
  y = -1 if y_in < 0
  (a small code sketch of this threshold function follows after this list).
• The training rule is called the Widrow-Hoff rule or the Delta Rule.
• It can be theoretically shown that the rule minimizes the root mean square error between the activation value and the target value.
• That is why it is also called the Least Mean Square (LMS) rule.
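A minimal Python sketch of this bipolar threshold activation (the function name is my own choice, not from the slides):

def bipolar_threshold(y_in):
    # Bipolar step activation: +1 if the net input is >= 0, otherwise -1.
    return 1 if y_in >= 0 else -1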
The δ Rule

The δ rule also works for more than one output unit.
The δ Rule

Consider a single output unit. The delta rule changes the weights of the neural connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t.

The goal is to minimize the error over all training patterns; however, this is accomplished by reducing the error for each pattern one at a time. Weight corrections can also be accumulated over a number of training patterns (called batch updating) if desired.
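To see why this is a least-mean-square rule, here is the standard one-step derivation (not verbatim from the slides) of the update from the squared error on a single pattern:

E = \tfrac{1}{2}\,(t - y_{\mathrm{in}})^2, \qquad y_{\mathrm{in}} = b + \sum_i x_i w_i

\frac{\partial E}{\partial w_i} = -(t - y_{\mathrm{in}})\,x_i, \qquad \frac{\partial E}{\partial b} = -(t - y_{\mathrm{in}})

\Delta w_i = -\alpha\,\frac{\partial E}{\partial w_i} = \alpha\,(t - y_{\mathrm{in}})\,x_i, \qquad \Delta b = \alpha\,(t - y_{\mathrm{in}})

These are exactly the updates used in the training algorithm on the next slide.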
The Training Algorithm

Initialize weights to small random values
Set learning rate α to a value between 0 and 1
while (the largest weight change > threshold) do
{
  for each bipolar training pair s:t do
  {
    Set activation of input units: x_i = s_i for i = 1..n
    Compute net input to the output unit: y_in = b + Σ x_i w_i
    Update bias and weights for i = 1..n:
      b(new) = b(old) + α(t - y_in)
      w_i(new) = w_i(old) + α(t - y_in) x_i
  } //end for
} //end while
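A minimal Python sketch of this Adaline (delta-rule) training loop, assuming bipolar training pairs; the function and variable names are my own, not from the slides:

import random

def bipolar_threshold(y_in):
    # Bipolar step activation, used after training to classify.
    return 1 if y_in >= 0 else -1

def train_adaline(patterns, alpha=0.1, tolerance=1e-3, max_epochs=1000):
    # patterns: list of (s, t) pairs; s is a list of bipolar inputs, t the bipolar target.
    n = len(patterns[0][0])
    w = [random.uniform(-0.1, 0.1) for _ in range(n)]     # small random weights
    b = random.uniform(-0.1, 0.1)                         # bias
    for _ in range(max_epochs):
        largest_change = 0.0
        for s, t in patterns:
            y_in = b + sum(x * wi for x, wi in zip(s, w)) # net input
            delta = alpha * (t - y_in)
            b += delta
            largest_change = max(largest_change, abs(delta))
            for i, x in enumerate(s):
                w[i] += delta * x
                largest_change = max(largest_change, abs(delta * x))
        if largest_change <= tolerance:                   # stop when weight changes are small
            break
    return w, b

# Usage example: learn the AND function in bipolar form (a linearly separable problem).
and_pairs = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_adaline(and_pairs)
outputs = [bipolar_threshold(b + sum(x * wi for x, wi in zip(s, w))) for s, _ in and_pairs]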
Setting Learning Parameter α

• Usually, just use a small value for α, something like 0.1.
• If the value is too large, the learning process will not converge.
• If the value of α is too small, learning will be extremely slow (Hecht-Nielsen 1990).
• For a single neuron, a practical range for α is 0.1 ≤ n × α ≤ 1.0, where n is the number of input units (Widrow, Winter and Baxter 1988). (A small sketch follows this list.)
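A tiny sketch of applying the 0.1 ≤ n × α ≤ 1.0 guideline; the specific midpoint choice is my own illustration, not prescribed by the slides:

def pick_alpha(n_inputs):
    # Choose alpha so that n * alpha = 0.5, comfortably inside the suggested [0.1, 1.0] range.
    return 0.5 / n_inputs

# For a unit with 4 inputs this gives alpha = 0.125, so n * alpha = 0.5.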
MADALINE

• When several ADALINE units are arranged in a single layer so that there are several output units, there is no change in how the ADALINEs are trained from that of a single ADALINE.
• A MADALINE consists of many ADALINEs arranged in a multi-layer net.
• We can think of a MADALINE as having a hidden layer of ADALINEs.
MADALINE (Many Adalines)

• A Madaline is composed of several Adalines.
• Each ADALINE unit has a bias. There are two hidden ADALINEs, Z1 and Z2, and a single output ADALINE, Y.
• Each ADALINE simply applies a threshold function to the unit's net input. Y is a non-linear function of the input vector (x1, x2). The use of hidden units Z1 and Z2 gives the net additional power, but makes training more complicated. (A forward-pass sketch follows below.)
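A minimal Python sketch of the forward pass through this Madaline; the notation follows the slides (w11, w21, b1 feed Z1; w12, w22, b2 feed Z2; v1, v2, b3 feed Y), but the function itself is my own illustration:

def f(x):
    # Bipolar threshold activation.
    return 1 if x >= 0 else -1

def madaline_forward(x1, x2, w11, w21, b1, w12, w22, b2, v1, v2, b3):
    z_in1 = b1 + x1 * w11 + x2 * w21   # net input to hidden Adaline Z1
    z_in2 = b2 + x1 * w12 + x2 * w22   # net input to hidden Adaline Z2
    z1, z2 = f(z_in1), f(z_in2)
    y_in = b3 + z1 * v1 + z2 * v2      # net input to the output Adaline Y
    return f(y_in)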
MADALINE Training

There are two training algorithms for a MADALINE with one hidden layer.

Algorithm MR-I is the original MADALINE training algorithm (Widrow and Hoff 1960). MR-I changes only the weights into the hidden ADALINEs; the weights for the output unit are fixed. It assumes that the output unit is an OR unit.

MR-II (Widrow, Winter and Baxter 1987) adjusts all weights in the net. It does not assume that the output unit is an OR unit.
MR-I Training Algorithm

Determine the weights of the output unit (here, v1, v2 and bias b3) such that the output unit Y behaves like an OR unit. In other words, Y is 1 if Z1 or Z2 (or both) is 1; Y is -1 if both Z1 and Z2 are -1. Here a weight of ½ on each of v1, v2 and b3 works (a quick check follows below).

The weights on the hidden ADALINEs are adjusted according to the MR-I algorithm. In this example, the weights on the first ADALINE (w11 and w21) and the weights on the second ADALINE (w12 and w22) are adjusted according to the MR-I algorithm.

Remember the activation function is
f(x) = 1 if x ≥ 0
      -1 if x < 0
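A quick check (my arithmetic; the claim itself is from the slide) that v1 = v2 = b3 = ½ makes Y behave as an OR of Z1 and Z2:

y_{\mathrm{in}} = b_3 + z_1 v_1 + z_2 v_2 = \tfrac{1}{2}\,(1 + z_1 + z_2)

z_1 = z_2 = 1:\quad y_{\mathrm{in}} = \tfrac{3}{2} \ge 0 \Rightarrow y = 1
z_1 = -z_2 = \pm 1:\quad y_{\mathrm{in}} = \tfrac{1}{2} \ge 0 \Rightarrow y = 1
z_1 = z_2 = -1:\quad y_{\mathrm{in}} = -\tfrac{1}{2} < 0 \Rightarrow y = -1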
MR-I Training Algorithm

Set learning parameter α   //Assume bipolar units and outputs. Only 1 hidden layer.
while stopping condition is false do
  for each bipolar training pair s:t do
    Set activation of input units: x_i = s_i for i = 1 to n
    Compute net input to hidden units, e.g., z_in1 = b1 + x1 w11 + x2 w21
    Determine output of each hidden ADALINE, e.g., z1 = f(z_in1)
    Determine output of net: y_in = b3 + z1 v1 + z2 v2; y = f(y_in)
    //Determine error and update weights
    if t = y, then no updates are performed  //no error
    if t = 1,  //error: the expected output is 1, the computed output is -1; at least one of the Z's should be 1
      then update weights on Z_J, the unit whose net input is closest to 0 (equivalently, closest to 1; these pick the same unit here, since every z_in is negative when all Z outputs are -1):
        b_J(new) = b_J(old) + α(1 - z_inJ)
        w_iJ(new) = w_iJ(old) + α(1 - z_inJ) x_i
    if t = -1, then update weights on all units Z_k that have positive net input  //error
  endfor
endwhile
Stopping criterion: weight changes have stopped or reached an acceptable level, or after a certain number of iterations.
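A minimal Python sketch of one MR-I update for the two-hidden-unit net above. The t = 1 update formulas are from the slide; the symmetric formulas used for the t = -1 case (pushing positive-net-input units toward -1) are an assumption here, since the slide leaves them implicit:

def f(x):
    # Bipolar threshold activation.
    return 1 if x >= 0 else -1

def mri_update(x, t, W, b, alpha=0.5):
    # One MR-I step for one bipolar training pair (x, t).
    # W[j] is the weight vector of hidden Adaline Z_j; b[j] is its bias.
    # The output unit is the fixed OR unit with v1 = v2 = b3 = 0.5.
    z_in = [b[j] + sum(xi * wij for xi, wij in zip(x, W[j])) for j in range(len(W))]
    z = [f(v) for v in z_in]
    y = f(0.5 + sum(0.5 * zj for zj in z))
    if t == y:
        return                                    # no error: no update
    if t == 1:
        # All z were -1; nudge the hidden unit whose net input is closest to 0 toward +1.
        J = min(range(len(W)), key=lambda j: abs(z_in[j]))
        delta = alpha * (1 - z_in[J])
        b[J] += delta
        for i, xi in enumerate(x):
            W[J][i] += delta * xi
    else:
        # t == -1: nudge every hidden unit with positive net input toward -1
        # (assumed symmetric form; the slide only says to update these units).
        for j in range(len(W)):
            if z_in[j] > 0:
                delta = alpha * (-1 - z_in[j])
                b[j] += delta
                for i, xi in enumerate(x):
                    W[j][i] += delta * xi

# Usage: call mri_update repeatedly over the training pairs until the weights stop changing.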
MR-I Training Algorithm

Motivation for performing updates: update weights only if an error has occurred, and update them in such a way that it becomes more likely for the net to produce the desired response.

If t = 1 and an error has occurred (i.e., y = -1, or the OR unit is off when it should actually be on): this means that all Z units had value -1 and at least one Z unit needs to have value +1. Therefore, we take Z_J to be the unit whose net input is closest to 0 and adjust its weights.

If t = -1 and an error has occurred (i.e., y = 1, or the OR unit is on when it should actually be off): this means that at least one Z unit had value +1 and all Z units must have value -1. Therefore, we adjust the weights on all of the Z units with positive net input.
Example of Use of MRI

• Solving the XOR problem using MRI.
• The training patterns are the four XOR pairs in bipolar form:
  (x1, x2) = (1, 1), t = -1
  (x1, x2) = (1, -1), t = 1
  (x1, x2) = (-1, 1), t = 1
  (x1, x2) = (-1, -1), t = -1
Madaline Training for XOR Using the MR1 Algorithm
Geometric Interpretation of Madaline MR1 Weights

• The positive response region for the Madaline trained in the previous example is the union of the regions where each of the hidden units has a positive response.
• The decision boundary for each hidden unit can be calculated as described in Section 2.1.3 of Fausett's book (see the line equation below).
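For a unit with weights w1, w2 and bias b, the decision boundary (where the net input is zero) is the line

w_1 x_1 + w_2 x_2 + b = 0 \quad\Longleftrightarrow\quad x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{b}{w_2} \qquad (w_2 \neq 0),

with the positive response on the side where w_1 x_1 + w_2 x_2 + b ≥ 0.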
Geometric Interpretation of Madaline MR1 Weights

• We see the positive response regions for Z1 and Z2, and then the positive response region for the output unit Y, which is the union of the Z1 and Z2 regions (since Y acts as an OR of Z1 and Z2).
MR-II Training Algorithm

There is no assumption that the output unit acts as a logical OR. The goal is to change the weights in all layers of the net, i.e., in all hidden layers (when we have several hidden layers) plus the output layer.

But we also want to cause the least disturbance in the net, so that it remains stable from iteration to iteration. This causes the least "unlearning" of the patterns for which the net has been trained previously. This is sometimes called the "don't rock the boat" principle.

Several output nodes may be used; the total error for any input pattern is the sum of the squares of the errors at each output unit.
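Written out, for output units with targets t_1, ..., t_m and computed outputs y_1, ..., y_m, the total error for one input pattern is

E = \sum_{k=1}^{m} (t_k - y_k)^2.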
MR-II Training Algorithm

The MR-II algorithm is considerably different from the back-propagation algorithm we will learn later. The weights are initialized to small random values, and training patterns are presented repeatedly in epochs. The algorithm modifies the weights for the nodes in hidden layer 1, then layer 2, and so on up to the output layer.

The training algorithm is a trial-and-error procedure following the minimum disturbance principle: nodes that can affect the output error while incurring the least change in their weights have precedence in the learning process.
MR-II Training Algorithm

Set learning rate α
while stopping condition is false do
  for each bipolar training pair s:t do
    Compute the output of the net based on the current weights and activation function
    if t ≠ y, then
      for each unit whose net input is sufficiently close to 0 (say, between -0.25 and 0.25) do
      {
        Sort all such units in the network, at all levels, based on their net input values;
        start with the unit whose net input is closest to 0, then the next closest, etc.
        Change the unit's output from +1 to -1, or vice versa
        If modifying the output of this node improves network performance (i.e., reduces the error),
          then adjust the weights on this unit to achieve the output reversal  //if the error is not reduced, undo the reversal
      }  //how to adjust the weights is not given here
  endfor
endwhile
Stopping criterion: weight changes have stopped or reached an acceptable level, or after a certain number of iterations.
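A minimal Python sketch of one MR-II pass for a single-hidden-layer Madaline with one output unit. For simplicity it only trial-flips hidden units, and the weight adjustment used after a successful reversal (a delta-rule push toward the new output) is an assumed concrete choice, since the slide leaves that step unspecified:

def f(x):
    # Bipolar threshold activation.
    return 1 if x >= 0 else -1

def forward(x, W, b, v, b_out):
    # Net inputs and outputs of the hidden Adalines, then the output Adaline.
    z_in = [b[j] + sum(xi * wij for xi, wij in zip(x, W[j])) for j in range(len(W))]
    z = [f(u) for u in z_in]
    y_in = b_out + sum(zj * vj for zj, vj in zip(z, v))
    return z_in, z, f(y_in)

def mrii_step(x, t, W, b, v, b_out, alpha=0.25, window=0.25):
    # One MR-II trial-and-error pass for the training pair (x, t),
    # following the minimum disturbance ("don't rock the boat") principle.
    z_in, z, y = forward(x, W, b, v, b_out)
    if t == y:
        return                                            # no error: do nothing
    # Hidden units whose net input is within the window, closest to 0 first.
    candidates = sorted((j for j in range(len(W)) if abs(z_in[j]) <= window),
                        key=lambda j: abs(z_in[j]))
    for j in candidates:
        flipped = -z[j]                                   # trial output reversal
        y_trial = f(b_out + sum((flipped if k == j else z[k]) * v[k]
                                for k in range(len(W))))
        if y_trial == t:                                  # the reversal fixes the error
            delta = alpha * (flipped - z_in[j])           # push Z_j toward its new output (assumed form)
            b[j] += delta
            for i, xi in enumerate(x):
                W[j][i] += delta * xi
            return
        # otherwise the trial reversal is simply discarded (the boat is not rocked)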