
1

Artificial Neural Network:
Introduction to ANN
AAA-Python Edition

2

Plan
● 1- Introduction to ANN
● 2- Perceptron
● 3- Single Layer Perceptron
● 4- Single Layer Perceptron examples
● 5- Multi Layer Perceptron
● 6- ANN topologies

3
1- Introduction to ANN
[By Amina Delali]
Concept
● The idea of an Artificial Neural Network (ANN) is to build a model based on the way the human brain learns new things.
● It can be used in any type of machine learning. It learns by extracting the different underlying patterns in a given data set.
● This extraction is performed in stages, called layers. Each layer, composed of a set of neurons, identifies a certain pattern. The following layer identifies another, more complex pattern from the output of its previous layer.
● The most common architecture is as follows: the first layer has the training data as input; it is called the input layer. In the last one, the outputs of the neurons are the final outputs; it is called the output layer. The layers in between are called hidden layers.
● From now on, the term neural network will mean artificial neural network.

4
1- Introduction to ANN
[By Amina Delali]
Neurolab
● Neurolab is a Neural Network library for Python. It supports several types of neural networks.
● For installation: pip install neurolab
● Like other machine learning techniques, a neural network needs to be trained, can be tested, and will then be used to predict results.
● Here is an example of how to use neurolab to create a neural network and how to perform the aforementioned tasks. Notes on the example:
➢ The samples are uniformly distributed from -0.5 (included) to 0.5 (excluded)
➢ Input layer with 2 neurons, hidden layer with 5 neurons, and output layer with 1 neuron
➢ The length of the outer list equals the number of neurons of the input layer
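The original code screenshot is not reproduced here, so the following is a minimal sketch of what the described example could look like; the variable name myNN and the training parameters (epochs, show, goal) are assumptions.

```python
import numpy as np
import neurolab as nl

# 10 samples, 2 features, uniformly distributed in [-0.5, 0.5)
data = np.random.uniform(-0.5, 0.5, (10, 2))
# Labels: the sum of the 2 features, one output value per sample
labels = data.sum(axis=1).reshape(-1, 1)

# The outer list gives the [min, max] range of each input neuron (2 inputs);
# the second list gives the sizes of the remaining layers (5 hidden, 1 output)
myNN = nl.net.newff([[-0.5, 0.5], [-0.5, 0.5]], [5, 1])

# Train, then predict the output for [0.2, 0.1] and compute the test error
myNN.train(data, labels, epochs=100, show=20, goal=0.001)
prediction = myNN.sim([[0.2, 0.1]])
print(prediction, abs((0.2 + 0.1) - prediction[0][0]))
```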

5
1- Introduction to ANN
[By Amina Delali]
Neurolab example details
● Data: 10 samples described by 2 features. The labels are the sum of the 2 features. In fact, the NN tries to model the sum function for values ranging from -0.5 to 0.5.
● After creating the data, the steps were:
➢ Create an instance of a neural network with a specified number of layers and neurons (nl.net.newff)
➢ Train the neural network (myNN.train)
➢ Predict the output for the value [0.2, 0.1] (myNN.sim)
➢ Compute the test error (the true label is known: 0.2 + 0.1)

6
2- Perceptron
[By Amina Delali]
Definition
● The term Perceptron refers to an input layer of data feature values, with forward weighted connections to an output layer of one single neuron, or of multiple neurons.
● One of the simplest forms of a neuron is an LTU.
● An LTU, or Linear Threshold Unit, is a component (neuron) that:
➢ Computes a weighted sum of its inputs: a linear function
➢ Applies a step function to the resulting sum, and outputs the result
Diagram (LTU): the input neurons x1, x2, …, xn and a bias neuron (constant 1) are connected to the LTU through the weights w0, w1, w2, …, wn. The LTU computes the propagation value Z = wᵀ·X = Σ_{i=0..n} wi·xi and produces one output value hw(x) = step(Z).
The bias neuron is used to allow a linear function to be written as a matrix multiplication: its multiplication by the corresponding weight w0 will represent the intercept value b in (a1·x1 + a2·x2 + … + an·xn + b).
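As a minimal sketch (not taken from the slides), an LTU forward pass can be written directly with numpy; the input and weight values are illustrative.

```python
import numpy as np

# Inputs: bias x0 = 1 followed by the features x1 ... xn
x = np.array([1.0, 0.2, 0.7])
# Weights: w0 is the bias weight (the intercept b), then w1 ... wn
w = np.array([-0.5, 0.3, 0.8])

z = w @ x                      # propagation function: Z = w.T X
output = 1 if z >= 0 else 0    # threshold activation: step(Z)
print(z, output)
```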

7
2- Perceptron
[By Amina Delali]
The functions
● The weighted sum function is also called the Propagation function.
● The step function can be:
➢ A non-linear function; in this case, it is called the threshold activation function (this is the case in an LTU). For example:
➔ The Heaviside step function: heaviside(z) = 0 if z < 0, 1 if z ≥ 0
➔ The sign function: sgn(z) = −1 if z < 0, 0 if z = 0, 1 if z > 0
➢ A linear function, simply called the activation function. For example:
➔ The identity function: the value computed by the propagation function is the output value of the neuron.
➢ A semi-linear function, that is monotonic and differentiable. Also called the activation function.
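A minimal sketch (not from the slides) of the three step functions named above, written with numpy:

```python
import numpy as np

def heaviside(z):
    # Threshold activation: 0 if z < 0, 1 if z >= 0
    return np.where(z < 0, 0, 1)

def sign(z):
    # Threshold activation: -1, 0 or 1 depending on the sign of z
    return np.sign(z)

def identity(z):
    # Linear activation: the propagation value is returned unchanged
    return z

z = np.array([-0.3, 0.0, 0.7])
print(heaviside(z), sign(z), identity(z))
```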

8
2- Perceptron
[By Amina Delali]
Single Layer Perceptron
● A Single Layer Perceptron (SLP) is simply a Perceptron with only one layer (without counting the input layer).
● So, it is composed of an input layer and an output layer. The latter can have one or more outputs, so it can be used for binary and for multi-output classification.
● Considering ANNs in general, a Perceptron is considered a feedforward neural network. We are going to talk about it in the next section.
● An SLP can apply two different kinds of rules to learn: the perceptron rule or the delta rule. Each rule is associated with a certain type of activation function. To apply the delta rule, the activation function needs to be differentiable.

9
3- Single Layer Perceptron
[By Amina Delali]
SLP Learning
SLP learning:
● With the Perceptron rule (a variant of Hebb's rule):
➢ Uses a threshold activation function
➢ The learning algorithm has the same behavior as a stochastic gradient descent algorithm
● With the delta rule (Widrow-Hoff rule):
➢ Uses a linear or semi-linear activation function
➢ Offline training: batch gradient descent
➢ Online (or sequential) training: stochastic gradient descent algorithm

10
3- Single Layer Perceptron
[By Amina Delali]
SLP Learning: Perceptron Rule
● To update the weights, the following formula is used:
wi,j(next step) = wi,j + η · (yj − ŷj) · xi
where:
➢ wi,j is the weight of the connection between input node i and output node j for the current training sample, and wi,j(next step) is the same weight for the next training sample
➢ η is the learning rate
➢ yj is the true value at output node j for the current training sample, and ŷj is the predicted value at node j
➢ xi is the i-th input value (i-th feature) of the current training sample
● The concept is that each wrong prediction reinforces the weight corresponding to the feature that would have contributed to the correct prediction. The computation of the weights is repeated until the samples are classified correctly.
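A minimal sketch (not the slides' code) of the perceptron rule for one output neuron; the toy data and the hyper-parameters are illustrative.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=20):
    """Perceptron rule: w_i <- w_i + eta * (y - y_hat) * x_i."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_s, y_s in zip(X, y):
            y_hat = 1 if w @ x_s >= 0 else 0        # threshold activation
            w += eta * (y_s - y_hat) * x_s          # update only on mistakes
    return w

# Toy linearly separable problem (logical OR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
print(train_perceptron(X, y))
```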

11
3- Single Layer Perceptron
[By Amina Delali]
Gradient Descent
● The concept of gradient descent is:
➢ With the initial parameters of a model, predict an output value
➢ Compute the gradient of the “error” (“loss”) function (a function of the parameters of the learning model) at a certain point: the slope of the surface of that function at that point, calculated by its derivative at that point.
➢ Update the parameters in order to find the local minimum, by a step proportional to the negative of that gradient (opposite direction ==> toward the local minimum of the function). In the case of:
➔ Stochastic gradient descent: with one sample, predict → update the parameters for the next sample → predict with the next sample with the new parameters
➔ Batch gradient descent: predict for all samples (== 1 epoch) → update the parameters → predict again with the new parameters
➢ Repeat the process in order to minimize the error.
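A minimal sketch (not from the slides) of a single-parameter gradient descent update, using the squared error E(w) = (y − w·x)² as an illustrative loss:

```python
def gradient_step(w, x, y, eta=0.1):
    """Move w by a step proportional to the negative gradient of E."""
    grad = -2 * x * (y - w * x)   # dE/dw
    return w - eta * grad         # step in the opposite direction of the gradient

w = 0.0
for epoch in range(50):          # repeat the predict/update process
    w = gradient_step(w, x=2.0, y=4.0)
print(w)                         # approaches 2.0, since y = 2 * x
```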

12
3- Single Layer Perceptron
[By Amina Delali]
SLP Learning: Delta Rule
● With the activation function being linear, or semi-linear but differentiable, gradient descent is used to update the weights.
● The weights are updated as follows: wi,j(next) = wi,j + Δwi,j, where:
➢ In general: Δw = −η · ∂E/∂w
➢ In the case of a linear activation function and a sum-squared error function:
➔ In online training, for a given sample: Δwi,j = η · xi · (yj − ŷj) = η · xi · δj (this δ is why it is called the delta rule)
➔ In offline training: Δwi,j = η · Σ_{s∈X} xi^s · (yj^s − ŷj^s) = η · Σ_{s∈X} xi^s · δj^s
where η is the learning rate, yj^s is the true label that output neuron j should have for sample s, ŷj^s is the predicted label of output neuron j for sample s, and xi^s is the value of input feature i of sample s.
● The difference with the perceptron rule is the type of activation function used.
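A minimal sketch (not from the slides) of the online delta rule with a linear (identity) activation and sum-squared error; the toy data reuses the earlier "sum of two features" problem and is illustrative.

```python
import numpy as np

def train_delta(X, y, eta=0.05, epochs=100):
    """Online delta rule: w_i <- w_i + eta * x_i * (y - y_hat)."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_s, y_s in zip(X, y):
            y_hat = w @ x_s                         # identity activation
            w += eta * x_s * (y_s - y_hat)          # delta = y_s - y_hat
    return w

# Learn the sum of two features, as in the neurolab example
X = np.random.uniform(-0.5, 0.5, (10, 2))
y = X.sum(axis=1)
print(train_delta(X, y))   # weights approach [0, 1, 1]
```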

13
4- Single Layer Perceptron examples
[By Amina Delali]
With sklearn: perceptron rule
● eta0=0.1 → learning rate = 0.1
● max_iter=5000 → maximum number of iterations if there is no convergence
● shuffle=False → the samples are not shuffled after a weight is updated for the next sample
● When the samples are shuffled after each weight update (shuffle=True), only 20 iterations were sufficient to reach convergence.
● Scikit-learn uses the hinge loss function with threshold = 0 to compute the error of the prediction.
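The original code screenshot is not reproduced, so here is a hedged sketch of the scikit-learn perceptron-rule example with the parameters listed above; the use of the Iris data set and the train/test split are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Assumed data set: Iris (not stated on this slide)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# eta0: learning rate; max_iter: iteration cap if there is no convergence;
# shuffle=False: samples are not shuffled between weight updates
clf = Perceptron(eta0=0.1, max_iter=5000, shuffle=False)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```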

14
4- Single Layer Perceptron examples
[By Amina Delali]
With sklearn: perceptron rule
● Use of the stochastic gradient descent classifier class (SGDClassifier)
● Since there are 3 classes, and since the classification strategy used is one-vs-all, the classifier will run 3 times, each time with a maximum of 20 epochs.
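A hedged sketch of the SGDClassifier variant described above; the perceptron loss, the constant learning rate and the Iris data are assumptions based on the surrounding slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With 3 classes and a one-vs-all strategy, one binary classifier is fitted
# per class; max_iter=20 caps the number of epochs for each of them.
clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                    eta0=0.1, max_iter=20)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```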

15
4- Single Layer Perceptron examples
[By Amina Delali]
Example with neurolab: delta rule
Since we have 4 features → we need an input layer with 4 neurons. And since we have 3 classes, and since we will use the SoftMax activation function (ideal for multi-class classification), we will need 3 output neurons:
Class 0 coded as 1 0 0
Class 1 coded as 0 1 0
Class 2 coded as 0 0 1
NB: the classes are not coded in binary code (this is one-hot coding).
The corresponding formula of the SoftMax function is:
f(xi) = e^(xi) / Σ_{j=0..N} e^(xj), where N is the number of output neurons (classes).

16
4- Single Layer Perceptron examples
[By Amina Delali]
Example with neurolab: delta rule
● Third class → third column == 1
● Select the index corresponding to the maximum probability as the predicted class label
● This implementation in neurolab uses the sum-squared error as cost function, and it doesn't use the derivative of the activation function used.
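The original code is not reproduced, so the following is only a sketch of what this neurolab example could look like; the use of nl.trans.SoftMax, nl.train.train_delta, nl.net.newff and the Iris data set are assumptions, not the deck's confirmed code.

```python
import numpy as np
import neurolab as nl
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)          # 4 features, 3 classes (assumption)
targets = np.zeros((len(y), 3))
targets[np.arange(len(y)), y] = 1          # one-hot coding: class 2 -> 0 0 1

# [min, max] range of each of the 4 input features
minmax = [[float(col.min()), float(col.max())] for col in X.T]

# One layer of 3 output neurons with a SoftMax activation,
# trained with the delta rule (assumed trainer)
net = nl.net.newff(minmax, [3], transf=[nl.trans.SoftMax()])
net.trainf = nl.train.train_delta
net.train(X, targets, epochs=500, show=100, goal=0.01)

# The predicted class is the index of the maximum output probability
predictions = net.sim(X).argmax(axis=1)
print((predictions == y).mean())           # training accuracy
```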

17
5- Multi Layer Perceptron
[By Amina Delali]
Multi-Layer Perceptron
● An MLP (Multi-Layer Perceptron) is a Perceptron with one or more hidden layers.
● It is another feedforward artificial neural network. Each of the layers (except the output layer) includes a bias neuron.
● An ANN with more than one hidden layer is a Deep Neural Network (DNN).
● Let's consider this MLP: an input layer with 4 neurons, a second (hidden) layer with 2 neurons (for both the first and second layer the bias neurons are not counted), and a last layer composed of 3 neurons:
Diagram: the inputs x1, x2, x3, x4 plus a bias neuron (1) feed the hidden layer, where each neuron j computes Zj = wjᵀ·X and outputs step1(Zj) (j = 1, 2). The hidden outputs h, plus a bias neuron (1), feed the output layer, where each neuron k computes Zk = wkᵀ·h and outputs step2(Zk) (k = 3, 4, 5). Forward propagation flows from the inputs to the outputs; backpropagation flows in the opposite direction.
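A minimal sketch (not from the slides) of the forward pass of the 4-2-3 MLP just described; the random weights and the choice of tanh / identity activations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.2, 0.1, 0.3])   # bias 1 + the 4 input features

W1 = rng.normal(size=(2, 5))                # hidden layer: 2 neurons, 5 inputs each (incl. bias)
W2 = rng.normal(size=(3, 3))                # output layer: 3 neurons, 2 hidden outputs + bias

z1 = W1 @ x                                 # propagation at the hidden layer
h = np.tanh(z1)                             # step1: hidden activation (tanh here)
h = np.concatenate(([1.0], h))              # add the hidden layer's bias neuron

z2 = W2 @ h                                 # propagation at the output layer
output = z2                                 # step2: identity activation (illustrative)
print(output)
```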

18
5- Multi Layer Perceptron
[By Amina Delali]
Backpropagation
● It is a generalization of the delta rule. After a forward pass, a backward pass is applied to update the weights and back-propagate the errors, using the gradient descent procedure.
● These forward/backward passes are repeated until the error function is minimized.
● The formula (of the generalized delta rule) is:
Δwk,h = η · ok · δh
where Δwk,h is the update value for the weight of the connection between neuron k and neuron h, η is the learning rate, and ok is the output of neuron k. The delta term is:
δh = f′act(neth) · (yh − ŷh)              if h is an output neuron
δh = f′act(neth) · Σ_{l∈L} δl · wh,l       if h is a neuron of a hidden layer
where f′act is the derivative of the activation function of neuron h, neth is the input of neuron h (the result of the propagation function at h), yh is the true label, ŷh is the predicted label, and L is the set of neurons of the following layer connected to h.
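A minimal sketch (not from the slides) of one backpropagation update for a tiny 2-2-1 network with sigmoid activations; all values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
x = np.array([1.0, 0.5, -0.3])           # bias + 2 features
y = 1.0                                  # true label

W_hidden = np.array([[0.1, -0.2, 0.4],   # 2 hidden neurons, 3 inputs each
                     [0.3,  0.8, -0.1]])
w_out = np.array([0.2, -0.5, 0.7])       # 1 output neuron: bias + 2 hidden inputs

# Forward pass
net_h = W_hidden @ x
o_h = np.concatenate(([1.0], sigmoid(net_h)))        # hidden outputs + bias
net_o = w_out @ o_h
y_hat = sigmoid(net_o)

# Backward pass: delta of the output neuron, then of the hidden neurons
delta_o = sigmoid(net_o) * (1 - sigmoid(net_o)) * (y - y_hat)
delta_h = sigmoid(net_h) * (1 - sigmoid(net_h)) * (w_out[1:] * delta_o)

# Weight updates: delta_w = eta * o_k * delta_h
w_out += eta * o_h * delta_o
W_hidden += eta * np.outer(delta_h, x)
print(y_hat, w_out)
```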

19
5- Multi Layer Perceptron
[By Amina Delali]
Example
● One activation function is set for the second (first hidden) layer, and another for the third (output) layer.
● TanSig formula: y = 2 / (1 + e^(−2n)) − 1
● The training accuracy is equal to 1.0 (100%), but with the test data we got worse results than with the SLP ==> over-fitting.
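Since the example code is not reproduced, here is a hedged sketch of what this neurolab MLP could look like; the hidden layer size, the output activation (SoftMax), and the Iris data with a train/test split are assumptions.

```python
import numpy as np
import neurolab as nl
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
targets = np.zeros((len(y), 3))
targets[np.arange(len(y)), y] = 1                   # one-hot coding

X_tr, X_te, t_tr, t_te, y_tr, y_te = train_test_split(X, targets, y, random_state=0)
minmax = [[float(col.min()), float(col.max())] for col in X.T]

# Hidden layer with a TanSig activation, output layer with SoftMax (assumption)
net = nl.net.newff(minmax, [5, 3], transf=[nl.trans.TanSig(), nl.trans.SoftMax()])
net.train(X_tr, t_tr, epochs=500, show=100, goal=0.01)

train_acc = (net.sim(X_tr).argmax(axis=1) == y_tr).mean()
test_acc = (net.sim(X_te).argmax(axis=1) == y_te).mean()
print(train_acc, test_acc)   # a large gap between the two suggests over-fitting
```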

20
6- ANN topologies
[By Amina Delali]
FeedForward Neural Networks
● We have already seen feedforward neural networks (SLP and MLP)
● One input layer + n hidden layers + one output layer (n >= 1)
● Connections are only allowed to neurons of the following layers
● They can have shortcut connections: connections that are not made to the immediately following layer but to layers further ahead
Diagram: a network with neurons 1, 2 (input), 3, 4 (hidden) and 5, 6, 7 (output), with the corresponding Hinton diagram, in which the white squares correspond to the existing connections. For example, the first neuron has connections only to the third and fourth neurons.

21
6- ANN topologies
[By Amina Delali]
Recurrent Networks
Direct recurrence:
● Multiple layers with connections allowed to neurons of the following layers
● A neuron can also be connected to itself
Indirect recurrence:
● Multiple layers with connections allowed to neurons of the following layers
● Connections are also allowed between a neuron and the preceding layer
(In the Hinton diagrams, for visualization purposes, the recurrent connections are represented by smaller squares.)

22
6- ANN topologies
[By Amina Delali]
Fully connected Neural Network
● Multiple layers with connections allowed from any neuron to any other neuron
● Direct recurrence is not allowed
● Connections must be symmetric

23

References
● Joshi Prateek. Artificial Intelligence with Python. Packt Publishing, 2017.
● Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc, 2017.
● Neural Networks – II, Version 2, CSE IIT Kharagpur, available at: https://nptel.ac.in/courses/106105078/pdf/Lesson%2038.pdf
● Isaac Changhau, Loss Functions in Neural Networks, 2017, available at: https://isaacchanghau.github.io/post/loss_functions/
● Sebastian Seung, The Delta Rule, MIT Department of Brain and Cognitive Sciences, Introduction to Neural Networks, 2005, https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-641j-introduction-to-neural-networks-spring-2005/lecture-notes/lec19_delta.pdf
● NeuPy, Neural Networks in Python, http://neupy.com/pages/home.html

24

Thank you!
FOR ALL YOUR TIME
