Artificial Neural Network:
Introduction to ANN
AAA-Python Edition


1- Introduction to ANN
2- Perceptron
3- Single Layer Perceptron
4- Single Layer Perceptron examples
5- Multi Layer Perceptron
6- ANN topologies


[By Amina Delali]
The idea of an Artificial Neural Network (ANN) is to build a
model based on the way the human brain learns new things.
It can be used in any type of machine learning. It learns by
extracting the different underlying patterns in the given data.
This extraction is performed in stages, called layers. Each layer,
composed of a set of neurons, identifies a certain pattern. The
following layer identifies another, more complex pattern from the
output of its previous layer.
The most common architecture is as follows: a first layer takes the
training data as input. It is called the input layer. In the last layer,
the outputs of the neurons are the final output. It is called the
output layer. The layers in between are called hidden layers.
From now on, the term neural network will mean artificial neural
network.


Neurolab is a neural network library for Python. It supports several
types of neural networks.
For installation: the package is published on PyPI under the name neurolab.
Like other machine learning techniques, a neural network needs to
be trained, can be tested, and is then used to predict results.
Here is an example of how to use neurolab to create a neural
network and how to perform the aforementioned tasks:
Notes on the example code:
The samples are uniformly distributed from -0.5 (included) to 0.5 (excluded).
The network has an input layer with 2 neurons, a hidden layer with 5 neurons, and an output layer with 1 neuron.
The length of the outer list equals the number of neurons of the input layer.


Neurolab example details
Data: 10 samples described by 2 features. The labels are the sum
of the 2 features. In effect, the NN tries to model the sum function
for values ranging from -0.5 to 0.5.
After creating the data, the steps were:
1- Create an instance of a neural network with the specified number of
layers and neurons (nl.net.newff)
2- Train the neural network (myNN.train)
3- Predict the output for the value [0.2,0.1] (myNN.sim)
4- Compute the test error (the true label is known: 0.2+0.1)


The term Perceptron refers to an input layer of data feature
values, with forward weighted connections to an output layer of
one single neuron or of multiple neurons.
One of the simplest forms of a neuron is an LTU.
An LTU (Linear Threshold Unit) is a component (neuron) that:
Computes a weighted sum of its inputs (a linear function): $Z = w^{T} \cdot X$
Applies a step function to the resulting sum and outputs the result: $\mathrm{step}(Z)$
The bias neuron is used to allow a linear function to be written as a matrix
multiplication: its multiplication by the corresponding weight $w_0$ represents
the intercept value $b$ in $(a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b)$.
[Figure: an LTU with inputs $x_i$, connection weights $w_i$, and one output]
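As an illustration, an LTU can be sketched in a few lines of plain Python. This is a minimal sketch, not code from the slides: the function names and the weight values are hypothetical, and the bias neuron is modeled as a constant input of 1 whose weight is w0.

```python
def step(z):
    """Heaviside step function: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def ltu_output(weights, inputs):
    """Output of one LTU. weights[0] is w0, the weight of the bias neuron."""
    x = [1.0] + list(inputs)  # the bias neuron always outputs 1
    z = sum(w * xi for w, xi in zip(weights, x))  # weighted sum Z = w^T . X
    return step(z)


# Hypothetical weights: w0 = -1.5 (intercept), w1 = w2 = 1.0
weights = [-1.5, 1.0, 1.0]
print(ltu_output(weights, [1, 1]))  # Z = 1 + 1 - 1.5 = 0.5, so the output is 1
print(ltu_output(weights, [0, 1]))  # Z = 1 - 1.5 = -0.5, so the output is 0
```

Storing the intercept as weights[0] is what lets the whole computation be written as a single dot product.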


The functions
The weighted sum function is also called the propagation function.
The step function can be:
A non-linear function. In this case, it is called the threshold activation function (this is the case in an LTU). For example:
Heaviside step function:
$$\mathrm{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}$$
Sign function:
$$\mathrm{sgn}(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ 1 & \text{if } z > 0 \end{cases}$$
A linear function, simply called activation function. For example:
The identity function: the value computed by the propagation function is the output value of the neuron.
A semi-linear function, that is monotonic and differentiable. Also called activation function.
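The propagation function and the three example activation functions can be written directly from their definitions. The following is a small sketch in plain Python. The function names and the sample weights are illustrative choices, not names from any library.

```python
def propagation(weights, inputs):
    """Weighted sum of the inputs (the propagation function)."""
    return sum(w * x for w, x in zip(weights, inputs))


def heaviside(z):
    """Threshold activation: 0 if z < 0, 1 if z >= 0."""
    return 0 if z < 0 else 1


def sign(z):
    """Threshold activation: -1 if z < 0, 0 if z == 0, 1 if z > 0."""
    if z < 0:
        return -1
    if z == 0:
        return 0
    return 1


def identity(z):
    """Linear activation: the propagated value is the output."""
    return z


# Hypothetical weights and inputs: 0.5 * 2.0 + (-1.0) * 3.0 = -2.0
z = propagation([0.5, -1.0], [2.0, 3.0])
print(heaviside(z), sign(z), identity(z))
```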


Single Layer Perceptron
A Single Layer Perceptron (SLP) is simply a Perceptron with only
one layer (without counting the input layer).
So, it is composed of an input layer and an output layer. The latter
can have one or more outputs. So, it can be used for binary
and for multi-output classification.
Considering ANNs in general, a Perceptron is considered a
feedforward neural network. We are going to talk about it in a
later section.
An SLP can apply 2 different kinds of rules to learn: the perceptron
rule or the delta rule. Each of the rules is associated with a certain
type of activation function. To apply the delta rule, we need the
activation function to be differentiable.


SLP Learning
SLP learning:
With the perceptron rule (variant of Hebb's rule): uses a threshold activation function. The learning algorithm has the same behavior as a stochastic gradient descent algorithm.
With the delta rule (Widrow-Hoff rule): uses a linear or semi-linear activation function.
Offline training: batch gradient descent
Online (or sequential) training: stochastic gradient descent


SLP Learning: Perceptron Rule
To update the weights, the following formula is used:
$$w_{i,j}^{(\text{next step})} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\,x_i$$
where:
$w_{i,j}^{(\text{next step})}$: the weight of the connection between the input node i and the output node j, for the next training sample
$w_{i,j}$: the weight of the connection between the input node i and the output node j, for the current training sample
$\eta$: the learning rate
$y_j$: the true value at the node j, for the current training sample
$\hat{y}_j$: the predicted value at the node j, for the current training sample
$x_i$: the input value (ith feature) for the current training sample
The concept is that each wrong prediction reinforces the
weights corresponding to the features that would have contributed to
the correct prediction. The computation of the weights is
repeated until the samples are classified correctly.
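The perceptron rule can be sketched as a short training loop. This is a from-scratch illustration under stated assumptions, not the code used in the slides: a single output neuron with a Heaviside step, weights initialized to zero, a learning rate of 0.5, and the logical AND function as a hypothetical linearly separable dataset.

```python
def step(z):
    """Heaviside step function used as threshold activation."""
    return 1 if z >= 0 else 0


def predict(weights, x):
    """Prediction of one output neuron. weights[0] is the bias weight."""
    xb = [1.0] + list(x)
    return step(sum(w * xi for w, xi in zip(weights, xb)))


def train_perceptron(samples, labels, eta=0.5, max_epochs=100):
    """Apply w_i <- w_i + eta * (y - y_hat) * x_i sample by sample."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            y_hat = predict(weights, x)
            if y_hat != y:
                errors += 1
                xb = [1.0] + list(x)
                for i, xi in enumerate(xb):
                    weights[i] += eta * (y - y_hat) * xi
        if errors == 0:  # every sample is classified correctly: stop
            break
    return weights


# Hypothetical dataset: the AND function of two binary features
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w = train_perceptron(X, y)
print([predict(w, x) for x in X])
```

The loop only terminates early when the classes are linearly separable, which is why max_epochs is kept as a safeguard.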


Gradient Descent
The concept of gradient descent is:
1- With initial parameters of a model, predict an output value
2- Compute the gradient of the “error” (“loss”) function (a function
of the parameters of the learning model) at a certain point: the
slope of the surface of that function at that point, calculated from its
derivative at that point.
3- Update the parameters by a step proportional to the negative of that
gradient, in order to find a local minimum (opposite
direction ==> toward the local minimum of the function). In the
case of:
Stochastic gradient descent: with one sample, predict
→ update the parameters for the next sample → predict with
the next sample with the new parameters
Batch gradient descent: predict for all samples (= 1
epoch) → update the parameters → predict again with the new
parameters
4- Repeat the process in order to minimize the error.
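The difference between the two variants is easiest to see on a model with a single parameter. The sketch below assumes a model y = w * x, the error E(w) = 1/2 * sum((y - w*x)^2), and a small hypothetical dataset generated with w = 2. All names and values are illustrative.

```python
def batch_gd(xs, ys, eta=0.05, epochs=50):
    """One update per epoch, using the gradient over all samples."""
    w = 0.0
    for _ in range(epochs):
        # dE/dw for E(w) = 1/2 * sum((y - w*x)^2) over the whole dataset
        grad = -sum((y - w * x) * x for x, y in zip(xs, ys))
        w -= eta * grad  # step in the direction opposite to the gradient
    return w


def stochastic_gd(xs, ys, eta=0.05, epochs=50):
    """One update per sample, using the gradient of that sample only."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = -(y - w * x) * x
            w -= eta * grad
    return w


# Hypothetical data generated by y = 2 * x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(batch_gd(xs, ys), stochastic_gd(xs, ys))
```

Both loops approach w = 2 here. On real data, the stochastic version follows a noisier path because each step only sees one sample.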


SLP Learning: Delta Rule
With the activation function being linear or semi-linear but
differentiable, gradient descent is used to update the weights.
The weights are updated as follows:
$$w_{i,j} = w_{i,j} + \Delta w_{i,j}$$
In general:
$$\Delta w_{i,j} = -\eta \cdot \frac{\partial E}{\partial w_{i,j}}$$
In the case of a linear activation function and a sum-squared
error function:
In online training, for a given sample:
$$\Delta w_{i,j} = \eta \cdot x_i \cdot (y_j - \hat{y}_j) = \eta \cdot x_i \cdot \delta_j$$
This is why it is called the delta rule.
In offline training:
$$\Delta w_{i,j} = \eta \cdot \sum_{s} x_i^{(s)} \cdot (y_j^{(s)} - \hat{y}_j^{(s)})$$
where:
$\eta$: the learning rate
$y_j^{(s)}$: the true label that the output neuron j should have, for the sample s
$\hat{y}_j^{(s)}$: the predicted label for the output neuron j, for the sample s
$x_i^{(s)}$: the input feature i value of the sample s
The difference here with the perceptron rule is the type of activation
function used.

With sklearn: perceptron rule
eta0 == 0.1 → learning rate = 0.1
max_iter == 5000 → maximum number of iterations if there
is no convergence
shuffle == False → the training data is not shuffled after each epoch
When the training data was shuffled after each epoch, only 20 iterations were
sufficient to reach convergence.
Sklearn uses the hinge loss function with threshold = 0 to compute
the error of the prediction.

With sklearn: perceptron rule
Use of the stochastic gradient descent classifier class (SGDClassifier).
Since there are 3 classes, and since the strategy used is one vs all, the
classifier will run 3 times, each time with max epochs == 20.

Example with neurolab: delta rule
Since we have 4 features, we need an input layer
with 4 neurons. And since we have 3 classes, and
since we will use the SoftMax activation function (ideal for
multi-class classification), we will need 3 output neurons:
Class 0 coded as 1 0 0
Class 1 coded as 0 1 0
Class 2 coded as 0 0 1
NB: the classes are not coded as binary numbers: there is one output neuron per class (one-hot encoding).
The corresponding formula of the SoftMax function is:
$$f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}$$
N is the number of output neurons (here, 3).
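The class coding and the SoftMax function can be illustrated with a short plain-Python sketch. The helper names and the input values are hypothetical. Subtracting the maximum before exponentiating is a common numerical-stability choice that does not change the result.

```python
import math


def softmax(values):
    """SoftMax over the net inputs of the output neurons."""
    m = max(values)  # shift by the maximum to avoid overflow in exp
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, n_classes):
    """Code a class label as one output neuron per class (class 1 -> 0 1 0)."""
    return [1 if k == label else 0 for k in range(n_classes)]


def predicted_class(probabilities):
    """Index of the output neuron with the highest probability."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])


# Hypothetical net inputs of the 3 output neurons for one sample
probs = softmax([1.0, 3.0, 0.5])
print(probs, predicted_class(probs))
print(one_hot(2, 3))
```

The outputs sum to 1, so they can be read as class probabilities, and the predicted label is the index of the largest one.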


Example with neurolab: delta rule
Notes on the example code:
Third class → third column == 1
Select the index corresponding to the maximum probability as the predicted class label

Multi-Layer Perceptron
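An MLP (Multi-Layer Perceptron) is a Perceptron with one or
more hidden layers.
It is another feedforward artificial neural network. Each of
the layers (except the output layer) includes a bias neuron.
An ANN with more than one hidden layer is a Deep Neural
Network (DNN).
Let's consider this MLP: an input layer with 4 neurons and a
second (hidden) layer with 2 neurons (for both the first and second
layers, the bias neurons are not counted). The last layer is
composed of 3 neurons.
Forward propagation:
$$h_1 = \mathrm{step}_1(w_1^{T} \cdot X), \quad h_2 = \mathrm{step}_1(w_2^{T} \cdot X)$$
$$\hat{y}_1 = \mathrm{step}_2(w_3^{T} \cdot h), \quad \hat{y}_2 = \mathrm{step}_2(w_4^{T} \cdot h), \quad \hat{y}_3 = \mathrm{step}_2(w_5^{T} \cdot h)$$
where $h$ gathers the outputs of the hidden layer (with its bias neuron), $\mathrm{step}_1$ is the activation function of the hidden layer, and $\mathrm{step}_2$ is the activation function of the output layer.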
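The forward propagation of this 4-2-3 network can be sketched layer by layer. The code below is an illustration under assumptions: tanh for the hidden layer and the identity for the output layer are arbitrary choices, and the weight values are hypothetical. The first weight of every row is the weight of the bias neuron.

```python
import math


def layer_forward(weights, inputs, activation):
    """Outputs of one layer. Each row holds [bias weight, w1, w2, ...]."""
    x = [1.0] + list(inputs)  # prepend the bias neuron output
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def forward(x, w_hidden, w_output):
    """Forward pass: input layer -> hidden layer -> output layer."""
    h = layer_forward(w_hidden, x, math.tanh)        # step1 (assumed: tanh)
    return layer_forward(w_output, h, lambda z: z)   # step2 (assumed: identity)


# Hypothetical weights: 2 hidden neurons x (1 bias + 4 inputs)
w_hidden = [[0.0, 0.1, 0.2, 0.3, 0.4],
            [0.0, -0.1, -0.2, -0.3, -0.4]]
# 3 output neurons x (1 bias + 2 hidden neurons)
w_output = [[0.5, 1.0, 0.0],
            [0.0, 0.0, 1.0],
            [0.0, 1.0, 1.0]]

y_hat = forward([1.0, 1.0, 1.0, 1.0], w_hidden, w_output)
print(y_hat)
```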


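Backpropagation is a generalization of the delta rule. After a forward pass, a
backward pass is applied to back-propagate the errors and update the
weights, using the gradient descent procedure.
These forward/backward passes are repeated until the error function
is minimized.
The formula (of the generalized delta rule) is:
$$\Delta w_{k,h} = \eta \cdot o_k \cdot \delta_h$$
with:
$$\delta_h = \begin{cases} f'_{act}(net_h) \cdot (y_h - \hat{y}_h) & \text{(h is an output neuron)} \\ f'_{act}(net_h) \cdot \sum_{l} (\delta_l \cdot w_{h,l}) & \text{(h is a neuron of a hidden layer)} \end{cases}$$
where:
$\Delta w_{k,h}$: the update value for the weight of the connection between the neuron k and the neuron h
$o_k$: the output of neuron k
$f'_{act}$: the derivative of the activation function of the neuron h
$net_h$: the input of neuron h: the result of the propagation function at h
$y_h$: the true label
$\hat{y}_h$: the predicted label
$l$: index over the neurons of the layer that follows h
[Figure: neurons k, h and l in successive layers, with the connection whose weight is updated highlighted in each of the two cases]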

Notes on the example code:
Activation function for the second (first hidden) layer
Activation function for the third (output) layer
TanSig formula:
$$f(n) = \frac{2}{1 + e^{-2n}} - 1$$
The training accuracy is equal to 1.0 (100%), but with the test data
we got worse results than with the SLP.
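TanSig is the hyperbolic tangent sigmoid. Assuming its usual definition, tansig(n) = 2 / (1 + exp(-2n)) - 1, a quick check in plain Python shows that it coincides with tanh, and that its derivative can be computed from its own output, which is convenient for the backward pass. The function names are illustrative.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2n)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def tansig_derivative(n):
    """Derivative expressed with the function value: 1 - tansig(n)^2."""
    return 1.0 - tansig(n) ** 2


for n in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(n, tansig(n), math.tanh(n))
```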


FeedForward Neural Networks
We have already seen feedforward neural networks (SLP and
MLP):
One input layer + n hidden layers + one output layer (n>=0; an SLP has no hidden layer)
Connections are only allowed to neurons of the following layer
They can have shortcut connections: the connections are not only
set to the following layer but also to subsequent layers
[Figure: a feedforward network with neurons 1 to 7 arranged in three layers (1 2 / 3 4 / 5 6 7), and the corresponding Hinton diagram. The white squares in the first row correspond to the connections from the first neuron: it has connections to the third and fourth neurons.]


Recurrent Networks
Direct recurrence:
Multiple layers with connections allowed to neurons
of the following layers
A neuron can also be connected to itself
[Figure: a network with neurons 1 to 7 (1 2 / 3 4 / 5 6 7) and its Hinton diagram. For visualization purposes, the connections are represented by smaller squares.]
Indirect recurrence:
Multiple layers with connections allowed to
neurons of the following layers
Connections are also allowed between neurons
and the preceding layer
[Figure: a network with neurons 1 to 7 (1 2 / 3 4 / 5 6 7) and its Hinton diagram]


Fully connected Neural Network
Multiple layers with connections allowed
from any neuron to any other neuron
Direct recurrence is not allowed
Connections must be symmetric
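These topology constraints can be checked on a connection matrix, which is the information a Hinton diagram displays. The sketch below uses hypothetical helper names and a hypothetical set of connections for a 7-neuron network arranged in three layers (1 2 / 3 4 / 5 6 7). The feedforward test only requires connections to point to a later layer, so it also accepts shortcut connections.

```python
def connection_matrix(n, connections):
    """m[i][j] = 1 if there is a connection from neuron i+1 to neuron j+1."""
    m = [[0] * n for _ in range(n)]
    for src, dst in connections:
        m[src - 1][dst - 1] = 1  # neurons are numbered from 1
    return m


def has_direct_recurrence(m):
    """True if some neuron is connected to itself (non-zero diagonal)."""
    return any(m[i][i] for i in range(len(m)))


def is_symmetric(m):
    """True if every connection i -> j has a matching connection j -> i."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


def is_feedforward(m, layer_of):
    """True if every connection goes to a later layer (shortcuts allowed)."""
    n = len(m)
    return all(layer_of[j] > layer_of[i]
               for i in range(n) for j in range(n) if m[i][j])


# Hypothetical network: each neuron is connected to all neurons of the next layer
layers = [0, 0, 1, 1, 2, 2, 2]
edges = [(1, 3), (1, 4), (2, 3), (2, 4),
         (3, 5), (3, 6), (3, 7), (4, 5), (4, 6), (4, 7)]
ff = connection_matrix(7, edges)
rec = connection_matrix(7, edges + [(3, 3)])  # neuron 3 connected to itself

print(is_feedforward(ff, layers), has_direct_recurrence(ff))
print(is_feedforward(rec, layers), has_direct_recurrence(rec))
```

A fully connected network as described above would satisfy is_symmetric and fail has_direct_recurrence.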


