
1

Artificial Neural Network:
Introduction to ANN
AAA-Python Edition

2

Plan
● 1- Introduction to ANN
● 2- Perceptron
● 3- Single Layer Perceptron
● 4- Single Layer Perceptron examples
● 5- Multi Layer Perceptron
● 6- ANN topologies

3
1- Introduction to ANN
[By Amina Delali]
Concept
● The idea of an Artificial Neural Network (ANN) is to build a model based on the way the human brain learns new things.
● It can be used in any type of machine learning. It learns by extracting the different underlying patterns in a given data set.
● This extraction is performed in stages, called layers. Each layer, composed of a set of neurons, identifies a certain pattern. The following layer identifies another, more complex pattern from the output of its previous layer.
● The most common architecture is as follows: the first layer has the training data as input; it is called the input layer. In the last one, the outputs of the neurons are the final outputs; it is called the output layer. The layers in between are called hidden layers.
● From now on, the term neural network will mean artificial neural network.

4
1- Introduction to ANN
[By Amina Delali]
Neurolab
● Neurolab is a Neural Network library for Python. It supports several types of neural networks.
● For installation: pip install neurolab
● Like other machine learning techniques, a neural network needs to be trained, can be tested, and will then be used to predict results.
● Here is an example of how to use neurolab to create a neural network and how to perform the aforementioned tasks. Notes on the example:
➢ The samples are uniformly distributed from -0.5 (included) to 0.5 (excluded)
➢ Input layer with 2 neurons, hidden layer with 5 neurons, and output layer with 1 neuron
➢ The length of the outer list equals the number of neurons of the input layer
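The original code screenshot is not reproduced here, so the following is a minimal sketch of what the described example could look like; the variable name myNN and the training parameters (epochs, show, goal) are assumptions.

```python
import numpy as np
import neurolab as nl

# 10 samples, 2 features, uniformly distributed in [-0.5, 0.5)
data = np.random.uniform(-0.5, 0.5, (10, 2))
# Labels: the sum of the 2 features, one output value per sample
labels = data.sum(axis=1).reshape(-1, 1)

# The outer list gives the [min, max] range of each input neuron (2 inputs);
# the second list gives the sizes of the remaining layers (5 hidden, 1 output)
myNN = nl.net.newff([[-0.5, 0.5], [-0.5, 0.5]], [5, 1])

# Train, then predict the output for [0.2, 0.1] and compute the test error
myNN.train(data, labels, epochs=100, show=20, goal=0.001)
prediction = myNN.sim([[0.2, 0.1]])
print(prediction, abs((0.2 + 0.1) - prediction[0][0]))
```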

5
1- Introduction to ANN
[By Amina Delali]
Neurolab example details
● Data: 10 samples described by 2 features. The labels are the sum of the 2 features. In fact, the NN tries to model the sum function for values ranging from -0.5 to 0.5.
● After creating the data, the steps were:
➢ Create an instance of a neural network with a specified number of layers and neurons (nl.net.newff)
➢ Train the neural network (myNN.train)
➢ Predict the output for the value [0.2, 0.1] (myNN.sim)
➢ Compute the test error (the true label is known: 0.2 + 0.1)

6
2- Perceptron
[By Amina Delali]
Definition
● The term Perceptron refers to an input layer of data feature values, with forward weighted connections to an output layer of one single neuron, or of multiple neurons.
● One of the simplest forms of a neuron is an LTU.
● An LTU, or Linear Threshold Unit, is a component (neuron) that:
➢ Computes a weighted sum of its inputs: a linear function
➢ Applies a step function to the resulting sum, and outputs the result
Diagram (LTU): the input neurons x1, x2, …, xn and a bias neuron (constant 1) are connected to the LTU through the weights w0, w1, w2, …, wn. The LTU computes the propagation value Z = wᵀ·X = Σ_{i=0..n} wi·xi and produces one output value hw(x) = step(Z).
The bias neuron is used to allow a linear function to be written as a matrix multiplication: its multiplication by the corresponding weight w0 will represent the intercept value b in (a1·x1 + a2·x2 + … + an·xn + b).
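As a minimal sketch (not taken from the slides), an LTU forward pass can be written directly with numpy; the input and weight values are illustrative.

```python
import numpy as np

# Inputs: bias x0 = 1 followed by the features x1 ... xn
x = np.array([1.0, 0.2, 0.7])
# Weights: w0 is the bias weight (the intercept b), then w1 ... wn
w = np.array([-0.5, 0.3, 0.8])

z = w @ x                      # propagation function: Z = w.T X
output = 1 if z >= 0 else 0    # threshold activation: step(Z)
print(z, output)
```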

7
2- Perceptron
[By Amina Delali]
The functions
● The weighted sum function is also called the Propagation function.
● The step function can be:
➢ A non-linear function; in this case, it is called the threshold activation function (this is the case in an LTU). For example:
➔ The Heaviside step function: heaviside(z) = 0 if z < 0, 1 if z ≥ 0
➔ The sign function: sgn(z) = −1 if z < 0, 0 if z = 0, 1 if z > 0
➢ A linear function, simply called the activation function. For example:
➔ The identity function: the value computed by the propagation function is the output value of the neuron.
➢ A semi-linear function, that is monotonic and differentiable. Also called the activation function.
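A minimal sketch (not from the slides) of the three step functions named above, written with numpy:

```python
import numpy as np

def heaviside(z):
    # Threshold activation: 0 if z < 0, 1 if z >= 0
    return np.where(z < 0, 0, 1)

def sign(z):
    # Threshold activation: -1, 0 or 1 depending on the sign of z
    return np.sign(z)

def identity(z):
    # Linear activation: the propagation value is returned unchanged
    return z

z = np.array([-0.3, 0.0, 0.7])
print(heaviside(z), sign(z), identity(z))
```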

8
2- Perceptron
[By Amina Delali]
Single Layer Perceptron
● A Single Layer Perceptron (SLP) is simply a Perceptron with only one layer (without counting the input layer).
● So, it is composed of an input layer and an output layer. The latter can have one or more outputs, so it can be used for binary and for multi-output classification.
● Considering ANNs in general, a Perceptron is considered a feedforward neural network. We are going to talk about it in the next section.
● An SLP can apply two different kinds of rules to learn: the perceptron rule or the delta rule. Each rule is associated with a certain type of activation function. To apply the delta rule, the activation function needs to be differentiable.

9
3- Single Layer Perceptron
[By Amina Delali]
SLP Learning
SLP learning:
● With the Perceptron rule (a variant of Hebb's rule):
➢ Uses a threshold activation function
➢ The learning algorithm has the same behavior as a stochastic gradient descent algorithm
● With the delta rule (Widrow-Hoff rule):
➢ Uses a linear or semi-linear activation function
➢ Offline training: batch gradient descent
➢ Online (or sequential) training: stochastic gradient descent algorithm

10
3- Single Layer Perceptron
[By Amina Delali]
SLP Learning: Perceptron Rule
● To update the weights, the following formula is used:
wi,j(next step) = wi,j + η · (yj − ŷj) · xi
where:
➢ wi,j is the weight of the connection between input node i and output node j for the current training sample, and wi,j(next step) is the same weight for the next training sample
➢ η is the learning rate
➢ yj is the true value at output node j for the current training sample, and ŷj is the predicted value at node j
➢ xi is the i-th input value (i-th feature) of the current training sample
● The concept is that each wrong prediction reinforces the weight corresponding to the feature that would have contributed to the correct prediction. The computation of the weights is repeated until the samples are classified correctly.
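A minimal sketch (not the slides' code) of the perceptron rule for one output neuron; the toy data and the hyper-parameters are illustrative.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=20):
    """Perceptron rule: w_i <- w_i + eta * (y - y_hat) * x_i."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_s, y_s in zip(X, y):
            y_hat = 1 if w @ x_s >= 0 else 0        # threshold activation
            w += eta * (y_s - y_hat) * x_s          # update only on mistakes
    return w

# Toy linearly separable problem (logical OR)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
print(train_perceptron(X, y))
```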

11
3- Single Layer Perceptron
[By Amina Delali]
Gradient Descent
● The concept of gradient descent is:
➢ With the initial parameters of a model, predict an output value
➢ Compute the gradient of the “error” (“loss”) function (a function of the parameters of the learning model) at a certain point: the slope of the surface of that function at that point, calculated by its derivative at that point.
➢ Update the parameters in order to find the local minimum, by a step proportional to the negative of that gradient (opposite direction ==> toward the local minimum of the function). In the case of:
➔ Stochastic gradient descent: with one sample, predict → update the parameters for the next sample → predict with the next sample with the new parameters
➔ Batch gradient descent: predict for all samples (== 1 epoch) → update the parameters → predict again with the new parameters
➢ Repeat the process in order to minimize the error.
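A minimal sketch (not from the slides) of a single-parameter gradient descent update, using the squared error E(w) = (y − w·x)² as an illustrative loss:

```python
def gradient_step(w, x, y, eta=0.1):
    """Move w by a step proportional to the negative gradient of E."""
    grad = -2 * x * (y - w * x)   # dE/dw
    return w - eta * grad         # step in the opposite direction of the gradient

w = 0.0
for epoch in range(50):          # repeat the predict/update process
    w = gradient_step(w, x=2.0, y=4.0)
print(w)                         # approaches 2.0, since y = 2 * x
```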

12
3- Single Layer Perceptron
[By Amina Delali]
SLP Learning: Delta Rule
● With the activation function being linear, or semi-linear but differentiable, gradient descent is used to update the weights.
● The weights are updated as follows: wi,j(next) = wi,j + Δwi,j, where:
➢ In general: Δw = −η · ∂E/∂w
➢ In the case of a linear activation function and a sum-squared error function:
➔ In online training, for a given sample: Δwi,j = η · xi · (yj − ŷj) = η · xi · δj (this δ is why it is called the delta rule)
➔ In offline training: Δwi,j = η · Σ_{s∈X} xi^s · (yj^s − ŷj^s) = η · Σ_{s∈X} xi^s · δj^s
where η is the learning rate, yj^s is the true label that output neuron j should have for sample s, ŷj^s is the predicted label of output neuron j for sample s, and xi^s is the value of input feature i of sample s.
● The difference with the perceptron rule is the type of activation function used.
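A minimal sketch (not from the slides) of the online delta rule with a linear (identity) activation and sum-squared error; the toy data reuses the earlier "sum of two features" problem and is illustrative.

```python
import numpy as np

def train_delta(X, y, eta=0.05, epochs=100):
    """Online delta rule: w_i <- w_i + eta * x_i * (y - y_hat)."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_s, y_s in zip(X, y):
            y_hat = w @ x_s                         # identity activation
            w += eta * x_s * (y_s - y_hat)          # delta = y_s - y_hat
    return w

# Learn the sum of two features, as in the neurolab example
X = np.random.uniform(-0.5, 0.5, (10, 2))
y = X.sum(axis=1)
print(train_delta(X, y))   # weights approach [0, 1, 1]
```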

13
4- Single Layer Perceptron examples
[By Amina Delali]
With sklearn: perceptron rule
● eta0=0.1 → learning rate = 0.1
● max_iter=5000 → maximum number of iterations if there is no convergence
● shuffle=False → the samples are not shuffled after a weight is updated for the next sample
● When the samples are shuffled after each weight update (shuffle=True), only 20 iterations were sufficient to reach convergence.
● Scikit-learn uses the hinge loss function with threshold = 0 to compute the error of the prediction.
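The original code screenshot is not reproduced, so here is a hedged sketch of the scikit-learn perceptron-rule example with the parameters listed above; the use of the Iris data set and the train/test split are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Assumed data set: Iris (not stated on this slide)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# eta0: learning rate; max_iter: iteration cap if there is no convergence;
# shuffle=False: samples are not shuffled between weight updates
clf = Perceptron(eta0=0.1, max_iter=5000, shuffle=False)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```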

14
4- Single Layer Perceptron examples
[By Amina Delali]
With sklearn: perceptron rule
● Use of the stochastic gradient descent classifier class (SGDClassifier)
● Since there are 3 classes, and since the classification strategy used is one-vs-all, the classifier will run 3 times, each time with a maximum of 20 epochs.
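A hedged sketch of the SGDClassifier variant described above; the perceptron loss, the constant learning rate and the Iris data are assumptions based on the surrounding slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With 3 classes and a one-vs-all strategy, one binary classifier is fitted
# per class; max_iter=20 caps the number of epochs for each of them.
clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                    eta0=0.1, max_iter=20)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```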

15
4- Single Layer Perceptron examples
[By Amina Delali]
Example with neurolab: delta rule
Since we have 4 features → we need an input layer with 4 neurons. And since we have 3 classes, and since we will use the SoftMax activation function (ideal for multi-class classification), we will need 3 output neurons:
Class 0 coded as 1 0 0
Class 1 coded as 0 1 0
Class 2 coded as 0 0 1
NB: the classes are not coded in binary code (this is one-hot coding).
The corresponding formula of the SoftMax function is:
f(xi) = e^(xi) / Σ_{j=0..N} e^(xj), where N is the number of output neurons (classes).

16
4- Single Layer Perceptron examples
[By Amina Delali]
Example with neurolab: delta rule
● Third class → third column == 1
● Select the index corresponding to the maximum probability as the predicted class label
● This implementation in neurolab uses the sum-squared error as cost function, and it doesn't use the derivative of the activation function used.
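The original code is not reproduced, so the following is only a sketch of what this neurolab example could look like; the use of nl.trans.SoftMax, nl.train.train_delta, nl.net.newff and the Iris data set are assumptions, not the deck's confirmed code.

```python
import numpy as np
import neurolab as nl
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)          # 4 features, 3 classes (assumption)
targets = np.zeros((len(y), 3))
targets[np.arange(len(y)), y] = 1          # one-hot coding: class 2 -> 0 0 1

# [min, max] range of each of the 4 input features
minmax = [[float(col.min()), float(col.max())] for col in X.T]

# One layer of 3 output neurons with a SoftMax activation,
# trained with the delta rule (assumed trainer)
net = nl.net.newff(minmax, [3], transf=[nl.trans.SoftMax()])
net.trainf = nl.train.train_delta
net.train(X, targets, epochs=500, show=100, goal=0.01)

# The predicted class is the index of the maximum output probability
predictions = net.sim(X).argmax(axis=1)
print((predictions == y).mean())           # training accuracy
```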

17
5- Multi Layer Perceptron
[By Amina Delali]
Multi-Layer Perceptron
● An MLP (Multi-Layer Perceptron) is a Perceptron with one or more hidden layers.
● It is another feedforward artificial neural network. Each of the layers (except the output layer) includes a bias neuron.
● An ANN with more than one hidden layer is a Deep Neural Network (DNN).
● Let's consider this MLP: an input layer with 4 neurons, a second (hidden) layer with 2 neurons (for both the first and second layer the bias neurons are not counted), and a last layer composed of 3 neurons:
Diagram: the inputs x1, x2, x3, x4 plus a bias neuron (1) feed the hidden layer, where each neuron j computes Zj = wjᵀ·X and outputs step1(Zj) (j = 1, 2). The hidden outputs h, plus a bias neuron (1), feed the output layer, where each neuron k computes Zk = wkᵀ·h and outputs step2(Zk) (k = 3, 4, 5). Forward propagation flows from the inputs to the outputs; backpropagation flows in the opposite direction.
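A minimal sketch (not from the slides) of the forward pass of the 4-2-3 MLP just described; the random weights and the choice of tanh / identity activations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.2, 0.1, 0.3])   # bias 1 + the 4 input features

W1 = rng.normal(size=(2, 5))                # hidden layer: 2 neurons, 5 inputs each (incl. bias)
W2 = rng.normal(size=(3, 3))                # output layer: 3 neurons, 2 hidden outputs + bias

z1 = W1 @ x                                 # propagation at the hidden layer
h = np.tanh(z1)                             # step1: hidden activation (tanh here)
h = np.concatenate(([1.0], h))              # add the hidden layer's bias neuron

z2 = W2 @ h                                 # propagation at the output layer
output = z2                                 # step2: identity activation (illustrative)
print(output)
```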

18
5- Multi Layer Perceptron
[By Amina Delali]
Backpropagation
● It is a generalization of the delta rule. After a forward pass, a backward pass is applied to update the weights and back-propagate the errors, using the gradient descent procedure.
● These forward/backward passes are repeated until the error function is minimized.
● The formula (of the generalized delta rule) is:
Δwk,h = η · ok · δh
where Δwk,h is the update value for the weight of the connection between neuron k and neuron h, η is the learning rate, and ok is the output of neuron k. The delta term is:
δh = f′act(neth) · (yh − ŷh)              if h is an output neuron
δh = f′act(neth) · Σ_{l∈L} δl · wh,l       if h is a neuron of a hidden layer
where f′act is the derivative of the activation function of neuron h, neth is the input of neuron h (the result of the propagation function at h), yh is the true label, ŷh is the predicted label, and L is the set of neurons of the following layer connected to h.
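A minimal sketch (not from the slides) of one backpropagation update for a tiny 2-2-1 network with sigmoid activations; all values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
x = np.array([1.0, 0.5, -0.3])           # bias + 2 features
y = 1.0                                  # true label

W_hidden = np.array([[0.1, -0.2, 0.4],   # 2 hidden neurons, 3 inputs each
                     [0.3,  0.8, -0.1]])
w_out = np.array([0.2, -0.5, 0.7])       # 1 output neuron: bias + 2 hidden inputs

# Forward pass
net_h = W_hidden @ x
o_h = np.concatenate(([1.0], sigmoid(net_h)))        # hidden outputs + bias
net_o = w_out @ o_h
y_hat = sigmoid(net_o)

# Backward pass: delta of the output neuron, then of the hidden neurons
delta_o = sigmoid(net_o) * (1 - sigmoid(net_o)) * (y - y_hat)
delta_h = sigmoid(net_h) * (1 - sigmoid(net_h)) * (w_out[1:] * delta_o)

# Weight updates: delta_w = eta * o_k * delta_h
w_out += eta * o_h * delta_o
W_hidden += eta * np.outer(delta_h, x)
print(y_hat, w_out)
```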

19
5- Multi Layer Perceptron
[By Amina Delali]
Example
● One activation function is set for the second (first hidden) layer, and another for the third (output) layer.
● TanSig formula: y = 2 / (1 + e^(−2n)) − 1
● The training accuracy is equal to 1.0 (100%), but with the test data we got worse results than with the SLP ==> over-fitting.
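Since the example code is not reproduced, here is a hedged sketch of what this neurolab MLP could look like; the hidden layer size, the output activation (SoftMax), and the Iris data with a train/test split are assumptions.

```python
import numpy as np
import neurolab as nl
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
targets = np.zeros((len(y), 3))
targets[np.arange(len(y)), y] = 1                   # one-hot coding

X_tr, X_te, t_tr, t_te, y_tr, y_te = train_test_split(X, targets, y, random_state=0)
minmax = [[float(col.min()), float(col.max())] for col in X.T]

# Hidden layer with a TanSig activation, output layer with SoftMax (assumption)
net = nl.net.newff(minmax, [5, 3], transf=[nl.trans.TanSig(), nl.trans.SoftMax()])
net.train(X_tr, t_tr, epochs=500, show=100, goal=0.01)

train_acc = (net.sim(X_tr).argmax(axis=1) == y_tr).mean()
test_acc = (net.sim(X_te).argmax(axis=1) == y_te).mean()
print(train_acc, test_acc)   # a large gap between the two suggests over-fitting
```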

20
6- ANN topologies
[By Amina Delali]
FeedForward Neural Networks
● We have already seen feedforward neural networks (SLP and MLP)
● One input layer + n hidden layers + one output layer (n >= 1)
● Connections are only allowed to neurons of the following layers
● They can have shortcut connections: connections that are not made to the immediately following layer but to layers further ahead
Diagram: a network with neurons 1, 2 (input), 3, 4 (hidden) and 5, 6, 7 (output), with the corresponding Hinton diagram, in which the white squares correspond to the existing connections. For example, the first neuron has connections only to the third and fourth neurons.

21
6- ANN topologies
[By Amina Delali]
Recurrent Networks
Direct recurrence:
● Multiple layers with connections allowed to neurons of the following layers
● A neuron can also be connected to itself
Indirect recurrence:
● Multiple layers with connections allowed to neurons of the following layers
● Connections are also allowed between a neuron and the preceding layer
(In the Hinton diagrams, for visualization purposes, the recurrent connections are represented by smaller squares.)

22
6- ANN topologies
[By Amina Delali]
Fully connected Neural Network
● Multiple layers with connections allowed from any neuron to any other neuron
● Direct recurrence is not allowed
● Connections must be symmetric

23

References
● Joshi Prateek. Artificial Intelligence with Python. Packt Publishing, 2017.
● Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc, 2017.
● Neural Networks – II, Version 2, CSE IIT Kharagpur, available at: https://nptel.ac.in/courses/106105078/pdf/Lesson%2038.pdf
● Isaac Changhau, Loss Functions in Neural Networks, 2017, available at: https://isaacchanghau.github.io/post/loss_functions/
● Sebastian Seung, The Delta Rule, MIT Department of Brain and Cognitive Sciences, Introduction to Neural Networks, 2005, https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-641j-introduction-to-neural-networks-spring-2005/lecture-notes/lec19_delta.pdf
● NeuPy, Neural Networks in Python, http://neupy.com/pages/home.html

24

Thank you!
FOR ALL YOUR TIME
