Introduction To Neurons and Artificial Neural Networks
The human brain is composed of about 86 billion nerve cells called neurons. Each neuron is connected to thousands of other cells through axons. Stimuli from the external environment, or inputs from the sensory organs, are received by the dendrites. These inputs create electric impulses, which travel quickly through the neural network. A neuron then either passes the message on to other neurons or does not send it forward.
Above is an image of a single neuron, with the axon as its output part and the dendrites as its input part.
Similar to the natural neuron, we use a mathematical neuron, or node, in Artificial Neural Networks.
Like the natural neuron, the mathematical neuron takes some inputs, performs a mathematical operation on them, and outputs a value.
ANNs are composed of multiple nodes, which imitate the biological neurons of the human brain. The neurons are connected by links and interact with each other. Each node can take input data and perform simple operations on it. The result of these operations is passed on to other neurons. The output at each node is called its activation or node value.
Each link is associated with a weight. ANNs are capable of learning, which takes place by altering these weight values.
Above is how two natural neurons connect and pass information from one to the other.
Two mathematical neurons are connected by a weight, which expresses how much that neuron's output counts towards a particular decision.
There are three types of layers in a general Neural Network: the input layer, hidden layers, and the output layer. A Neural Network without a hidden layer can be considered a linear regression model, or more generally a linear model. As more hidden layers are added to the neural network, more non-linear patterns can be extracted from the data. Using more hidden layers than required can, however, result in the problem of overfitting.
The number of hidden layers and the number of neurons in those hidden layers have to be chosen by the developer.
In the topology diagrams shown, each arrow represents a connection between two neurons and indicates the pathway for the flow of information. Each connection has a weight, a real number that scales the signal passed between the two neurons.
If the network generates a “good or desired” output, there is no need to adjust the weights.
However, if the network generates a “poor or undesired” output or an error, then the
system alters the weights in order to improve subsequent results.
The simplest type of perceptron has a single layer of weights connecting the inputs and output. In this way, it can be considered the simplest kind of feed-forward network. In a feed-forward network, the information always moves in one direction; it never goes backwards.
1. Input layer neurons are not computational neurons; the job of the input layer is to load the features into the neural network.
2. At every computational neuron, i.e. every neuron in the hidden layers and the output layer, two operations occur:
z = x_1 w_1 + x_2 w_2 + x_3 w_3 + b
yhat = f(z)
where yhat is the predicted output and f is the activation function.
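As a minimal sketch of these two operations (the input values, weights and the step activation below are illustrative, not taken from the text):

```python
import numpy as np

def neuron(x, w, b, f):
    """A single computational neuron: weighted sum plus bias, then an activation."""
    z = np.dot(w, x) + b   # z = x1*w1 + x2*w2 + x3*w3 + b
    return f(z)            # yhat = f(z)

step = lambda z: 1.0 if z >= 0 else 0.0   # a simple threshold activation
x = np.array([1.0, 0.5, -0.2])            # three input features
w = np.array([0.4, -0.3, 0.8])            # one weight per input
print(neuron(x, w, b=0.1, f=step))
```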
Similarly, in a neural network, the bias helps in shifting the decision boundary to achieve
better predictions.
The activation function is mostly used to apply a non-linear transformation, which allows us to fit non-linear hypotheses and estimate complex functions. There are multiple activation functions, such as Sigmoid, Tanh, ReLU and many others.
Sigmoid Function
σ(z) = 1 / (1 + e^(-z))
• The function is differentiable, which means we can find the slope of the sigmoid curve at any point.
• The logistic sigmoid function can cause a neural network to get stuck during training, because its gradient becomes very small for large positive or negative inputs.
• The softmax function is a more generalized logistic activation function which is used
for multiclass classification.
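A brief sketch of the sigmoid, its slope, and the softmax generalization mentioned above (the function names and NumPy usage are our own choices):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_slope(z):
    """Derivative of the sigmoid, sigma(z) * (1 - sigma(z)); reused later in backpropagation."""
    s = sigmoid(z)
    return s * (1.0 - s)

def softmax(z):
    """Generalized logistic activation for multiclass outputs: probabilities summing to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))  # three class probabilities summing to 1
```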
Tanh Function
• The output is zero-centered because its range lies between -1 and 1, i.e. -1 < output < 1. This makes optimization easier, so in practice it is generally preferred over the sigmoid function.
Hence, for the output layer we should use a Softmax function for a classification problem, to compute the probabilities of the classes, and for a regression problem we should simply use a linear function.
Leaky ReLU
f(x) = { x       if x > 0
         0.01x   otherwise }
▪ The problem with ReLU is that some gradients can be fragile during training and can die: a single weight update can leave a neuron that never activates again on any data point. Simply put, ReLU can result in dead neurons; Leaky ReLU avoids this by keeping a small slope for negative inputs.
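A minimal sketch of ReLU and Leaky ReLU based on the definition above (the 0.01 slope follows the formula; the function names are our own):

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through and zeroes out negative ones."""
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: keeps a small slope for negative inputs so the gradient never fully dies."""
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negatives become 0, positives pass through
print(leaky_relu(x))  # negatives are scaled by 0.01 instead of being zeroed
```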
AND Function

[Figure: a single perceptron with bias input +1 (weight w10 = -1.5) and inputs x1, x2 (weights w11 = +1, w12 = +1) producing the output yhat.]

x1  x2  yhat = F(x, w)
0   0   yhat = g{1*(-1.5) + 0*1 + 0*1} = g{-1.5} = 0
0   1   yhat = g{1*(-1.5) + 0*1 + 1*1} = g{-0.5} = 0
1   0   yhat = g{1*(-1.5) + 1*1 + 0*1} = g{-0.5} = 0
1   1   yhat = g{1*(-1.5) + 1*1 + 1*1} = g{+0.5} = 1
OR Function
[Figure: a single perceptron with bias input +1 (weight w10 = -0.5) and inputs x1, x2 (weights w11 = +1, w12 = +1) producing the output yhat.]

x1  x2  yhat = F(x, w)
0   0   yhat = g{1*(-0.5) + 0*1 + 0*1} = g{-0.5} = 0
0   1   yhat = g{1*(-0.5) + 0*1 + 1*1} = g{+0.5} = 1
1   0   yhat = g{1*(-0.5) + 1*1 + 0*1} = g{+0.5} = 1
1   1   yhat = g{1*(-0.5) + 1*1 + 1*1} = g{+1.5} = 1
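A small sketch reproducing both truth tables with a step activation g (the weights come from the diagrams above; everything else is our own naming):

```python
def g(z):
    """Step activation: fires (outputs 1) when the weighted sum is positive."""
    return 1 if z > 0 else 0

def perceptron(x1, x2, w):
    """w = [w10 (bias weight), w11, w12]; the bias input is fixed at +1."""
    z = w[0] * 1 + w[1] * x1 + w[2] * x2
    return g(z)

AND_WEIGHTS = [-1.5, 1.0, 1.0]
OR_WEIGHTS  = [-0.5, 1.0, 1.0]

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "AND:", perceptron(x1, x2, AND_WEIGHTS),
          "OR:", perceptron(x1, x2, OR_WEIGHTS))
```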
w1 x1 + w2 x2 - b = 0
The decision boundary is the line that separates the area where y = 0 from the area where y = 1. It is created by our current model.
[Figure: a feed-forward network with input units x0, x1, x2, x3 (x0 = +1 is the bias unit), hidden units a_1^(2), a_2^(2), a_3^(2) plus a hidden bias unit a_0^(2), and output unit a_1^(3); w_ij^(l) denotes the weight from unit j in layer l to unit i in layer l+1.]

a_1^(2) = g{ w_10^(1) x_0 + w_11^(1) x_1 + w_12^(1) x_2 + w_13^(1) x_3 }
a_2^(2) = g{ w_20^(1) x_0 + w_21^(1) x_1 + w_22^(1) x_2 + w_23^(1) x_3 }
a_3^(2) = g{ w_30^(1) x_0 + w_31^(1) x_1 + w_32^(1) x_2 + w_33^(1) x_3 }
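A sketch of this forward pass for the hidden layer, written with a weight matrix whose rows hold each neuron's weights (the numeric values are placeholders, not from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# x = [x0, x1, x2, x3], where x0 = +1 acts as the bias input
x = np.array([1.0, 0.2, -0.4, 0.7])

# W1[i, j] stands for w_ij^(1): the weight from input j to hidden neuron i (placeholder values)
W1 = np.array([[ 0.1,  0.5, -0.3,  0.8],
               [-0.2,  0.4,  0.9, -0.5],
               [ 0.3, -0.7,  0.2,  0.6]])

# a_i^(2) = g{ sum over j of w_ij^(1) * x_j }, computed for all hidden neurons at once
a2 = sigmoid(W1 @ x)
print(a2)  # the activations a_1^(2), a_2^(2), a_3^(2)
```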
Learning Rate lr
The learning rate lr controls how big a step we take when updating our parameter w.
- If lr is too big, gradient descent can overshoot the minimum and may fail to converge.
- If lr is too small, convergence is slow because each update changes the weights only slightly.
Vanilla Gradient Descent
This is the simplest form of the gradient descent technique. Here, vanilla means pure, without any adulteration. Its main feature is that we take small steps in the direction of the minimum by following the gradient of the cost function.
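A minimal sketch of this update rule, using an illustrative one-parameter cost function E(w) = (w - 3)^2 whose minimum is at w = 3:

```python
def gradient(w):
    """Gradient of the illustrative cost E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1
for step in range(100):
    w = w - lr * gradient(w)   # vanilla update: a small step against the gradient
print(w)                       # approaches the minimum at w = 3
```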
Gradient Descent with Momentum
Here, we tweak the above algorithm so that we pay heed to the previous step before taking the next one.
The update is the same as in vanilla gradient descent, but we introduce a new term called velocity, which accumulates the previous updates, weighted by a constant called momentum.
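A sketch of the momentum update on the same illustrative cost function:

```python
def gradient(w):
    """Gradient of the same illustrative cost E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w, lr, momentum, velocity = 0.0, 0.1, 0.9, 0.0
for step in range(100):
    velocity = momentum * velocity - lr * gradient(w)  # velocity remembers previous updates
    w = w + velocity                                   # move by the accumulated velocity
print(w)                                               # again approaches w = 3
```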
ADAGRAD
ADAGRAD uses an adaptive technique to update the learning rate. In this algorithm, the learning rate is adjusted on the basis of how the gradient has been changing over all the previous iterations.
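A sketch of the ADAGRAD update on the same illustrative cost (epsilon avoids division by zero):

```python
import numpy as np

def gradient(w):
    return 2.0 * (w - 3.0)   # gradient of the illustrative cost (w - 3)^2

w, lr, eps, grad_sq_sum = 0.0, 0.5, 1e-8, 0.0
for step in range(200):
    g = gradient(w)
    grad_sq_sum += g ** 2                           # accumulate squared gradients over all iterations
    w = w - lr * g / (np.sqrt(grad_sq_sum) + eps)   # effective learning rate shrinks as the sum grows
print(w)
```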
ADAM
ADAM is one more adaptive technique; it builds on ADAGRAD and further reduces its downside. In other words, you can consider it as momentum + ADAGRAD.
Here, beta1 and beta2 are constants that keep the changes in the gradient and the learning rate in check.
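A simplified ADAM sketch combining both ideas (beta1 governs the running mean of the gradient, beta2 the running mean of its square; bias correction for the early steps is included):

```python
import numpy as np

def gradient(w):
    return 2.0 * (w - 3.0)   # gradient of the illustrative cost (w - 3)^2

w, lr, beta1, beta2, eps = 0.0, 0.1, 0.9, 0.999, 1e-8
m, v = 0.0, 0.0
for t in range(1, 201):
    g = gradient(w)
    m = beta1 * m + (1 - beta1) * g        # momentum-like running mean of the gradient
    v = beta2 * v + (1 - beta2) * g ** 2   # ADAGRAD-like running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
print(w)
```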
When applying gradient descent, you can look at these points, which might be helpful in circumventing problems:
- Error rates – You should check the training and testing error after specific iterations and make sure both of them decrease. If that is not the case, there might be a problem!
- Gradient flow in hidden layers – Check that the network does not suffer from a vanishing gradient or exploding gradient problem.
- Learning rate – which you should check when using adaptive techniques.
[Figure: three single-layer perceptrons, each with bias input +1 and inputs x1, x2: an AND unit (weights w10 = -1.5, w11 = +1, w12 = +1), an OR unit (weights w10 = -0.5, w11 = +1, w12 = +1), and a NOR unit (weights w10 = +1, w11 = -2, w12 = -2), combined into a two-layer network whose hidden units feed the output yhat.]

x1  x2  AND unit  NOR unit  combined output
0   0   0         1         1
0   1   0         0         0
1   0   0         0         0
1   1   1         0         1

The combined output column (1, 0, 0, 1) is the XNOR of x1 and x2, the complement of XOR, which no single perceptron can produce on its own.
If we have two inputs, then the weights define a decision boundary that is a one-dimensional straight line in the two-dimensional space of possible input values.
This hyperplane is clearly still linear (i.e. straight/flat) and can still only divide the space into two regions. We still need more complex transfer functions, or more complex networks, to deal with XOR-type problems.
Problems with input patterns which can be classified using a single hyperplane are said to be linearly separable. Problems (such as XOR) which cannot be classified in this way are said to be non-linearly separable.
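A sketch of this point: no single perceptron of the kind shown earlier can reproduce XOR, but a two-layer combination of such units can. The weights below are picked by hand for illustration (an OR unit and a NAND-style unit feeding an AND unit):

```python
def g(z):
    return 1 if z > 0 else 0   # step activation

def unit(x1, x2, w):
    """One perceptron: w = [bias weight, w1, w2], bias input fixed at +1."""
    return g(w[0] * 1 + w[1] * x1 + w[2] * x2)

OR_W   = [-0.5,  1,  1]   # fires if either input is 1
NAND_W = [ 1.5, -1, -1]   # fires unless both inputs are 1 (hand-picked weights)
AND_W  = [-1.5,  1,  1]   # fires only if both hidden units fire

def xor(x1, x2):
    # hidden layer: OR and NAND of the inputs; output layer: AND of the hidden values
    h1 = unit(x1, x2, OR_W)
    h2 = unit(x1, x2, NAND_W)
    return unit(h1, h2, AND_W)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))   # prints the XOR truth table: 0, 1, 1, 0
```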
Generally, we will want to deal with input patterns that are not binary, and expect our
neural networks to form complex decision boundaries
We may also wish to classify inputs into many classes (such as the three shown here)
1. Forward propagation of a training pattern's input through the neural network, in order to generate the output activations.
2. Backward propagation of the output error through the network, to obtain an error term (delta) at every computational neuron.
3. Multiply each neuron's output delta and input activation to get the gradient of the weight.
This ratio (percentage) influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training is. The sign of the gradient of a weight indicates in which direction the error increases; this is why the weight must be updated in the opposite direction.
[Figure: the same feed-forward network as before, with inputs x0 = +1 (bias), x1, x2, x3, hidden units a_1^(2), a_2^(2), a_3^(2), and output unit a_1^(3); w_ij^(l) denotes the weight from unit j in layer l to unit i in layer l+1.]
∂E/∂w_ij^(l) = gradient of the error with respect to the weight w_ij^(l)

[Figure: the output neuron receives the hidden activations through the weights w_11^(2), w_12^(2), w_13^(2), forms the weighted sum z^(3), and outputs a_1^(3) = g{z^(3)}.]

E = (1/2)(t - yhat)^2 = (1/2)(t - a_1^(3))^2

For a weight w_1k^(2) feeding the output neuron, the chain rule gives

∂E/∂w_1k^(2) = (∂E/∂a_1^(3)) · (∂a_1^(3)/∂z^(3)) · (∂z^(3)/∂w_1k^(2))

∂a_1^(3)/∂z^(3) = ∂g{z^(3)}/∂z^(3) = g{z^(3)}(1 - g{z^(3)}) = a_1^(3)(1 - a_1^(3))   (for the sigmoid activation)

∂E/∂a_1^(3) = ∂/∂a_1^(3) [ (1/2)(t - a_1^(3))^2 ] = (a_1^(3) - t)

∂z^(3)/∂w_1k^(2) = a_k^(2)

Putting it all together,

∂E/∂w_1k^(2) = a_1^(3)(1 - a_1^(3)) · (a_1^(3) - t) · a_k^(2)

∂E/∂w_1k^(2) = δ^(3) · a_k^(2)

where δ^(3) = a_1^(3)(1 - a_1^(3)) · (a_1^(3) - t)
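A quick numeric sketch of this output-layer gradient (the target t and the activation values are made-up numbers for illustration):

```python
# Illustrative values only
t    = 1.0   # target output
a1_3 = 0.7   # output activation a_1^(3) (a sigmoid output, so inside (0, 1))
a_k2 = 0.4   # hidden activation a_k^(2) feeding the weight w_1k^(2)

delta3 = a1_3 * (1 - a1_3) * (a1_3 - t)   # delta^(3) = a(1 - a)(a - t)
grad   = delta3 * a_k2                    # dE/dw_1k^(2) = delta^(3) * a_k^(2)
print(delta3, grad)                       # roughly -0.063 and -0.025
```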
Backpropagation Steps
Step 1: Run the feed-forward network to obtain the predicted output of the network and calculate the error term at the output neuron.
Step 2: Calculate δ for the output neuron and the hidden neurons; δ is not calculated for input neurons.
∂E/∂w_1k^(2) = δ^(3) · a_k^(2)
Step 3: Update the weights using the learning rate α:
Δw_1k^(2) = α · ∂E/∂w_1k^(2) = α · δ^(3) · a_k^(2)
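A compact sketch of one such update for a single output-layer weight, following the three steps above (all values and names are illustrative; the other inputs to the output neuron are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = 0.5    # learning rate
t     = 1.0    # target output
a_k2  = 0.4    # hidden activation feeding the weight
w_1k2 = 0.2    # current value of the weight (illustrative)

# Step 1: forward pass through the output neuron
z3   = w_1k2 * a_k2
a1_3 = sigmoid(z3)

# Step 2: delta at the output neuron
delta3 = a1_3 * (1 - a1_3) * (a1_3 - t)

# Step 3: gradient and update, moving opposite to the gradient as described earlier
grad  = delta3 * a_k2
w_1k2 = w_1k2 - alpha * grad
print(w_1k2)
```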
Heuristic
• If the sum of squared errors at the current epoch exceeds the previous value by
more than a predefined ratio (typically 1.04), the learning rate parameter is
decreased (typically by multiplying by 0.7) and new weights and thresholds are
calculated.
• If the error is less than the previous one, the learning rate is increased (typically by
multiplying by 1.05).
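A sketch of this heuristic as a small helper (the threshold and multipliers follow the typical values quoted above; the error values are supplied by the caller):

```python
def adapt_learning_rate(lr, current_sse, previous_sse,
                        ratio=1.04, decrease=0.7, increase=1.05):
    """Adjust the learning rate from the change in the sum of squared errors between epochs."""
    if current_sse > previous_sse * ratio:
        return lr * decrease   # error grew by more than the allowed ratio: shrink the learning rate
    if current_sse < previous_sse:
        return lr * increase   # error fell: cautiously increase the learning rate
    return lr                  # otherwise leave it unchanged

print(adapt_learning_rate(0.1, current_sse=5.5, previous_sse=5.0))  # decreased
print(adapt_learning_rate(0.1, current_sse=4.2, previous_sse=5.0))  # increased
```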
- Electronics − Code sequence prediction, IC chip layout, chip failure analysis, machine
vision, voice synthesis.
- Financial − Real estate appraisal, loan advisor, mortgage screening, corporate bond
rating, portfolio trading program, corporate financial analysis, currency value
prediction, document readers, credit application evaluators.
- Medical − Cancer cell analysis, EEG and ECG analysis, prosthetic design, transplant
time optimizer.
- Time Series Prediction − ANNs are used to make predictions on stocks and natural
calamities.
- Signal Processing − Neural networks can be trained to process an audio signal and
filter it appropriately in the hearing aids.
- Control − ANNs are often used to make steering decisions of physical vehicles.
- Anomaly Detection − As ANNs are adept at recognizing patterns, they can also be trained to generate an output when something unusual occurs that does not fit the pattern.
▪ A neural network can perform tasks that a linear program cannot.
▪ Because of its parallel nature, a neural network can continue to operate without any problem even when one of its elements fails.
Summary
▪ Artificial Neural Networks
▪ Introduction to Neurons
▪ Activation Functions