
Module-3 Notes

Syllabus
Neural Network Representation, Problems, Perceptron, Multilayer Neural Networks and Backpropagation Algorithm; Genetic Algorithms, Hypothesis Space Search, Genetic Programming, Models of Evolution and Learning.
Textbook 2: Chapters 4.1-4.6, 9.1-9.5 (Tom Mitchell)
Neural Networks and Genetic Algorithms: brief history and evolution of neural networks, biological neuron, basics of Artificial Neural Networks (ANNs), activation functions, MP model.
Textbook 3: Chapter 6 (Anuradha Srinivasaraghavan and Vincy Joseph)

Artificial Neural Network (javatpoint.com)

What is Artificial Neural Network?


The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known as
nodes.

The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Relationship between Biological neural network and artificial neural network:

Biological Neural Network → Artificial Neural Network
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output

An Artificial Neural Network, in the field of Artificial Intelligence, attempts to mimic the network of neurons that makes up the human brain, so that computers have an option to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.
There are on the order of 100 billion neurons in the human brain. Each neuron forms somewhere between 1,000 and 100,000 connections. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from our memory in parallel when necessary. We can say that the human brain is made up of incredibly amazing parallel processors.
We can understand the artificial neural network with an example, consider an example of a
digital logic gate that takes an input and gives an output. "OR" gate, which takes two inputs.
If one or both the inputs are "On," then we get "On" in output. If both the inputs are "Off,"
then we get "Off" in output. Here the output depends upon input. Our brain does not perform
the same task. The outputs to inputs relationship keep changing because of the neurons in our
brain, which are "learning."

The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we first have to understand what a neural network consists of. A neural network consists of a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural
network.
Artificial Neural Network primarily consists of three layers:

Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.

The weighted total is then passed as an input to an activation function to produce the output. Activation functions decide whether a node should fire or not. Only the nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending upon the sort of task we are performing.
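To make this concrete, the following is a minimal sketch of a single node's computation, assuming NumPy; the weights, bias, and step-style firing rule are illustrative values, not taken from the text above.

import numpy as np

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias (the transfer function)
    weighted_total = np.dot(inputs, weights) + bias
    # Step-style activation: the node "fires" (outputs 1) only if the total exceeds 0
    return 1 if weighted_total > 0 else 0

# Example: three inputs with hand-picked weights and a bias
print(neuron_output(np.array([1.0, 0.0, 1.0]), np.array([0.5, -0.2, 0.3]), bias=-0.4))  # -> 1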

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:


Artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network:
Unlike traditional programming, the data is stored on the whole network rather than in a database, so the disappearance of a couple of pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge:
After ANN training, the information may produce output even with inadequate data. The loss
of performance here relies upon the significance of missing data.
Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by demonstrating these examples to the network. The success of the network is directly proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, the network can produce false output.
Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network:
Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is accomplished through experience and trial and error.
Unrecognized behavior of the network:
This is the most significant issue of ANNs. When an ANN produces a solution, it does not provide insight into why and how, which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their structure. The realization of the network is therefore hardware-dependent.
Difficulty of showing the issue to the network:
ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here will directly influence the performance of the network, and it relies on the user's abilities.
The duration of the network is unknown:
Training is stopped when the network's error is reduced to a specific value, and this value does not guarantee optimum results.
Artificial neural networks, which stepped into the world in the mid-20th century, are developing exponentially. Above, we have looked at the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the disadvantages of ANNs, which are a flourishing branch of science, are being eliminated one by one, and their advantages are increasing day by day. This means that artificial neural networks will progressively become an irreplaceable part of our lives.
How do artificial neural networks work?
Artificial Neural Network can be best represented as a weighted directed graph, where the
artificial neurons form the nodes. The associations between the neurons' outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from the external source in the form of a pattern or image, represented as a vector. These inputs are then mathematically denoted by the notation x(n) for every n-th input.

Afterward, each input is multiplied by its corresponding weight (these weights are the details the artificial neural network uses to solve a specific problem). In general terms, these weights represent the strength of the interconnection between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or to otherwise scale the system's response. The bias can be viewed as an extra input fixed at 1 with its own weight. The total of the weighted inputs can lie anywhere in the range from 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of the weighted inputs is passed through the activation function.
The activation function refers to the set of transfer functions used to achieve the desired
output.
There is a different kind of the activation function, but primarily either linear or non-linear
sets of functions. Some of the commonly used sets of activation functions are the Binary,
linear, and Tan hyperbolic sigmoidal activation functions. Let us take a look at each of them
in details:
Binary:
In the binary activation function, the output is either a one or a 0. To accomplish this, a threshold value is set up. If the net weighted input of the neuron is more than the threshold, then the final output of the activation function is returned as 1; otherwise the output is returned as 0.
Sigmoidal Hyperbolic:

The Sigmoidal Hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:
F(x) = 1 / (1 + exp(-kx))
where k is the steepness parameter.
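As a small illustration, the following sketch (assuming NumPy; k is the steepness parameter from the formula above) shows how larger values of k make the S-curve steeper:

import numpy as np

def sigmoid(x, k=1.0):
    # k is the steepness parameter: larger k makes the S-curve steeper around x = 0
    return 1.0 / (1.0 + np.exp(-k * x))

print(sigmoid(0.0))         # 0.5 at the midpoint
print(sigmoid(2.0, k=1.0))  # ~0.88
print(sigmoid(2.0, k=5.0))  # ~0.99995: a steeper curve saturates faster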
Types of Artificial Neural Network:
There are various types of Artificial Neural Networks (ANNs), modelled on the neurons and network functions of the human brain, and an artificial neural network performs tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their expected tasks, for example segmentation or classification.
Feedback ANN:
In this type of ANN, the output returns into the network to accomplish the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization issues. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one hidden layer of neurons. By assessing its output in relation to its input, the strength of the network can be observed from the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.
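A minimal sketch of such a feed-forward pass, assuming NumPy; the layer sizes, weights, and sigmoid activation are illustrative choices, not values from the text:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, W_hidden, b_hidden, W_out, b_out):
    # Input layer -> hidden layer: weighted sums plus bias, then activation
    hidden = sigmoid(W_hidden @ x + b_hidden)
    # Hidden layer -> output layer
    return sigmoid(W_out @ hidden + b_out)

x = np.array([0.5, -1.0])                 # two input features
W_hidden = np.array([[0.1, 0.4],          # 3 hidden neurons, 2 inputs each
                     [-0.3, 0.2],
                     [0.6, -0.1]])
b_hidden = np.zeros(3)
W_out = np.array([[0.2, -0.5, 0.3]])      # 1 output neuron fed by 3 hidden neurons
b_out = np.zeros(1)

print(feed_forward(x, W_hidden, b_hidden, W_out, b_out))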

Activation Functions in Neural Networks - Javatpoint


Activation Functions in Neural Networks
A paradigm for information processing that draws inspiration from the brain is called an
artificial neural network (ANN). ANNs learn via imitation just like people do. Through a
learning process, an ANN is tailored for a particular purpose, including such pattern
classification or data classification. The synapses interconnections that exist between both the
neurons change because of learning.
Which activation function to employ in the hidden layers and at the output layer of the network is one of the decisions you get to make while creating a neural network. This article discusses a few of the alternatives.
The nerve impulse in neurology serves as a model for activation functions within computer
science. A chain reaction permits a neuron to "fire" and send a signal to nearby neurons if the
induced voltage between its interior and exterior exceeds a threshold value known as the
action potential. The next series of activations, known as a "spike train," enables motor
neurons to transfer commands from the brain to the limbs and sensory neurons to transmit sensations from the digits to the brain.
Neural Network Components
Layers are the vertically stacked parts that make up a neural network. The image's dotted
lines each signify a layer. A NN has three different types of layers.
Input Layer
The input layer is first. The data will be accepted by this layer and forwarded to the
remainder of the network. This layer allows feature input. It feeds the network with data from
the outside world; no calculation is done here; instead, nodes simply transmit the information
(features) to the hidden units.
Hidden Layer
Since they are a component of the abstraction that any neural network provides, the nodes in
this layer are not visible to the outside world. Any features entered through to the input layer
are processed by the hidden layer in any way, with the results being sent to the output layer.
The concealed layer is the name given to the second kind of layer. For a neural network,
either there are one or many hidden layers. The number inside the example above is 1. In
reality, hidden layers are what give neural networks their exceptional performance and
intricacy. They carry out several tasks concurrently, including data transformation and
automatic feature generation.
Output Layer
This layer brings the knowledge that the network has acquired to the outside world. The output layer is the final kind of layer and contains the answer to the problem. We receive output from the output layer after passing raw inputs (for example, photos) to the input layer.
Data science makes extensive use of the rectified linear unit (ReLU) function and of the sigmoid family of functions, which includes the logistic function, the hyperbolic tangent, and the arctangent function.

Activation Function
Definition
In artificial neural networks, an activation function is one that outputs a smaller value for small inputs and a higher value if its inputs exceed a threshold. An activation function "fires" if the inputs are big enough; otherwise, nothing happens. An activation function, then, is a gate that verifies whether an incoming value is higher than a threshold value.
Activation functions are helpful because they introduce non-linearities into neural networks and enable them to learn powerful operations. If the activation functions were removed, a feedforward neural network could be refactored into a simple linear function or matrix transformation of its input.
By generating a weighted total and then including bias with it, the activation function
determines whether a neuron should be turned on. The activation function seeks to boost a
neuron's output's nonlinearity.
Explanation: As we are aware, neurons in neural networks operate in accordance with
weight, bias, and their corresponding activation functions. Based on the error, the values of the weights and biases inside a neural network are modified. This process is known as back-
propagation. Back-propagation is made possible by activation functions since they provide
the gradients and error required to change the biases and weights.
Need of Non-linear Activation Functions
An interconnected regression model without an activation function is all that a neural
network is. Input is transformed nonlinearly by the activation function, allowing the system
to learn and perform more challenging tasks.
The activation function is simply the procedure used to obtain a node's output. It also goes by the name Transfer Function.
The composition of two linear functions yields a linear function, so no matter how many hidden layers we add to a neural network, they will all behave in the same way. A neuron cannot learn if all it has is a linear model; with a non-linear activation function it will be able to learn based on the difference with respect to the error.

The two main categories of activation functions are:


o Linear Activation Function
o Non-linear Activation Functions
Linear Activation Function
As can be observed, the function is linear, so the output of the function is not restricted to any range. The linear activation function does not help with the complexity of the usual data that is fed to neural networks.
Non-linear Activation Function
Non-linear activation functions are the most widely used. They make it easy for the model to generalize and adapt to a variety of data and to differentiate between outputs.
Activation Function
o Linear Function
Equation: A linear function's equation is similar to that of a straight line, i.e. y = x.
No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is nothing more than a linear function of the input of the first layer. Range: -inf to +inf.

Uses: The linear activation function is applied only at the output layer.
If we differentiate a linear function to bring in non-linearity, the result will no longer depend on the input "x"; the function becomes constant, so our algorithm won't exhibit any novel behaviour.
A good example of a regression problem is determining the price of a house. We can use linear activation at the output layer since the price of a house may take any large or small value. Even in this case, the neural network's hidden layers must perform some sort of non-linear function.
o Sigmoid Function

It is a function that is graphed as an "S"-shaped curve.
Equation: A = 1/(1 + e^(-x))
Non-linear in nature. Observe that for X values between -2 and 2, the Y values are very steep; in other words, small changes in x cause significant shifts in the value of Y. It spans from 0 to 1.
Uses: The sigmoid function is typically employed in the output nodes of a classification, where the result may only be either 0 or 1. Since the value of the sigmoid function ranges only from 0 to 1, the result can easily be predicted to be 1 if the value is more than 0.5 and 0 if it is not.
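A small sketch of this 0.5-threshold rule for binary classification, assuming NumPy; the net-input values passed in are made up for illustration:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_class(net_input, threshold=0.5):
    # Sigmoid squashes the net input into (0, 1); predict class 1 above the threshold
    probability = sigmoid(net_input)
    return 1 if probability > threshold else 0

print(predict_class(2.3))   # sigmoid(2.3) ~ 0.91 -> class 1
print(predict_class(-1.5))  # sigmoid(-1.5) ~ 0.18 -> class 0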
o Tanh Function
The activation that consistently outperforms sigmoid function is known as tangent hyperbolic
function. It's actually a sigmoid function that has been mathematically adjusted. Both are
comparable to and derivable from one another.

Range of values: -1 to +1. Non-linear in nature.
Equation: f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1, or equivalently tanh(x) = 2 * sigmoid(2x) - 1.

Uses: Since its values typically range from -1 to 1, the mean for a hidden layer of a neural network will be 0 or very near to it. This helps to centre the data by getting the mean close to 0, which greatly facilitates learning for the following layer.
o ReLU (Rectified Linear Unit) Activation Function
Currently, ReLU is the activation function that is employed the most globally, since practically all convolutional neural networks and deep learning systems employ it. The derivative and the function are both monotonic.
Equation: A(x) = max(0, x). If x is positive, it outputs x; if not, it outputs 0.
Value Interval: [0, inf)
Nature: non-linear, which allows us to simply backpropagate the errors and have the ReLU function activate many layers of neurons.
Uses: Because ReLU involves simpler mathematical operations than tanh and sigmoid, it requires less computer time to run. The network is sparse and efficient for computation since only a limited number of neurons are activated at any given time. Simply said, ReLU learns considerably more quickly than the sigmoid and tanh functions.
However, the problem is that all negative values instantly become zero, which reduces the model's capacity to fit or learn from the data effectively. Any negative input to a ReLU activation function immediately becomes zero, which affects the final result by not mapping the negative values appropriately.
o Softmax Function
Although it is a subclass of the sigmoid function, the softmax function comes in handy when
dealing with multiclass classification issues.
It is used frequently when managing several classes. The softmax function is typically present in the output nodes of image classification problems. The softmax function squeezes the outputs for each category between 0 and 1 and divides by the sum of the outputs.
The softmax function is best applied in the output unit of the classifier, where we are actually attempting to obtain the probabilities that determine the class of each input.
The usual rule of thumb, if we really are unsure which activation function to apply, is to use ReLU, which is a general activation function for hidden layers and is employed in the majority of cases these days.
The sigmoid function is a very logical choice for the output layer if your output is for binary classification. If our output involves multiple classes, softmax can be quite helpful in predicting the odds for each class.


activation function - Search (bing.com)


Activation Functions in Neural Networks

Activation functions play a crucial role in neural networks by introducing non-linearity into
the model, allowing it to learn and represent complex patterns in the data. Without non-
linearity, a neural network would essentially behave like a linear regression model, regardless
of the number of layers it has.
Types of Activation Functions

1. Step Function
The step function is one of the simplest activation functions. It considers a threshold value,
and if the net input is greater than the threshold, the neuron is activated. Mathematically:
def step_function(x):
    return 1 if x >= 0 else 0
This function is rarely used in practice due to its non-differentiable nature.
2. Sigmoid Function
The sigmoid function is widely used and defined as: sigma(x) = 1 / (1 + e^(-x)). It is non-linear and outputs values between 0 and 1, making it suitable for binary classification.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
3. Tanh Function
The tanh function is a shifted version of the sigmoid function and is defined as: tanh(x) = 2/(1 + e^(-2x)) - 1. It outputs values between -1 and 1, making it useful for hidden layers.
def tanh(x):
    return np.tanh(x)
4. ReLU (Rectified Linear Unit)
ReLU is the most widely used activation function, defined as: ReLU(x) = max(0, x). It is computationally efficient and helps mitigate the vanishing gradient problem.
def relu(x):
    return np.maximum(0, x)
5. Leaky ReLU
Leaky ReLU is an improved version of ReLU, allowing a small gradient when the input is negative: Leaky ReLU(x) = max(0.01x, x). This helps prevent the "dying ReLU" problem.
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, x * alpha)
6. Softmax Function
The softmax function is used for multi-class classification problems. It converts logits into probabilities: softmax(x_i) = e^(x_i) / sum_j e^(x_j).

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
Choosing the Right Activation Function
 Binary Classification: Sigmoid function is commonly used in the output layer as it maps any input to a probability range [0, 1].
 Multi-Class Classification: Softmax function is used to predict the probabilities of each class.
 Hidden Layers: ReLU is generally preferred due to its simplicity and efficiency.
Activation functions are essential for the performance and learning capability of neural networks. They introduce non-linearity, enabling the network to learn complex patterns and make accurate predictions.
Activation Functions - GeeksforGeeks
Activation Functions
Last Updated : 12 Aug, 2024
To put it in simple terms, an artificial neuron calculates the ‘weighted sum’ of its inputs and
adds a bias, as shown in the figure below by the net input.

Mathematically,

Now the value of the net input can be anything from -inf to +inf. The neuron does not know how to bound this value and thus cannot decide the firing pattern. The activation function is therefore an important part of an artificial neural network: it decides whether a neuron should be activated or not, and it bounds the value of the net input. The activation function is a non-linear transformation that we apply to the input before sending it to the next layer of neurons or finalizing it as output.
Types of Activation Functions –
Several different types of activation functions are used in Deep Learning. Some of them are
explained below:
Step Function: The step function is one of the simplest kinds of activation functions. Here we consider a threshold value, and if the value of the net input, say y, is greater than the threshold, then the neuron is activated. Mathematically,

Given below is the graphical representation of step function.

Sigmoid Function: Sigmoid function is a widely used activation function. It is defined as:

Graphically,

This is a smooth function and is continuously differentiable. The biggest advantage that it has over the step and linear functions is that it is non-linear. This is an incredibly useful feature of the sigmoid function: it essentially means that when multiple neurons have the sigmoid function as their activation function, the output is non-linear as well. The function ranges from 0 to 1 and has an S shape.

ReLU: The ReLU function is the Rectified linear unit. It is the most widely used activation
function. It is defined as:

Graphically,

The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time. What does this mean? If you look at the ReLU function, when the input is negative it converts it to zero and the neuron does not get activated.

Leaky ReLU: Leaky ReLU function is nothing but an improved version of the ReLU
function. Instead of defining the Relu function as 0 for x less than 0, we define it as a small
linear component of x. It can be defined as:

Graphically,

Practice Questions – Activation Functions (a worked sketch follows the list)
1. Sigmoid Function Calculation
– Formula: sigma(x) = 1 / (1 + exp(-x))
– Compute the output of the sigmoid activation function for the input values: -1, 0, and
1.
2. Tanh Function Properties
– Output Range:
– Sigmoid: [0, 1]
– Tanh: [-1, 1]
– Gradient:
– Sigmoid: The gradient is sigma(x) * (1 – sigma(x)), which can be very small for large
positive or negative values of x, leading to vanishing gradients.
– Tanh: The gradient is 1 – tanh^2(x), which is typically larger than the sigmoid’s
gradient, especially for values near zero.
3. ReLU Activation
– Formula: ReLU(x) = max(0, x)

– Calculate the output of the ReLU activation function for the input values: -3, 0, and 3.
4. Leaky ReLU Implementation
– Formula: Leaky ReLU(x) = max(0.01 * x, x)
– Implement the Leaky ReLU activation function for the input value 0.5 with a negative
slope coefficient of 0.01.
5. Softmax Function Calculation
– Formula: softmax(x_i) = exp(x_i) / sum(exp(x_j) for all j)
– For the input vector [1, 2, 3], compute the output of the softmax activation function.
6. Activation Function Derivatives
– Sigmoid Gradient:
– Formula: The derivative of the sigmoid function with respect to its input x is:
d(sigma(x))/dx = sigma(x) * (1 – sigma(x))
7. Choosing Activation Functions
– Binary Classification:
– Reason: The sigmoid function is commonly used in the output layer for binary
classification because it maps any input to a probability range [0, 1], making it suitable
for representing the probability of a binary outcome.
8. Vanishing Gradient Problem
– Description: The vanishing gradient problem occurs when gradients become very
small during backpropagation, leading to slow or stalled training of deep neural
networks.
– Susceptible Activation Functions: Sigmoid and tanh are most susceptible to this issue
because their gradients can become very small for large positive or negative inputs,
especially in deep networks.
9. Swish Activation Function
– Formula: Swish(x) = x * sigma(x)
– Compute the output of the Swish activation function for an input value of 2.
10. Comparing Activation Functions
– ReLU:
– Formula: ReLU(x) = max(0, x)
– Advantages: Simplicity, computational efficiency, avoids vanishing gradients.
– Disadvantages: Can suffer from the “dying ReLU” problem, where neurons can
become inactive and stop learning.
– ELU (Exponential Linear Unit):
– Formula: ELU(x) = x if x > 0, else alpha * (exp(x) – 1)
– Advantages: Helps avoid the dying ReLU problem, has a smooth curve for negative
inputs.
– Disadvantages: Slightly more computationally expensive than ReLU.
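A small sketch, assuming NumPy, that works through the numeric questions above (the values in the comments are approximate):

import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))
relu = lambda x: np.maximum(0, x)
leaky_relu = lambda x, alpha=0.01: np.where(x > 0, x, alpha * x)
softmax = lambda x: np.exp(x - np.max(x)) / np.exp(x - np.max(x)).sum()
swish = lambda x: x * sigmoid(x)

print(sigmoid(np.array([-1.0, 0.0, 1.0])))   # Q1: ~[0.269, 0.5, 0.731]
print(relu(np.array([-3.0, 0.0, 3.0])))      # Q3: [0, 0, 3]
print(leaky_relu(0.5))                       # Q4: 0.5 (positive inputs pass through)
print(softmax(np.array([1.0, 2.0, 3.0])))    # Q5: ~[0.090, 0.245, 0.665]
print(swish(2.0))                            # Q9: 2 * sigmoid(2) ~ 1.762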
Activation functions in Neural Networks (geeksforgeeks.org)
Activation functions in Neural Networks
Last Updated : 16 Jul, 2024

It is recommended to understand Neural Networks before reading this article.


In the process of building a neural network, one of the choices you get to make is
what Activation Function to use in the hidden layer as well as at the output layer of the
network. This article discusses Activation functions in Neural Networks.
Table of Content
 What is an Activation Function?
 Elements of a Neural Network
 Why do we need Non-linear activation function?
 Variants of Activation Function
o Linear Function
o Sigmoid Function
o Tanh Function
o RELU Function
o Softmax Function
What is an Activation Function?
An activation function in the context of neural networks is a mathematical function applied to
the output of a neuron. The purpose of an activation function is to introduce non-linearity into
the model, allowing the network to learn and represent complex patterns in the data. Without
non-linearity, a neural network would essentially behave like a linear regression model,
regardless of the number of layers it has.
The activation function decides whether a neuron should be activated or not by calculating
the weighted sum and further adding bias to it. The purpose of the activation function is to
introduce non-linearity into the output of a neuron.

Explanation: We know, the neural network has neurons that work in correspondence
with weight, bias, and their respective activation function. In a neural network, we would
update the weights and biases of the neurons on the basis of the error at the output. This
process is known as back-propagation. Activation functions make the back-propagation
possible since the gradients are supplied along with the error to update the weights and
biases.
Elements of a Neural Network
Input Layer: This layer accepts input features. It provides information from the outside
world to the network, no computation is performed at this layer, nodes here just pass on the
information(features) to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world, they are part of the
abstraction provided by any neural network. The hidden layer performs all sorts of
computation on the features entered through the input layer and transfers the result to the
output layer.
Output Layer: This layer brings the information learned by the network up to the outer world.
Why do we need Non-linear activation function?
A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
Suppose we have a Neural net like this :-

Elements of the diagram are as follows:
Hidden layer i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
 z(1) is the vectorized output of layer 1
 W(1) are the vectorized weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4
 X are the vectorized input features, i.e. i1 and i2
 b(1) is the vectorized bias assigned to the neurons in the hidden layer, i.e. b1 and b2
 a(1) is the vectorized form of any linear function
(Note: We are not considering an activation function here)

Layer 2 i.e. output layer :-


Note : Input for layer 2 is output from layer 1
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
Calculation at Output layer
z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]
Let,
[W(2) * W(1)] = W
[W(2)*b(1) + b(2)] = b
Final output : z(2) = W*X + b
which is again a linear function
This result is again a linear function even after applying a hidden layer. Hence we can conclude that no matter how many hidden layers we attach in the neural net, all layers will behave the same way, because the composition of two linear functions is a linear function itself. A neuron cannot learn with just a linear function attached to it; a non-linear activation function will let it learn as per the difference w.r.t. error. Hence we need an activation function.
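A quick numeric check of this conclusion, assuming NumPy; the randomly generated weights and biases are placeholders for illustration:

import numpy as np

np.random.seed(0)
X = np.random.randn(2)                               # input features i1, i2
W1, b1 = np.random.randn(4, 2), np.random.randn(4)   # layer 1 with identity activation, a(1) = z(1)
W2, b2 = np.random.randn(1, 4), np.random.randn(1)   # layer 2 (output layer)

z2 = W2 @ (W1 @ X + b1) + b2                         # two linear layers applied in sequence

W, b = W2 @ W1, W2 @ b1 + b2                         # the collapsed single linear map
print(np.allclose(z2, W @ X + b))                    # True: stacking linear layers stays linear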
Variants of Activation Function
Linear Function

 Equation : A linear function has an equation similar to that of a straight line, i.e. y = x
 No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is nothing but a linear function of the input of the first layer.
 Range : -inf to +inf
 Uses : The linear activation function is used at just one place, i.e. the output layer.
 Issues : If we differentiate the linear function to bring in non-linearity, the result will no longer depend on the input "x", the function will become constant, and it won't introduce any ground-breaking behaviour to our algorithm.
For example: Calculating the price of a house is a regression problem. The house price may have any big or small value, so we can apply linear activation at the output layer. Even in this case the neural net must have some non-linear function at the hidden layers.
Sigmoid Function

 It is a function which is plotted as ‘S’ shaped graph.


 Equation : A = 1/(1 + e^(-x))
 Nature : Non-linear. Notice that for X values between -2 and 2, the Y values are very steep. This means small changes in x would also bring about large changes in the value of Y.
 Value Range : 0 to 1
 Uses : Usually used in the output layer of a binary classification, where the result is either 0 or 1; since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.
Tanh Function

 The activation that works almost always better than sigmoid function is Tanh function
also known as Tangent Hyperbolic function. It’s actually mathematically shifted
version of the sigmoid function. Both are similar and can be derived from each other.
 Equation :- f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1, or equivalently tanh(x) = 2 * sigmoid(2x) - 1
 Value Range :- -1 to +1
 Nature :- non-linear
 Uses :- Usually used in hidden layers of a neural network as it’s values lies
between -1 to 1 hence the mean for the hidden layer comes out be 0 or very
close to it, hence helps in centering the data by bringing mean close to 0. This
makes learning for the next layer much easier.
RELU Function
 It Stands for Rectified linear unit. It is the most widely used activation function.
Chiefly implemented in hidden layers of Neural network.
 Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
 Value Range :- [0, inf)
 Nature :- non-linear, which means we can easily backpropagate the errors and have
multiple layers of neurons being activated by the ReLU function.
 Uses :- ReLu is less computationally expensive than tanh and sigmoid because it
involves simpler mathematical operations. At a time only a few neurons are activated
making the network sparse making it efficient and easy for computation.
In simple words, RELU learns much faster than sigmoid and Tanh function.
Softmax Function

The softmax function is also a type of sigmoid function but is handy when we are trying to handle multi-class classification problems.
 Nature :- non-linear
 Uses :- Usually used when trying to handle multiple classes. The softmax function is commonly found in the output layer of image classification problems. The softmax function squeezes the outputs for each class between 0 and 1 and also divides by the sum of the outputs.
 Output:- The softmax function is ideally used in the output layer of the classifier
where we are actually trying to attain the probabilities to define the class of each
input.
 The basic rule of thumb is if you really don’t know what activation function to use,
then simply use RELU as it is a general activation function in hidden layers and is
used in most cases these days.
 If your output is for binary classification then, sigmoid function is very natural choice
for output layer.
 If your output is for multi-class classification then, Softmax is very useful to predict
the probabilities of each classes.

Artificial Neural Networks and its Applications - GeeksforGeeks
Artificial Neural Networks and its Applications
Last Updated : 07 Aug, 2024

As you read this article, which organ in your body is thinking about it? It’s the brain of
course! But do you know how the brain works? Well, it has neurons or nerve cells that are the
primary units of both the brain and the nervous system. These neurons receive sensory input
from the outside world which they process and then provide the output which might act as the
input to the next neuron.
Each of these neurons is connected to other neurons in complex arrangements at synapses.
Now, are you wondering how this is related to Artificial Neural Networks ? Let’s check out
what they are in detail and how they learn information.
Well, Artificial Neural Networks are modeled after the neurons in the human brain.
Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network in a system. A layer can have only a dozen units or millions of units, depending on how complex the neural network must be to learn the hidden patterns in the dataset.
Commonly, Artificial Neural Network has an input layer, an output layer as well as hidden
layers. The input layer receives data from the outside world which the neural network needs
to analyze or learn about. Then this data passes through one or multiple hidden layers that
transform the input into data that is valuable for the output layer. Finally, the output layer
provides an output in the form of a response of the Artificial Neural Networks to input data
provided.
In the majority of neural networks, units are interconnected from one layer to another. Each
of these connections has weights that determine the influence of one unit on another unit. As
the data transfers from one unit to another, the neural network learns more and more about
the data which eventually results in an output from the output layer.

Neural Networks Architecture


The structures and operations of human neurons serve as the basis for artificial neural
networks. It is also known as neural networks or neural nets. The input layer of an artificial
neural network is the first layer, and it receives input from external sources and releases it to
the hidden layer, which is the second layer. In the hidden layer, each neuron receives input
from the previous layer neurons, computes the weighted sum, and sends it to the neurons in
the next layer. These connections are weighted, meaning the effect of each input from the previous layer is scaled by the weight assigned to it; these weights are adjusted during the training process to improve model performance.
Artificial neurons vs Biological neurons
The concept of artificial neural networks comes from biological neurons found in animal brains, so they share a lot of similarities in structure and function.
 Structure : The structure of artificial neural networks is inspired by biological
neurons. A biological neuron has a cell body or soma to process the impulses,
dendrites to receive them, and an axon that transfers them to other neurons. The input
nodes of artificial neural networks receive input signals, the hidden layer nodes
compute these input signals, and the output layer nodes compute the final output by
processing the hidden layer’s results using activation functions.

Biological Neuron → Artificial Neuron
Dendrite → Inputs
Cell nucleus or Soma → Nodes
Synapses → Weights
Axon → Output

 Synapses : Synapses are the links between biological neurons that enable the
transmission of impulses from dendrites to the cell body. Synapses are the weights
that join the one-layer nodes to the next-layer nodes in artificial neurons. The strength
of the links is determined by the weight value.
 Learning : In biological neurons, learning happens in the cell body nucleus or soma,
which has a nucleus that helps to process the impulses. An action potential is
produced and travels through the axons if the impulses are powerful enough to reach
the threshold. This becomes possible by synaptic plasticity, which represents the
ability of synapses to become stronger or weaker over time in reaction to changes in
their activity. In artificial neural networks, backpropagation is a technique used
for learning, which adjusts the weights between nodes according to the error or
differences between predicted and actual outcomes.
Biological Neuron → Artificial Neuron
Synaptic plasticity → Backpropagation
 Activation : In biological neurons, activation is the firing rate of the neuron, which happens when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and executes activations.
Guide to Activation Functions in Artificial Neural Networks | EJable
What Does Node’s Firing Mean

The phrase “node will fire or not” is a metaphorical way of describing how a neuron in an artificial
neural network processes input signals. In the context of neural networks, a “node” is a
computational unit that simulates the behavior of biological neurons. These nodes “fire” when a
certain threshold of activation is reached. To elaborate:

In a biological neuron, when the inputs—received through the dendrites from other neurons—
accumulate to a certain level of electric potential, the neuron activates and sends an electric spike
along its axon to other neurons.

Similarly, in an artificial neural network, a node receives numerical inputs, which are typically
weighted sums of outputs from nodes in the previous layer. The role of the activation function is to
decide whether this weighted sum is sufficient to activate the node.

When we say that an activation function determines if a “node will fire or not,” we are
analogizing this process to its biological counterpart. In practice, for a node in a neural
network, “firing” means producing a non-zero output.
If the weighted sum of inputs does not meet a certain threshold—determined by the activation
function—the node will not activate and will output a value corresponding to “not firing,” which
could be zero or another value depending on the function used.

Activation functions like the sigmoid or ReLU (Rectified Linear Unit) effectively decide whether the
signal that a node is processing is significant enough to be passed on to the next network layer. This
process is critical in allowing the network to learn complex patterns, as different nodes will be
responsible for activating different parts of the network based on the features present in the input
data.

Biological neurons to Artificial neurons
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you want to
teach an ANN to recognize a cat. Then it is shown thousands of different images of cats so
that the network can learn to identify a cat. Once the neural network has been trained enough
using images of cats, then you need to check if it can identify cat images correctly. This is
done by making the ANN classify the images it is provided by deciding whether they are cat
images or not. The output obtained by the ANN is corroborated by a human-provided
description of whether the image is a cat image or not. If the ANN identifies incorrectly
then back-propagation is used to adjust whatever it has learned during
training. Backpropagation is done by fine-tuning the weights of the connections in ANN units
based on the error rate obtained. This process continues until the artificial neural network can
correctly recognize a cat in an image with minimal possible error rates.
What are the types of Artificial Neural Networks?
 Feedforward Neural Network : The feedforward neural network is one of the most
basic artificial neural networks. In this ANN, the data or the input provided travels in
a single direction. It enters into the ANN through the input layer and exits through the
output layer while hidden layers may or may not exist. So the feedforward neural
network has a front-propagated wave only and usually does not have
backpropagation.
 Convolutional Neural Network : A Convolutional neural network has some
similarities to the feed-forward neural network, where the connections between units
have weights that determine the influence of one unit on another unit. But a CNN has
one or more than one convolutional layer that uses a convolution operation on the
input and then passes the result obtained in the form of output to the next layer. CNN
has applications in speech and image processing which is particularly useful in
computer vision.
 Modular Neural Network: A Modular Neural Network contains a collection of
different neural networks that work independently towards obtaining the output with
no interaction between them. Each of the different neural networks performs a
different sub-task by obtaining unique inputs compared to other networks. The
advantage of this modular neural network is that it breaks down a large and complex
computational process into smaller components, thus decreasing its complexity while
still obtaining the required output.
 Radial basis function Neural Network: Radial basis functions are functions that consider the distance of a point with respect to a center. An RBF network has two layers: in the first layer, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function nets are normally used to model data that represents an underlying trend or function.
 Recurrent Neural Network: The Recurrent Neural Network saves the output of a
layer and feeds this output back to the input to better predict the outcome of the layer.

The first layer in the RNN is quite similar to the feed-forward neural network and the
recurrent neural network starts once the output of the first layer is computed. After
this layer, each unit will remember some information from the previous step so that it
can act as a memory cell in performing computations.
Applications of Artificial Neural Networks
1. Social Media: Artificial Neural Networks are used heavily in Social Media. For
example, let’s take the ‘People you may know’ feature on Facebook that suggests
people that you might know in real life so that you can send them friend requests.
Well, this magical effect is achieved by using Artificial Neural Networks that analyze
your profile, your interests, your current friends, and also their friends and various
other factors to calculate the people you might potentially know. Another common
application of Machine Learning in social media is facial recognition . This is done
by finding around 100 reference points on the person’s face and then matching them
with those already available in the database using convolutional neural networks.
2. Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they will recommend products for you to buy based on your previous browsing
history. Similarly, suppose you love Pasta, then Zomato, Swiggy, etc. will show you
restaurant recommendations based on your tastes and previous order history. This is
true across all new-age marketing segments like Book sites, Movie services,
Hospitality sites, etc. and it is done by implementing personalized marketing . This
uses Artificial Neural Networks to identify the customer likes, dislikes, previous
shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare : Artificial Neural Networks are used in Oncology to train algorithms that
can identify cancerous tissue at the microscopic level at the same accuracy as trained
physicians. Various rare diseases may manifest in physical characteristics and can be
identified in their premature stages by using Facial Analysis on the patient photos. So
the full-scale implementation of Artificial Neural Networks in the healthcare
environment can only enhance the diagnostic abilities of medical experts and
ultimately lead to the overall improvement in the quality of medical care all over the
world.
4. Personal Assistants: I am sure you all have heard of Siri, Alexa, Cortana, etc., and
also heard them based on the phones you have!!! These are personal assistants and an
example of speech recognition that uses Natural Language Processing to interact
with the users and formulate a response accordingly. Natural Language Processing
uses artificial neural networks that are made to handle many tasks of these personal
assistants such as managing the language syntax, semantics, correct speech, the
conversation that is going on, etc.

Artificial Neural Networks - Javatpoint
Importance of Neural Network:
o Without Neural Network: Let's have a look at the example given below. Here we
have a machine, such that we have trained it with four types of cats, as you can see in
the image below. And once we are done with the training, we will provide a random
image of a cat to that machine. Since this cat is not similar to the cats with which we trained our system, without a neural network our machine would not identify the cat in the picture. Basically, the machine will get confused in figuring out where the cat is.

o With Neural Network: However, when we talk about the case with a neural network, even if we have not trained our machine on that particular cat, it can still identify certain features of a cat that we have trained on, match those features with the cat in that particular image, and identify the cat. So, with the help of this example, you can clearly see the importance of the concept of a neural network.
Working of Artificial Neural Networks

The Perceptron’s Input-Output Principles
The Perceptron, which is historically possibly the earliest artificial neuron that was proposed
[Rosenblatt, 1958], is also the basic building block of nearly all ANNs. The Artron may share
the claim for the oldest artificial neuron. However, it lacks the generality of the Perceptron
and of its closely related Adaline, and it was not as influential in the later history of ANN
except in its introduction of the statistical switch. Its discussion follows in Sec. 5 below.
Here, it suffices to say that its basic structure is as in Fig. 2.5 of Sec. 2, namely, it is a very gross but simple model of the biological neuron.

Instead of directly getting into the working of Artificial Neural Networks, let's break down and try to understand the Neural Network's basic unit, which is called a Perceptron.
A perceptron can be defined as a neural network with a single layer that classifies linearly separable data. It consists of four major components, which are as follows:
1. Inputs
2. Weights and Bias
3. Summation Functions
4. Activation or transformation function

The main logic behind the concept of Perceptron is as follows:


The inputs (x) are fed into the input layer and multiplied by the allotted weights (w); the products are then added together to form the weighted sum. This weighted sum is then passed through the pertinent activation function.
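A minimal sketch of these four components working together, assuming NumPy; the AND-gate weights below are hand-set for illustration, not the result of training:

import numpy as np

class Perceptron:
    # Single-layer perceptron: inputs -> weights and bias -> summation -> step activation
    def __init__(self, n_inputs):
        self.weights = np.zeros(n_inputs)
        self.bias = 0.0

    def predict(self, x):
        weighted_sum = np.dot(self.weights, x) + self.bias   # summation function
        return 1 if weighted_sum > 0 else 0                  # step-style activation

p = Perceptron(2)
p.weights, p.bias = np.array([1.0, 1.0]), -1.5   # hand-set weights model an AND gate
print([p.predict(np.array(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]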

Weights and Bias
As and when the input variable is fed into the network, a random value is given as a weight of
that particular input, such that each individual weight represents the importance of that input
in order to make correct predictions of the result.
However, bias helps in the adjustment of the curve of activation function so as to accomplish
a precise output.
Summation Function
After the weights are assigned to the input, it then computes the product of each input and
weights. Then the weighted sum is calculated by the summation function in which all of the
products are added.
Activation Function
The main objective of the activation function is to perform a mapping of the weighted sum onto the output. The transformation function comprises activation functions such as tanh, ReLU, sigmoid, etc.
The activation function is categorized into two main parts:

1. Linear Activation Function


2. Non-Linear Activation Function
Linear Activation Function
In the linear activation function, the output of functions is not restricted in between any
range. Its range is specified from -infinity to infinity. For each individual neuron, the inputs
get multiplied with the weight of each respective neuron, which in turn leads to the creation
of output signal proportional to the input. If all the input layers are linear in nature, then the
final activation of the last layer will actually be the linear function of the initial layer's input.

Non-linear function
These are the most widely used activation functions. They help the model generalize and adapt to any sort of data and differentiate correctly among the outputs. They solve the following problems faced by linear activation functions:
o Since non-linear functions have derivatives, the problems related to backpropagation are successfully solved.
o For the creation of deep neural networks, they permit the stacking of several layers of neurons.

The non-linear activation function is further divided into the following parts:
1. Sigmoid or Logistic Activation Function
It provides a smooth gradient by preventing sudden jumps in the output values. It has an output value range between 0 and 1 that helps in the normalization of each neuron's output. For X values between -2 and 2, the values of Y are very steep; in simple language, even a small change in X can bring a lot of change in Y.
Its value ranges between 0 and 1, due to which it is highly preferred for binary classification whose result is either 0 or 1.

2. Tanh or Hyperbolic Tangent Activation Function


The tanh activation function works much better than the sigmoid function; we can simply say it is an advanced version of the sigmoid activation function. Since it has a value range between -1 and 1, it is utilized by the hidden layers in the neural network, and for this reason it makes the process of learning much easier.

3. ReLU(Rectified Linear Unit) Activation Function


ReLU is one of the most widely used activation functions in the hidden layers of a neural network. Its value ranges from 0 to infinity. It clearly helps in solving the problem of backpropagation, and it is less computationally expensive than the sigmoid and tanh activation functions. It allows only a few neurons to get activated at a particular instance, which leads to effectual as well as easier computations.

4. Softmax Function
It is a kind of sigmoid function used for solving classification problems. It is mainly used to handle multiple classes, for which it squeezes the output of each class between 0 and 1 and then divides by the sum of the outputs. This kind of function is specially used by the classifier in the output layer.

Gradient Descent Algorithm

Gradient descent is an optimization algorithm that is utilized to minimize the cost function
used in various machine learning algorithms so as to update the parameters of the learning
model. In linear regression, these parameters are coefficients, whereas, in the neural network,
they are weights.
Procedure:
The process starts by initializing the coefficient to 0.0 or to a small arbitrary value.
coefficient = 0.0
To estimate the cost, the coefficient is plugged into the function and evaluated:
cost = f(coefficient)
or, cost = evaluate(f(coefficient))
Next, the derivative is calculated. The derivative, a concept from calculus, gives the function's slope at a given point. We need the slope to know in which direction to move the coefficient so that the cost is lower in the next iteration.
delta = derivative(cost)
Now that the downhill direction is known, the coefficient values can be updated. We also need to specify alpha, the learning rate parameter, which controls how much the coefficient changes on each update.
coefficient = coefficient - (alpha * delta)
The whole process repeats until the cost of the coefficient reaches 0.0 or comes close enough to it.
It can be concluded that gradient descent is a very simple as well as straightforward concept.
It just requires you to know about the gradient of the cost function or simply the function that
you are willing to optimize.
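A runnable sketch of the procedure above for a one-dimensional cost; the quadratic cost function and the learning rate of 0.1 are illustrative choices, not part of the original notes:

```python
# Minimal gradient descent on a 1-D cost, mirroring the pseudocode above.
def f(coefficient):
    # Illustrative convex cost with its minimum at coefficient = 3.
    return (coefficient - 3.0) ** 2

def derivative(coefficient):
    # Analytical derivative of the cost above.
    return 2.0 * (coefficient - 3.0)

coefficient = 0.0   # initial value
alpha = 0.1         # learning rate
for step in range(100):
    delta = derivative(coefficient)            # slope at the current point
    coefficient = coefficient - alpha * delta  # move downhill
print(coefficient)  # approaches 3.0, where the cost is (near) zero
```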
Batch Gradient Descent
In batch gradient descent, each iteration processes all of the training examples. When we have a large number of training examples, batch gradient descent therefore becomes expensive and less preferable.
Algorithm for Batch Gradient Descent
Let m be the number of training examples and n be the number of features.
Now assume that hƟ represents the hypothesis for linear regression and ∑ computes the sum over all training examples from i = 1 to m. Then the cost function is computed by:
Jtrain(Ɵ) = (1/2m) ∑ (hƟ(x(i)) − y(i))²
Repeat {
Ɵj = Ɵj - (learning rate/m) * ∑ (hƟ(x(i)) - y(i)) xj(i)
For every j = 0...n
}
Here xj(i) denotes the jth feature of the ith training example. If m is very large, each update must sum over every training example, so progressing towards the global minimum becomes very slow and expensive.
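A small NumPy sketch of batch gradient descent for linear regression, following the update rule above; the synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Synthetic data: m examples, one feature plus a bias column (x0 = 1).
m = 100
X = np.hstack([np.ones((m, 1)), np.random.rand(m, 1)])   # shape (m, 2)
y = 4.0 + 3.0 * X[:, 1] + 0.1 * np.random.randn(m)

theta = np.zeros(2)        # parameters theta_0, theta_1
learning_rate = 0.1

for _ in range(1000):
    predictions = X @ theta                   # h_theta(x) for all m examples
    errors = predictions - y                  # (h_theta(x^(i)) - y^(i)) for every i
    gradient = (X.T @ errors) / m             # sum over all examples, divided by m
    theta = theta - learning_rate * gradient  # update every theta_j at once

print(theta)   # approaches [4.0, 3.0] for this synthetic data
```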
Stochastic Gradient Descent
Stochastic gradient descent processes only one training example per iteration, so all the parameters are updated after each single example is processed. It tends to be much faster than batch gradient descent, but with a huge number of training examples the system may need a large number of iterations, since each one still looks at only a single example. To train the parameters evenly across all kinds of data, the dataset should be shuffled properly.
Algorithm for Stochastic Gradient Descent
Suppose (x(i), y(i)) is a single training example.
Cost(Ɵ, (x(i), y(i))) = (1/2) (hƟ(x(i)) − y(i))²
Jtrain(Ɵ) = (1/m) ∑ Cost(Ɵ, (x(i), y(i)))
Repeat {
For i = 1 to m {
Ɵj = Ɵj − (learning rate) * (hƟ(x(i)) − y(i)) xj(i)
For every j = 0...n
}
}
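The same illustrative regression problem solved with stochastic gradient descent: one example per update, with the dataset shuffled on each pass (the learning rate and epoch count are arbitrary choices):

```python
import numpy as np

m = 100
X = np.hstack([np.ones((m, 1)), np.random.rand(m, 1)])
y = 4.0 + 3.0 * X[:, 1] + 0.1 * np.random.randn(m)

theta = np.zeros(2)
learning_rate = 0.05

for epoch in range(50):
    order = np.random.permutation(m)         # shuffle so updates see the data evenly
    for i in order:
        error = X[i] @ theta - y[i]          # error on a single example only
        theta = theta - learning_rate * error * X[i]   # update using that one example

print(theta)   # fluctuates around [4.0, 3.0] rather than settling exactly
```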
Convergence trends in different variants of Gradient Descent
The Batch Gradient Descent algorithm follows a relatively straight path towards the minimum. If the cost function is convex, it converges to the global minimum; if not, it converges to a local minimum. The learning rate is typically kept constant.
Stochastic Gradient Descent, by contrast, fluctuates around the global minimum rather than converging smoothly, so the learning rate is decreased slowly to help it converge. Because it processes only one example per iteration, its updates are noisy.
Backpropagation
A backpropagation network consists of an input layer of neurons, an output layer, and at least one hidden layer. The neurons compute a weighted sum of their inputs, which is then passed through an activation function, typically the sigmoid. Backpropagation uses supervised learning to train the network: the weights are updated repeatedly until the network produces the desired output. The following factors are responsible for the training and performance of the network:
o The random (initial) values of the weights.
o The number of training cycles.
o The number of hidden neurons.
o The training set.
o Teaching parameter values such as the learning rate and momentum.
Working of Backpropagation
Consider a simple feed-forward network with one hidden layer. Training proceeds in the following steps; a small code sketch follows the list.
1. The inputs X arrive through the preconnected paths.
2. The weights W are chosen randomly and used to model the input.
3. The output of every individual neuron is calculated, propagating from the input layer through the hidden layer to the output layer.
4. The errors at the outputs are evaluated: Error = Actual Output − Desired Output.
5. The errors are sent back from the output layer to the hidden layer so that the weights can be adjusted to reduce the error.
6. The whole process is repeated until the desired result is achieved.
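A small NumPy sketch of steps 1-6 for a tiny network with one hidden layer and sigmoid activations; the XOR-style data, layer sizes, learning rate, and iteration count are all illustrative assumptions rather than part of the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: inputs X and desired outputs (XOR problem as a toy example).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
target = np.array([[0], [1], [1], [0]], dtype=float)

# Step 2: weights chosen randomly (biases start at zero).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):                          # Step 6: iterate until the error is small.
    # Step 3: forward pass, input layer -> hidden layer -> output layer.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Step 4: evaluate the error at the output.
    error = output - target

    # Step 5: send the error back and adjust the weights (gradient of squared error).
    grad_output = error * output * (1 - output)
    grad_hidden = (grad_output @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ grad_output
    b2 -= lr * grad_output.sum(axis=0)
    W1 -= lr * X.T @ grad_hidden
    b1 -= lr * grad_hidden.sum(axis=0)

print(output.round(2))   # should move close to the desired outputs 0, 1, 1, 0
```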
Need for Backpropagation
o It is fast, simple, and easy to implement.
o Apart from the number of inputs, it has no other parameters to tune.
o It does not require any prior knowledge about the network, which makes it flexible.
o It is a standard method that generally gives good results.
An Introduction to Artificial Neural Networks | by Srivignesh Rajan | Towards Data Science
Why Neural Networks?
 Traditional machine learning algorithms tend to plateau in performance as the data size increases, whereas an ANN keeps improving and outperforms them when the data size is huge.
 Feature learning: an ANN learns hierarchically, in an incremental manner, layer by layer. For this reason, it is not necessary to perform explicit feature engineering.
 Neural networks can handle unstructured data such as images, text, and speech. When the data is unstructured, neural network architectures such as CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) are used.
How ANN works
The working of ANN can be broken down into two phases,
 Forward Propagation
 Back Propagation
Forward Propagation
 Forward propagation involves multiplying feature values with weights, adding bias,
and then applying an activation function to each neuron in the neural network.
 Multiplying feature values by weights and adding a bias is essentially linear regression for each neuron. If we then apply the sigmoid function, each neuron is essentially performing logistic regression (a small sketch follows below).
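A minimal sketch of forward propagation for a single neuron, as described above; the weights, bias, and input values shown are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, -0.3])     # feature values
w = np.array([0.4, -0.2, 0.1])     # weights of one neuron
b = 0.05                           # bias

z = np.dot(w, x) + b               # linear part: weighted sum plus bias (like linear regression)
a = sigmoid(z)                     # applying sigmoid makes the neuron a logistic-regression-like unit
print(z, a)
```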
Activation functions
 The purpose of an activation function is to introduce non-linearity into the network. Introducing non-linearity helps the model identify complex underlying patterns in the data. It is also used to scale values to a particular interval; for example, the sigmoid activation function scales values to between 0 and 1.
Logistic or Sigmoid function
 Logistic/ Sigmoid function scales the values between 0 and 1.
 It is used in the output layer for Binary classification.
 It may cause a vanishing gradient problem during backpropagation and slows the
training time.
Sigmoid function
Tanh function
 Tanh is the short form for Hyperbolic Tangent. Tanh function scales the values
between -1 and 1.
Hyperbolic Tangent function
ReLU function
 ReLU (Rectified Linear Unit) outputs the same number if x>0 and outputs 0 if x<0.
 It prevents the vanishing gradient problem but can introduce an exploding gradient problem during backpropagation. The exploding gradient problem can be prevented by capping (clipping) the gradients, as sketched below.
ReLU function
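One common way to cap gradients in practice is gradient clipping configured on the optimizer; a brief Keras sketch, where the clipnorm value of 1.0 is an arbitrary illustrative choice:

```python
from tensorflow import keras

# Clip the norm of each gradient to at most 1.0 before the update is applied,
# which limits how large any single weight update can become.
optimizer = keras.optimizers.Adam(clipnorm=1.0)

# Alternatively, clip each gradient element to a fixed range:
# optimizer = keras.optimizers.Adam(clipvalue=0.5)
```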
Leaky ReLU function
 Leaky ReLU is very much similar to ReLU but when x<0 it returns (0.01 * x) instead
of 0.
 If the data is normalized using Z-score it may contain negative values; ReLU would output 0 for these and lose that information, whereas Leaky ReLU still passes a small signal and so overcomes this problem.
Leaky ReLU function
Backpropagation
 Backpropagation finds optimal values for the model's parameters by iteratively updating them using the gradients (partial derivatives) of the loss function with respect to those parameters.
 An optimization function is applied to perform backpropagation. The objective of an optimization function is to find the optimal values for the parameters; the commonly used optimizers are listed below, followed by a short configuration sketch.
The optimization functions available are,
 Gradient Descent
 Adam optimizer
 Gradient Descent with momentum
 RMS Prop (Root Mean Square Prop)
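All four optimizers listed above are available in Keras; a brief sketch of how they might be instantiated (the learning-rate and momentum values shown are illustrative defaults, not prescribed by the notes):

```python
from tensorflow import keras

sgd = keras.optimizers.SGD(learning_rate=0.01)                     # plain gradient descent
adam = keras.optimizers.Adam(learning_rate=0.001)                  # Adam optimizer
momentum = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # gradient descent with momentum
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001)            # RMS Prop
```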
The chain rule of calculus plays an important role in backpropagation: the partial derivative of the loss L with respect to a weight w is written as a product of simpler derivatives. A small change in the weight w influences the value z (∂z/∂w), a small change in z influences the activation a (∂a/∂z), and a small change in the activation a influences the loss L (∂L/∂a).
Chain rule: ∂L/∂w = (∂L/∂a) · (∂a/∂z) · (∂z/∂w)
Here z is the weighted sum of the inputs (z = w·x + b), a = g(z) is the activation, and L is the loss.
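A tiny numeric check of the chain rule for a single sigmoid neuron with squared-error loss; the input, weight, bias, and target values are illustrative:

```python
import numpy as np

x, w, b, y = 0.5, 0.8, 0.1, 1.0        # one input, one weight, a bias, and a target

z = w * x + b                          # weighted sum
a = 1.0 / (1.0 + np.exp(-z))           # sigmoid activation
L = 0.5 * (a - y) ** 2                 # squared-error loss

dL_da = a - y                          # how the loss changes with the activation
da_dz = a * (1 - a)                    # how the activation changes with z (sigmoid derivative)
dz_dw = x                              # how z changes with the weight

dL_dw = dL_da * da_dz * dz_dw          # chain rule: product of the three factors
print(dL_dw)
```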
Terminologies:
Metrics
 A metric is used to gauge the performance of the model.
 Metric functions are similar to cost functions, except that the results from evaluating a
metric are not used when training the model. Note that you may use any cost function
as a metric.
 We have used Mean Squared Logarithmic Error as both the metric and the cost function.
Mean Squared Logarithmic Error (MSLE) and Root Mean Squared Logarithmic Error (RMSLE)
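Since the formula image is not reproduced here, a small NumPy sketch of the standard MSLE and RMSLE definitions may be helpful (function names are our own):

```python
import numpy as np

def msle(y_true, y_pred):
    # MSLE: mean of (log(1 + y) - log(1 + y_hat))^2 over all samples.
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

def rmsle(y_true, y_pred):
    # RMSLE is simply the square root of MSLE.
    return np.sqrt(msle(y_true, y_pred))
```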
Epoch
 A single pass through the training data is called an epoch. The training data is fed to
the model in mini-batches and when all the mini-batches of the training data are fed to
the model that constitutes an epoch.
Hyperparameters
Hyperparameters are the tunable parameters that are not produced by a model which means
the users must provide a value for these parameters. The values of hyperparameters that we
provide affect the training process so hyperparameter optimization comes to the rescue.
The Hyperparameters used in this ANN model are,
 Number of layers
 Number of units/ neurons in a layer
 Activation function
 Initialization of weights
 Loss function
 Metric
 Optimizer
 Number of epochs
Coding ANN in TensorFlow
Load the preprocessed data
The data you feed to the ANN must be preprocessed thoroughly to yield reliable results. The
training data has been preprocessed already. The preprocessing steps involved are,
 MICE Imputation
 Log transformation
 Square root transformation
 Ordinal Encoding
 Target Encoding
 Z-Score Normalization
For the detailed implementation of the above-mentioned steps, refer to my notebook on data preprocessing.
Here's the explanation of the steps to build a neural network:
1. Import Libraries: Import necessary libraries like numpy for numerical operations and tensorflow.keras for building the neural network.
2. Load Dataset: Load the MNIST dataset using keras.datasets.mnist.load_data(), which provides training and test sets of 28x28 grayscale images of handwritten digits.
3. Preprocess Data: Normalize the pixel values (0–255) to the range [0, 1] and flatten the 28x28 images into 1D vectors (784 features). Convert labels into one-hot encoding for multi-class classification.
4. Define the Model: Use keras.Sequential() to define a feedforward neural network with (1) an input layer for 784 features, (2) a hidden layer with 128 neurons and ReLU activation, and (3) an output layer with 10 neurons and softmax activation (for classification).
5. Compile the Model: Compile the model by defining (1) the optimizer as 'adam' (which adjusts weights to minimize loss), (2) the loss as 'categorical_crossentropy' (for multi-class classification), and (3) the metric as 'accuracy' (to track performance).
6. Train the Model: Train the neural network using model.fit() on the training data. Use 10 epochs (iterations over the dataset) with a batch size of 32, and split 20% of the training data for validation during training.
7. Evaluate the Model: Use model.evaluate() to assess the trained model's performance on the test data by calculating accuracy and loss.
8. Make Predictions: Use model.predict() to generate predictions on new or unseen data. For example, you can predict the label of the first test sample by examining the model's output.
9. Save the Model: Use model.save() to store the trained model for later use, so it can be reloaded without retraining.
These steps summarize the whole process, from loading the data to making predictions and saving the model; a full code sketch follows.
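A compact, runnable sketch of the nine steps above, assuming TensorFlow/Keras is installed; the layer sizes and training settings follow the steps listed, while the file name passed to model.save() is an illustrative choice:

```python
# Steps 1-9: a feedforward network on MNIST with tensorflow.keras.
import numpy as np
from tensorflow import keras

# Step 2: load the MNIST dataset (28x28 grayscale digit images).
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Step 3: scale pixels to [0, 1], flatten to 784 features, one-hot encode the labels.
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Step 4: define the model - 784 inputs, 128 ReLU hidden units, 10 softmax outputs.
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# Step 5: compile with the Adam optimizer, categorical cross-entropy loss, accuracy metric.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Step 6: train for 10 epochs, batch size 32, 20% of the training data held out for validation.
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Step 7: evaluate on the test set.
test_loss, test_acc = model.evaluate(x_test, y_test)
print("test accuracy:", test_acc)

# Step 8: predict the label of the first test sample.
pred = model.predict(x_test[:1])
print("predicted digit:", np.argmax(pred[0]))

# Step 9: save the trained model for later reuse (file name is illustrative).
model.save("mnist_ann.keras")
```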