
ML Unit-Ii


NEURAL NETWORKS AND GENETIC ALGORITHMS

What is Artificial Neural Network?


The term "Artificial Neural Network" is derived from the biological neural networks that
make up the structure of the human brain. Just as the human brain has neurons that are
interconnected with one another, artificial neural networks have neurons that are
interconnected across the various layers of the network. These neurons are known
as nodes.
Dendrites in a biological neural network correspond to inputs in an artificial neural network,
the cell nucleus corresponds to a node, synapses correspond to weights, and the axon
corresponds to the output.

Relationship between Biological neural network and artificial neural network:

We can understand the artificial neural network through an example. Consider a digital
logic gate that takes inputs and gives an output, such as an "OR" gate with two inputs. If
one or both of the inputs are "On," the output is "On." If both inputs are "Off," the
output is "Off." Here the output depends only on the input, and that relationship is fixed.
Our brain does not work this way: the relationship between inputs and outputs keeps
changing because the neurons in our brain are "learning."

The architecture of an artificial neural network:


To understand the architecture of an artificial neural network, we have to
understand what a neural network consists of. A neural network consists of a large
number of artificial neurons, termed units, arranged in a sequence of layers. Let us
look at the various types of layers in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer sits between the input and output layers. It performs all the
calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes their weighted sum,
and adds a bias. This computation is represented in the form of a transfer function.
The weighted total is then passed as input to an activation function to produce the
output. Activation functions decide whether a node should fire or not. Only the nodes that
fire pass their signal on to the output layer. Different activation functions are available,
and the choice depends on the sort of task we are performing.

Advantages of Artificial Neural Network (ANN)


Parallel processing capability:

Artificial neural networks have the numerical strength to perform more than one task
simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, information in an ANN
is stored across the whole network. The disappearance of a few pieces of information in
one place doesn't prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with incomplete data. The
loss of performance depends on the significance of the missing data.

Having a distributed memory:

For an ANN to be able to adapt, it is important to choose the examples and to
train the network toward the desired output by presenting these examples
to the network. The success of the network is directly proportional to the chosen
instances, and if an event cannot be shown to the network in all its aspects, the network
can produce false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from generating output, and
this feature makes the network fault-tolerant.

Working of artificial neural networks:


An artificial neural network can best be represented as a weighted directed graph, where
the artificial neurons form the nodes. The connections between neuron outputs and
neuron inputs can be viewed as directed edges with weights. The artificial neural
network receives the input signal from an external source, such as a pattern or an
image, in the form of a vector. These inputs are then denoted mathematically by the
notation x(n) for each of the n inputs.

Afterward, each input is multiplied by its corresponding weight (these weights are
the details the artificial neural network uses to solve a specific problem). In general
terms, the weights represent the strength of the interconnections between
neurons inside the artificial neural network. All the weighted inputs are summed inside
the computing unit.

If the weighted sum is equal to zero, a bias is added to make the output non-zero, or
otherwise to scale up the system's response. The bias can be treated as an extra input
that is always 1, with its own weight. The total of the weighted inputs can range from
negative to positive infinity. To keep the response within the limits of the desired value,
a certain maximum value is benchmarked, and the total of the weighted inputs is passed
through the activation function.
The activation function refers to the set of transfer functions used to achieve the desired
output. There are different kinds of activation function, but they are primarily either
linear or non-linear. Some of the commonly used activation functions are
the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at
some of them in detail:

Binary:
In the binary activation function, the output is either 1 or 0. To accomplish this,
a threshold value is set up. If the net weighted input of the neuron is more than the
threshold, the output of the activation function is 1; otherwise the output is 0.

Sigmoidal Hyperbolic:
The sigmoidal hyperbolic function is generally seen as an "S"-shaped curve. Here the tan
hyperbolic function is used to approximate the output from the actual net input. The function
is defined as:

$$F(x) = \frac{1}{1 + \exp(-\lambda x)}$$

where $\lambda$ is the steepness parameter.
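The activation functions named above can be sketched in a few lines of Python. This is only an illustration: the function names, the default threshold of 0, and the default steepness of 1 are assumptions made for the example, not values given in these notes.

```python
import math

def binary_step(net, threshold=0.0):
    """Return 1 when the net input exceeds the threshold, else 0."""
    return 1 if net > threshold else 0

def linear(net):
    """Pass the net input through unchanged."""
    return net

def sigmoid(net, steepness=1.0):
    """S-shaped curve 1 / (1 + exp(-steepness * net)), bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * net))

def tanh(net):
    """Hyperbolic tangent, an S-shaped curve bounded in (-1, 1)."""
    return math.tanh(net)

# Compare the four functions on a few sample net inputs.
for net in (-2.0, 0.0, 2.0):
    print(net, binary_step(net), linear(net),
          round(sigmoid(net), 4), round(tanh(net), 4))
```

A larger steepness value makes the sigmoid rise more sharply around zero, so it behaves more and more like the binary step.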

Types of Artificial Neural Network:


There are various types of artificial neural networks (ANN). Each is modeled on the neurons
and network functions of the human brain and performs tasks in a similar way.
Most artificial neural networks have some similarities with their more
complex biological counterpart and are very effective at their intended tasks, for example
segmentation or classification.

Feedback ANN:
In this type of ANN, the output is fed back into the network to arrive at the best-evolved
results internally. According to the University of Massachusetts Lowell Centre for Atmospheric
Research, feedback networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections make use of feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer,
and at least one layer of neurons. By assessing its output in light of its input, the
strength of the network can be judged from the group behavior of the connected neurons, and
the output is decided. The primary advantage of this network is that it learns to
evaluate and recognize input patterns.

Appropriate Problems for ANN

o Training data is noisy or comes from complex sensors: ANNs suit noisy, complex sensor
data. They also apply to problems where symbolic algorithms such as decision tree
learning (DTL) are used, and on such problems ANN and DTL produce results of
comparable accuracy.

o Instances are attribute-value pairs: The target function to be learned is defined over
instances that can be described by a vector of predefined features. Attributes may be
highly correlated or independent, and values can be any real value.

o Target function may be discrete-valued, real-valued, or a vector: The target function
output may be discrete-valued, real-valued, or a vector of several real- or
discrete-valued attributes.

o Training examples may contain errors: ANN learning methods are quite robust to
noise in the training data.

o Long training times are acceptable: Network training algorithms typically require
longer training times than, say, decision tree learning algorithms. Training times
can range from a few seconds to many hours, depending on factors such as the
number of weights in the network, the number of training examples considered,
and the settings of various learning algorithm parameters.

o Fast evaluation of the learned target function may be required: Although ANN
learning times are relatively long, evaluating the learned network, in order to
apply it to a subsequent instance, is typically very fast.

o Humans do not need to understand the learned target function: The weights learned by
neural networks are often difficult for humans to interpret. Learned neural networks
are less easily communicated to humans than learned rules.

Perceptron in Machine Learning


In Machine Learning and Artificial Intelligence, the perceptron is one of the most commonly
encountered terms. It is a first step in learning Machine Learning and Deep Learning
technologies, and it consists of a set of weights, input values or scores, and a threshold.

The perceptron is a linear Machine Learning algorithm used for supervised learning of
binary classifiers.

What is the Perceptron model in Machine Learning?


The perceptron is a Machine Learning algorithm for supervised learning of binary
classification tasks. Further, a perceptron can also be understood as an artificial neuron or
neural network unit that performs certain computations on input data to detect features,
for example in business intelligence.

The perceptron model is also treated as one of the simplest types of artificial neural
networks. It is a supervised learning algorithm for binary classifiers, and we
can consider it a single-layer neural network with four main parameters: input
values, weights and bias, net sum, and an activation function.

Binary classifier in Machine Learning


In Machine Learning, a binary classifier is a function that decides
whether an input, represented as a vector of numbers, belongs to some
specific class.

The perceptron is a linear binary classifier. In simple words, it is a classification
algorithm that makes its prediction from a linear predictor function combining a weight
vector with the feature vector.

Binary classification is a fundamental task in machine learning, where the goal is to categorize data
into one of two classes or categories.
Binary classification is used in a wide range of applications, such as spam email detection, medical
diagnosis, sentiment analysis, fraud detection, and many more.


Basic Components of Perceptron


o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the system
for further processing. Each input node contains a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is
another important parameter of the perceptron. A weight is directly
proportional to the strength of the associated input neuron in deciding the output.
Further, the bias can be considered the intercept in a linear equation.

o Activation Function:

This is the final important component, which determines whether the neuron
will fire or not. The activation function can be considered primarily as a step function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function

How does Perceptron work?


In Machine Learning, the perceptron is considered a single-layer neural network that
consists of four main parameters: input values (input nodes), weights and bias, net
sum, and an activation function. The perceptron model begins by multiplying
each input value by its weight, then adds these products together to create the weighted
sum. This weighted sum is then passed to the activation function 'f' to obtain the desired
output. This activation function is also known as the step function.

The step function or activation function plays a vital role in ensuring that the output is
mapped to the required range, (0,1) or (-1,1). Note that the weight of
an input indicates the strength of a node. Similarly, the bias value gives the ability
to shift the activation function curve up or down.

Perceptron model works in two important steps as follows:

Step-1

In the first step, multiply each input value by its corresponding weight and then
add the products to determine the weighted sum. Mathematically, we can calculate the
weighted sum as follows:

∑wi*xi = w1*x1 + w2*x2 + … + wn*xn

Add a special term called the bias 'b' to this weighted sum to improve the model's
performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied to the above weighted
sum, which gives us an output either in binary form or as a continuous value:

Y = f(∑wi*xi + b)
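The two steps can be written out as a short Python sketch. The helper names (`weighted_sum`, `perceptron_output`) and the sample inputs, weights, and bias are hypothetical values chosen only to illustrate Y = f(∑wi*xi + b).

```python
import math

def weighted_sum(inputs, weights, bias):
    """Step 1: multiply each input by its weight, add the products, add the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step(net):
    """Binary activation: fire (1) when the net input is positive."""
    return 1 if net > 0 else 0

def sigmoid(net):
    """Continuous activation with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def perceptron_output(inputs, weights, bias, activation=step):
    """Step 2: apply the activation function to the weighted sum."""
    return activation(weighted_sum(inputs, weights, bias))

# Hypothetical example values, not taken from the notes.
inputs, weights, bias = [1.0, 0.0], [0.6, 0.4], -0.5
print(perceptron_output(inputs, weights, bias))           # binary output
print(perceptron_output(inputs, weights, bias, sigmoid))  # continuous output
```

Swapping the `activation` argument is what switches the output between the binary form and the continuous form mentioned in Step-2.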

Types of Perceptron Models


Based on the layers, Perceptron models are divided into two types. These are as follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model
Single Layer Perceptron Model:
This is one of the simplest types of artificial neural network (ANN). A single-layer
perceptron model consists of a feed-forward network and includes a threshold transfer
function inside the model. The main objective of the single-layer perceptron model is to
classify linearly separable objects with binary outcomes.

A single-layer perceptron model starts with no recorded data, so it
begins with randomly allocated values for the weight parameters. It then sums up all the
weighted inputs. If the total sum is more than a pre-determined threshold value, the
model is activated and outputs +1.

If the output is the same as the desired value, the performance of
this model is considered satisfactory, and the weights do not change. However,
discrepancies can arise when multiple weighted input values are
fed into the model. Hence, to produce the desired output and minimize errors, the
weights must be adjusted.
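The weight adjustment described above can be illustrated with the classic perceptron learning rule, w ← w + η(target − output)x. The notes do not state which update rule is meant, so treat this as one common choice rather than the method of the notes. The learning rate, the epoch limit, and the zero initial weights (used instead of random ones so the run is reproducible) are also assumptions.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 when the weighted sum plus bias is positive."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net > 0 else 0

def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Apply the perceptron rule until an epoch passes with no mistakes.

    Returns (weights, bias, converged).
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                # Move the weights toward the desired output.
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:
            return weights, bias, True
    return weights, bias, False

OR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("OR converged:", train_perceptron(OR_GATE)[2])
print("XOR converged:", train_perceptron(XOR_GATE)[2])
```

The OR gate is linearly separable, so training stops after a few epochs. XOR is not, so the loop runs out of epochs without ever classifying all four cases correctly.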

"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layered Perceptron Model:


Like a single-layer perceptron model, a multi-layer perceptron model has the same
basic structure but adds one or more hidden layers.

The multi-layer perceptron model is trained with the Backpropagation algorithm, which
executes in two stages as follows:

o Forward Stage: Activations are computed starting from the input layer
and ending at the output layer.
o Backward Stage: Weight and bias values are modified as
per the model's requirement. In this stage, the error between the actual output and
the desired output is propagated backward, starting at the output layer and ending at
the input layer.

Hence, a multi-layer perceptron model can be considered an artificial neural
network with multiple layers in which the activation function need not be linear, unlike
a single-layer perceptron model. Instead, the activation function can be
sigmoid, TanH, ReLU, etc.
A multi-layer perceptron model has greater processing power and can process linear and
non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR,
NAND, NOT, XNOR, NOR.
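To see why a hidden layer matters for a gate such as XOR, here is a small sketch of a two-layer network with threshold units. The weights are picked by hand for illustration and are not learned. The two hidden units, named `h_or` and `h_nand` here, are an assumption of this example.

```python
def step(net):
    """Threshold activation: 1 when the net input is positive, else 0."""
    return 1 if net > 0 else 0

def neuron(inputs, weights, bias):
    """One unit: weighted sum plus bias, passed through the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_mlp(x1, x2):
    """XOR built from one hidden layer (OR and NAND units) and an AND output unit."""
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)       # fires if at least one input is 1
    h_nand = neuron((x1, x2), (-1.0, -1.0), 1.5)    # fires unless both inputs are 1
    return neuron((h_or, h_nand), (1.0, 1.0), -1.5)  # fires only if both hidden units fire

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

Each hidden unit draws one straight line through the input space, and the output unit combines the two half-planes. A single threshold unit cannot do this on its own.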

Perceptron Function
The perceptron function 'f(x)' is obtained by multiplying the input 'x' by the
learned weight coefficient 'w', adding the bias 'b', and thresholding the result.

Mathematically, we can express it as follows:

f(x)=1; if w.x+b>0

otherwise, f(x)=0

o 'w' represents the real-valued weight vector
o 'b' represents the bias
o 'x' represents the vector of input values.

MULTI LAYER NETWORKS


BACKPROPAGATION ALGORITHM
Backpropagation is one of the important concepts of a neural network. Our task is to
classify our data as well as possible. For this, we have to update the weights and biases,
but how can we do that in a deep neural network? In a linear regression model, we use
gradient descent to optimize the parameters. Similarly, here we also use the gradient
descent algorithm, with the gradients computed by Backpropagation.

For a single training example, the Backpropagation algorithm calculates the gradient of
the error function. Backpropagation can be written as a function of the neural network.
Backpropagation algorithms are a set of methods used to efficiently train artificial neural
networks following a gradient descent approach which exploits the chain rule.

The main feature of Backpropagation is the iterative, recursive and efficient method
through which it calculates the updated weights to improve the network until it is able
to perform the task for which it is being trained. Backpropagation requires the derivatives
of the activation function to be known at network design time.

Now, how is the error function used in Backpropagation, and how does Backpropagation work?
Let us start with an example and work through it mathematically to understand exactly how
the weights are updated using Backpropagation.
Input values
x1=0.05
x2=0.10

Initial weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55

Bias values
b1=0.35 b2=0.60

Target values
T1=0.01
T2=0.99

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass
To find the value of H1, we first multiply the input values by their weights and add the bias:

H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775

To calculate the final result of H1, we apply the sigmoid function:

H1final=1/(1+e^(-H1))=1/(1+e^(-0.3775))
H1final=0.593269992

We will calculate the value of H2 in the same way as H1

H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925

To calculate the final result of H2, we apply the sigmoid function:

H2final=1/(1+e^(-H2))=1/(1+e^(-0.3925))
H2final=0.596884378

Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.

To find the value of y1, we multiply the inputs to the output layer, i.e., the final
results of H1 and H2, by their weights and add the bias:

y1=H1final×w5+H2final×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597

To calculate the final result of y1, we apply the sigmoid function:

y1final=1/(1+e^(-y1))=1/(1+e^(-1.10590597))
y1final=0.75136507

We will calculate the value of y2 in the same way as y1

y2=H1final×w7+H2final×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214

To calculate the final result of y2, we apply the sigmoid function:

y2final=1/(1+e^(-y2))=1/(1+e^(-1.2249214))
y2final=0.772928465

Our target values are 0.01 and 0.99. Our y1 and y2 values do not match our target
values T1 and T2. Now, we will find the total error, which is the sum over the outputs of
half the squared difference between the target and the output. The total error is
calculated as

Etotal=∑½(target−output)²

So, the total error is

Etotal=½(0.01−0.75136507)²+½(0.99−0.772928465)²
Etotal=0.274811083+0.023560026
Etotal=0.298371109

The backward pass then updates all the weights. We found the error 0.298371109 on the
network when we fed forward the 0.05 and 0.1 inputs. After the first round of
Backpropagation, the total error is down to 0.291027924. After repeating this process
10,000 times, the total error is down to 0.0000351085. At this point, the output neurons
generate 0.015912196 and 0.984065734, i.e., close to our target values, when we feed
forward 0.05 and 0.1.
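The worked example can be checked with a short Python sketch. The notes do not state the learning rate or whether the biases are updated. The code assumes a learning rate of 0.5 and leaves the biases fixed, a combination that reproduces the error values quoted above. The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b):
    """Forward pass of the 2-2-2 network; w holds w1..w8, b holds (b1, b2)."""
    h1 = sigmoid(x[0] * w[0] + x[1] * w[1] + b[0])
    h2 = sigmoid(x[0] * w[2] + x[1] * w[3] + b[0])
    y1 = sigmoid(h1 * w[4] + h2 * w[5] + b[1])
    y2 = sigmoid(h1 * w[6] + h2 * w[7] + b[1])
    return h1, h2, y1, y2

def total_error(x, targets, w, b):
    """Sum over outputs of 1/2 * (target - output)^2."""
    _, _, y1, y2 = forward(x, w, b)
    return 0.5 * (targets[0] - y1) ** 2 + 0.5 * (targets[1] - y2) ** 2

def backprop_step(x, targets, w, b, learning_rate=0.5):
    """One gradient-descent update of all eight weights (biases kept fixed)."""
    h1, h2, y1, y2 = forward(x, w, b)
    # Chain rule at the output layer: dE/dnet = (output - target) * sigmoid'.
    d1 = (y1 - targets[0]) * y1 * (1.0 - y1)
    d2 = (y2 - targets[1]) * y2 * (1.0 - y2)
    # Propagate the error back to the hidden layer using the old weights.
    dh1 = (d1 * w[4] + d2 * w[6]) * h1 * (1.0 - h1)
    dh2 = (d1 * w[5] + d2 * w[7]) * h2 * (1.0 - h2)
    grads = [dh1 * x[0], dh1 * x[1], dh2 * x[0], dh2 * x[1],
             d1 * h1, d1 * h2, d2 * h1, d2 * h2]
    return [wi - learning_rate * g for wi, g in zip(w, grads)]

x = (0.05, 0.10)
targets = (0.01, 0.99)
b = (0.35, 0.60)
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]

print("initial error:", total_error(x, targets, w, b))
w = backprop_step(x, targets, w, b)
print("after one round:", total_error(x, targets, w, b))
for _ in range(9999):
    w = backprop_step(x, targets, w, b)
print("after 10,000 rounds:", total_error(x, targets, w, b))
print("outputs:", forward(x, w, b)[2:])
```

All gradients are computed from the weights as they were before the update, so the hidden-layer gradients use the original w5 to w8, not the freshly updated ones.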
ADVANCED TOPICS IN ARTIFICIAL NEURAL NETWORKS
4.8.1 Alternative Error Functions
As noted earlier, gradient descent can be performed for any function E that is differentiable with respect to
the parameterized hypothesis space. While the basic BACKPROPAGATION algorithm defines E in terms
of the sum of squared errors of the network, other definitions have been suggested in order to incorporate
other constraints into the weight-tuning rule. For each new definition of E a new weight-tuning rule for
gradient descent must be derived. Examples of alternative definitions of E include:

o Adding a penalty term for weight magnitude. As discussed above, we can add a term to E that increases
with the magnitude of the weight vector. This causes the gradient descent search to seek weight vectors
with small magnitudes, thereby reducing the risk of overfitting. One way to do this is to redefine E as

$$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2$$

which yields a weight update rule identical to the BACKPROPAGATION rule, except that each weight is
multiplied by the constant $(1 - 2\gamma\eta)$ upon each iteration. Thus, choosing this definition of E is
equivalent to using a weight decay strategy (see Exercise 4.10). Here $t_{kd}$ and $o_{kd}$ are the target
and actual values of output unit k for training example d, and $\eta$ is the learning rate.

o Adding a term for errors in the slope, or derivative, of the target function. In some cases, training
information may be available regarding desired derivatives of the target function, as well as desired
values. For example, Simard et al. (1992) describe an application to character recognition in
which certain training derivatives are used to constrain the network to learn character recognition functions
that are invariant to translation within the image.
Mitchell and Thrun (1993) describe methods for calculating training derivatives based on the learner's prior
knowledge. In both of these systems (described in Chapter 12), the error function is modified to add a term
measuring the discrepancy between these training derivatives and the actual derivatives of the learned network.
One example of such an error function is

$$E \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} \left[ (t_{kd} - o_{kd})^2 + \mu \sum_{j \in inputs} \left( \frac{\partial t_{kd}}{\partial x_d^j} - \frac{\partial o_{kd}}{\partial x_d^j} \right)^2 \right]$$

where $x_d^j$ is the value of the jth input unit for training example d, and the constant $\mu$ sets the
relative weight placed on fitting the training derivatives versus fitting the training values.
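The weight-decay equivalence mentioned for the penalty term can be checked numerically. The sketch below compares two ways of taking one gradient step: following the gradient of the penalised error directly, and shrinking each weight by (1 − 2γη) before applying the ordinary update. The function names and the sample numbers are hypothetical, and `base_grads` stands for the gradient of the unpenalised squared-error term.

```python
def step_on_penalised_error(weights, base_grads, eta, gamma):
    """Gradient step on E0 + gamma * sum(w^2); the penalty adds 2 * gamma * w to each gradient."""
    return [w - eta * (g + 2.0 * gamma * w) for w, g in zip(weights, base_grads)]

def step_with_weight_decay(weights, base_grads, eta, gamma):
    """Shrink each weight by (1 - 2 * gamma * eta), then apply the usual update."""
    return [(1.0 - 2.0 * gamma * eta) * w - eta * g for w, g in zip(weights, base_grads)]

# Hypothetical weights and gradients, chosen only for the comparison.
weights = [0.15, -0.40, 0.55]
base_grads = [0.02, -0.01, 0.08]
eta, gamma = 0.5, 0.01

print(step_on_penalised_error(weights, base_grads, eta, gamma))
print(step_with_weight_decay(weights, base_grads, eta, gamma))
```

Both lines print the same numbers up to floating-point rounding, which is the sense in which the penalty term is equivalent to a weight decay strategy.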
Genetic Algorithm in Machine Learning
A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's
theory of evolution in nature. It is used to solve optimization problems in machine
learning. It is an important algorithm because it helps solve complex problems that
would otherwise take a long time to solve.

Genetic algorithms are widely used in real-world applications, for
example designing electronic circuits, code-breaking, image processing, and artificial
creativity.

What is a Genetic Algorithm?


Before looking at the genetic algorithm itself, let's first go through the basic terminology:

o Population: The population is the subset of all possible or probable solutions that
can solve the given problem.
o Chromosome: A chromosome is one of the solutions in the population for the
given problem; a collection of genes makes up a chromosome.
o Gene: A gene is an element of the chromosome; a chromosome is divided into genes.
o Allele: An allele is the value given to a gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine an individual's fitness
level in the population, meaning the ability of an individual to compete with other
individuals. In every iteration, individuals are evaluated based on their fitness
function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce
offspring better than the parents. Genetic operators change the
genetic composition of the next generation.
o Selection: After calculating the fitness of every individual in the population, a
selection process is used to determine which of the individuals in the population will
get to reproduce and produce the offspring that will form the next generation.

Types of selection available:

o Roulette wheel selection
o Tournament selection
o Rank-based selection

So, now we can define a genetic algorithm as a heuristic search algorithm to solve
optimization problems. It is a subset of evolutionary algorithms, which is used in
computing. A genetic algorithm uses genetic and natural selection concepts to solve
optimization problems.

Working of Genetic Algorithm


The genetic algorithm works on the evolutionary generational cycle to generate high-
quality solutions. These algorithms use different operations that either enhance or replace
the population to give an improved fit solution.

It basically involves five phases to solve the complex optimization problems, which are
given as below:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination

1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, which is
called the population. Each individual is a candidate solution to the given problem. An
individual is characterized by a set of parameters called genes. Genes are combined into
a string to form a chromosome, which encodes the solution to the problem. One of the
most popular techniques for initialization is the use of random binary strings.

2. Fitness Assignment
The fitness function is used to determine how fit an individual is, meaning the ability of an
individual to compete with other individuals. In every iteration, individuals are evaluated
based on their fitness function. The fitness function provides a fitness score to each
individual. This score determines the probability of being selected for
reproduction: the higher the fitness score, the greater the chance of being selected.

3. Selection
The selection phase involves selecting individuals for the reproduction of offspring.
The selected individuals are then arranged in pairs for reproduction,
and these individuals pass their genes on to the next generation.

There are three types of Selection methods available, which are:

o Roulette wheel selection


o Tournament selection
o Rank-based selection
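The three selection methods can be sketched as follows. This is a minimal illustration under stated assumptions: fitness scores are non-negative with a positive total, the tournament size of 3 is arbitrary, and the function names are hypothetical.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick an individual with probability proportional to its fitness."""
    pick = rng.random() * sum(fitnesses)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def tournament_select(population, fitnesses, rng, k=3):
    """Draw k distinct contenders at random and keep the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def rank_select(population, fitnesses, rng):
    """Pick with probability proportional to rank (1 = worst), not raw fitness."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = [0] * len(population)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    chosen = rng.choices(range(len(population)), weights=ranks, k=1)[0]
    return population[chosen]

rng = random.Random(0)
population = ["a", "b", "c", "d"]
fitnesses = [1.0, 0.0, 3.0, 6.0]
print(roulette_wheel_select(population, fitnesses, rng))
print(tournament_select(population, fitnesses, rng))
print(rank_select(population, fitnesses, rng))
```

Rank-based selection is less sensitive to one individual with a very large fitness score, because only the ordering of the scores matters.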
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this
step, the genetic algorithm uses two variation operators that are applied to the parent
population. The two operators involved in the reproduction phase are given below:

o Crossover: Crossover plays the most significant role in the reproduction phase of the
genetic algorithm. In this process, a crossover point is selected at random within the genes.
Then the crossover operator swaps genetic information between two parents from the current
generation to produce a new individual representing the offspring.

The genes of the parents are exchanged among themselves until the crossover point is met.
These newly generated offspring are added to the population. This process is called
crossover. Types of crossover available:
o One-point crossover
o Two-point crossover
o Uniform crossover
o Inheritable Algorithms crossover
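One-point, two-point, and uniform crossover can be sketched on list-encoded chromosomes. The function names and the 0.5 swap probability are assumptions of this example. Two-point crossover as written here needs chromosomes of at least three genes.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Swap the tails of the two parents after one random cut point."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, rng):
    """Swap the segment between two distinct random cut points."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform_crossover(p1, p2, rng, swap_prob=0.5):
    """Decide gene by gene whether the two children exchange that gene."""
    c1, c2 = list(p1), list(p2)
    for i in range(len(c1)):
        if rng.random() < swap_prob:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

rng = random.Random(0)
parent1 = [0] * 8
parent2 = [1] * 8
print(one_point_crossover(parent1, parent2, rng))
print(two_point_crossover(parent1, parent2, rng))
print(uniform_crossover(parent1, parent2, rng))
```

Using an all-zeros and an all-ones parent makes it easy to see in the printed children which genes came from which parent.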

o Mutation: The mutation operator inserts random genes in the offspring (new child) to
maintain diversity in the population. It can be done by flipping some bits in the
chromosome. Mutation helps solve the issue of premature convergence and enhances
diversification.
Types of mutation available:
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
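The three mutation types can be sketched in the same style. The mutation rate, the standard deviation `sigma`, and the function names are hypothetical choices. Flip bit mutation assumes a binary chromosome, and Gaussian mutation assumes real-valued genes.

```python
import random

def flip_bit_mutation(chromosome, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

def gaussian_mutation(chromosome, rng, rate=0.1, sigma=0.1):
    """Add zero-mean Gaussian noise to each real-valued gene with probability `rate`."""
    return [gene + rng.gauss(0.0, sigma) if rng.random() < rate else gene
            for gene in chromosome]

def swap_mutation(chromosome, rng):
    """Exchange the genes at two distinct random positions."""
    child = list(chromosome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

rng = random.Random(0)
print(flip_bit_mutation([0, 1, 0, 1, 1, 0, 0, 1], rng))
print(gaussian_mutation([0.5, 1.5, -2.0], rng, rate=1.0))
print(swap_mutation([1, 2, 3, 4, 5], rng))
```

Swap mutation keeps the same set of gene values, which makes it a natural fit when the chromosome is an ordering rather than a bit string.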
5. Termination
After the reproduction phase, a stopping criterion is applied as a base for termination.
The algorithm terminates after the threshold fitness solution is reached. It will identify the
final solution as the best solution in the population.

General Workflow of a Simple Genetic Algorithm
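The five phases can be put together in one small program. This sketch makes several assumptions that are not in the notes: the toy objective counts the 1 bits in a binary string, selection is a tournament of size 3, the best individual is copied unchanged into the next generation (elitism), and all parameter values are arbitrary.

```python
import random

def fitness(chromosome):
    """Hypothetical toy objective: the number of 1 bits in the chromosome."""
    return sum(chromosome)

def run_ga(n_genes=20, pop_size=30, max_generations=100, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # 1. Initialization: random binary strings.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best fitness score seen in each generation
    for _ in range(max_generations):
        # 2. Fitness assignment.
        scores = [fitness(c) for c in population]
        best_index = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best_index])
        # 5. Termination: stop once the threshold fitness is reached.
        if scores[best_index] == n_genes:
            break
        next_population = [population[best_index][:]]  # elitism (an assumption)
        while len(next_population) < pop_size:
            # 3. Selection: two parents, each the winner of a tournament of 3.
            parents = []
            for _ in range(2):
                contenders = rng.sample(range(pop_size), 3)
                parents.append(population[max(contenders, key=scores.__getitem__)])
            # 4. Reproduction: one-point crossover followed by flip bit mutation.
            point = rng.randint(1, n_genes - 1)
            child = parents[0][:point] + parents[1][point:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), history

best, history = run_ga()
print("best fitness per generation:", history)
print("best individual:", best)
```

Because the best individual is always carried over, the best fitness recorded in `history` never decreases from one generation to the next.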


Advantages of Genetic Algorithm
o Genetic algorithms have strong parallel capabilities.
o They help in optimizing various kinds of problems, such as discrete functions,
multi-objective problems, and continuous functions.
o They provide a solution to a problem that improves over time.
o A genetic algorithm does not need derivative information.
Limitations of Genetic Algorithms
o Genetic algorithms are not efficient for solving simple problems.
o They do not guarantee the quality of the final solution to a problem.
o Repeated calculation of fitness values may create computational challenges.

Difference between Genetic Algorithms and Traditional Algorithms
o A search space is the set of all possible solutions to the problem. A traditional
algorithm maintains only one set of solutions, whereas a genetic algorithm can use several
sets of solutions in the search space.
o Traditional algorithms need more information in order to perform a search, whereas
genetic algorithms need only one objective function to calculate the fitness of an
individual.
o Traditional algorithms cannot work in parallel, whereas genetic algorithms can
(the fitness of each individual is calculated independently).
o One big difference between traditional algorithms and genetic algorithms is that a genetic
algorithm does not operate directly on candidate solutions but on their encoded
representations (chromosomes).
o Traditional algorithms can only generate one result in the end, whereas genetic
algorithms can generate multiple optimal results from different generations.
o Traditional algorithms are not more likely to generate optimal results. Genetic
algorithms do not guarantee globally optimal results either, but there is a good
chance of reaching the optimal result for a problem because they use genetic operators
such as crossover and mutation.
o Traditional algorithms are deterministic in nature, whereas genetic algorithms are
probabilistic and stochastic in nature.
