ML Unit-II
We can understand the artificial neural network with an example. Consider a digital logic gate that takes an input and gives an output, such as an "OR" gate, which takes two inputs. If one or both of the inputs are "On," the output is "On." If both inputs are "Off," the output is "Off." Here the output is a fixed function of the input. Our brain does not perform the same task: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes the inputs, computes the weighted sum of the inputs, and includes a bias. This computation is represented in the form of a transfer function. The weighted total is then passed as input to an activation function, which determines whether a node should fire or not. Only the nodes that fire make it to the output layer. There are various activation functions available, chosen according to the sort of task we are performing.
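As a minimal sketch of this computation (the input values, weights, bias, and firing threshold below are illustrative assumptions, not values from the text):

```python
# Minimal sketch of one artificial neuron: weighted sum, bias, activation.
inputs  = [0.5, 0.3, 0.2]   # illustrative input values
weights = [0.4, 0.7, 0.2]   # illustrative weights
bias = 0.1

# Transfer function: the weighted sum of the inputs plus the bias.
weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias

# Activation function: the node "fires" (outputs 1) only above a threshold.
output = 1 if weighted_sum > 0.5 else 0
print(weighted_sum, output)  # ~0.55 -> the node fires
```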
Artificial neural networks can perform more than one task simultaneously, which gives them parallel processing capability.
Unlike traditional programming, where data is stored in a database, the information in an ANN is stored on the whole network. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.
After training, an ANN may produce output even with inadequate data. The loss of performance here depends upon the significance of the missing data.
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Afterward, each of the inputs is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnection between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is equal to zero, then a bias is added to make the output non-zero, or otherwise to scale up the system's response. The bias can be viewed as an extra input that is fixed at 1, with its own weight. Here the total of weighted inputs can be in the range of 0 to positive infinity. To keep the response within the limits of the desired value, a certain maximum value is benchmarked, and the total of weighted inputs is passed through the activation function.
The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, primarily either linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:
Binary:
In a binary activation function, the output is either a one or a 0. To accomplish this, a threshold value is set up. If the net weighted input of the neuron is more than the threshold, then the final output of the activation function is returned as one; otherwise the output is returned as 0.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate the output from the actual net input. The function is defined as:
f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
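A short sketch of these three activation functions in code (the threshold and the sample net input are assumed values for illustration):

```python
import math

def binary(net_input, threshold=0.0):
    # Binary activation: 1 if the net weighted input exceeds the threshold.
    return 1 if net_input > threshold else 0

def linear(net_input):
    # Linear activation: output is proportional to the net input.
    return net_input

def tanh_sigmoid(net_input):
    # Tan hyperbolic sigmoidal activation: an "S"-shaped curve in (-1, 1).
    return (math.exp(net_input) - math.exp(-net_input)) / \
           (math.exp(net_input) + math.exp(-net_input))

print(binary(0.7), linear(0.7), round(tanh_sigmoid(0.7), 4))  # 1 0.7 0.6044
```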
Feedback ANN:
In this type of ANN, the output is returned into the network to accomplish the best-evolved results internally. According to the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. By assessing its output against its input, the strength of the network can be observed based on the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it figures out how to evaluate and recognize input patterns.
The target function to be learned is defined over instances that can be described
by a vector of predefined features.
Perceptron is a linear machine learning algorithm used for the supervised learning of binary classifiers.
The perceptron model is also treated as one of the best and simplest types of artificial neural networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters, i.e., input values, weights and bias, net sum, and an activation function.
Binary classifiers can be considered linear classifiers. In simple words, we can understand this as a classification algorithm that predicts using a linear predictor function in terms of weights and feature vectors.
Binary classification is a fundamental task in machine learning, where the goal is to categorize data
into one of two classes or categories.
Binary classification is used in a wide range of applications, such as spam email detection, medical
diagnosis, sentiment analysis, fraud detection, and many more.
o Input Nodes or Input Layer:
This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. This is another most important parameter of the perceptron's components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept in a linear equation.
o Activation Function:
These are the final and most important components, which help to determine whether the neuron will fire or not. The activation function can be considered primarily as a step function. Common choices are:
o Sign function
o Step function, and
o Sigmoid function
This step function or Activation function plays a vital role in ensuring that output is
mapped between required values (0,1) or (-1,1). It is important to note that the weight of
input is indicative of the strength of a node. Similarly, an input's bias value gives the ability
to shift the activation function curve up or down.
Step-1
First, multiply all input values by their corresponding weight values and then add them up to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:
Add a special term called bias 'b' to this weighted sum to improve the model's
performance.
∑wi*xi + b
Step-2
In the second step, an activation function is applied to the above-mentioned weighted sum, which gives us an output either in binary form or as a continuous value, as follows:
Y = f(∑wi*xi + b)
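A brief sketch of these two steps in code (the inputs, weights, and bias are illustrative values, and a unit step function plays the role of f):

```python
# Sketch of the two perceptron steps with illustrative values.
x = [1.0, 0.0]    # input values
w = [0.75, 0.25]  # corresponding weights
b = -0.5          # bias term 'b'

# Step 1: weighted sum of the inputs plus the bias: sum(wi*xi) + b
weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b

# Step 2: apply an activation function f (here a unit step) to the sum.
Y = 1 if weighted_sum > 0 else 0
print(weighted_sum, Y)  # 0.25 1
```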
In a single-layer perceptron model, the algorithm does not contain recorded data, so it begins with randomly allocated values for the weight parameters. Further, it sums up all the weighted inputs. If the total sum of all inputs is more than a pre-determined value, the model is activated and shows the output value as +1.
The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages as follows:
o Forward Stage: In the forward stage, activations are computed starting from the input layer and terminating at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the desired output originates at the output layer and is propagated backward, ending at the input layer.
Perceptron Function
The perceptron function ''f(x)'' is achieved as output by multiplying the input 'x' with the learned weight coefficient 'w' and adding the bias 'b':
f(x) = 1 if w·x + b > 0
f(x) = 0 otherwise
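A compact sketch of the perceptron learning rule, here trained on the OR-gate data from the introduction (the learning rate and number of passes are assumed choices):

```python
# Perceptron learning rule trained on the OR-gate truth table.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1   # weights, bias, assumed learning rate

def predict(x):
    # f(x) = 1 if w.x + b > 0, otherwise f(x) = 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

for _ in range(10):                  # a few passes over the data
    for x, target in data:
        error = target - predict(x)  # zero when the prediction is correct
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
        b += lr * error

print([predict(x) for x, _ in data])  # [0, 1, 1, 1] after training
```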
The main features of backpropagation are that it is an iterative, recursive and efficient method for calculating the updated weights to improve the network until it is able to perform the task for which it is being trained. Backpropagation requires the derivatives of the activation functions to be known at network design time.
Now, how is the error function used in backpropagation, and how does backpropagation work? Let us start with an example and do it mathematically to understand exactly how backpropagation updates the weights.
Input values
x1 = 0.05
x2 = 0.10
Initial weights
w1 = 0.15    w5 = 0.40
w2 = 0.20    w6 = 0.45
w3 = 0.25    w7 = 0.50
w4 = 0.30    w8 = 0.55
Bias values
b1 = 0.35    b2 = 0.60
Target values
T1 = 0.01
T2 = 0.99
Forward Pass
To find the value of H1 we first multiply the input value from the weights as
H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775
H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925
To calculate the final outputs of the hidden nodes, we pass H1 and H2 through the sigmoid activation function:
out(H1) = 1/(1 + e^(−0.3775)) = 0.593269992
out(H2) = 1/(1 + e^(−0.3925)) = 0.596884378
Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2. To find the value of y1, we multiply the input values, i.e., the outcomes out(H1) and out(H2), by the weights:
y1 = out(H1)×w5 + out(H2)×w6 + b2
y1 = 0.593269992×0.40 + 0.596884378×0.45 + 0.60
y1 = 1.10590597
y2 = out(H1)×w7 + out(H2)×w8 + b2
y2 = 0.593269992×0.50 + 0.596884378×0.55 + 0.60
y2 = 1.2249214
Our target values are 0.01 and 0.99, so y1 and y2 do not yet match T1 and T2. Passing y1 and y2 through the same sigmoid function gives the final outputs:
out(y1) = 1/(1 + e^(−1.10590597)) = 0.75136507
out(y2) = 1/(1 + e^(−1.2249214)) = 0.772928465
Now, we will find the total error, which is simply the difference between the outputs and the target outputs:
Etotal = ∑ 1/2 (target − output)²
Etotal = 1/2 (0.01 − 0.75136507)² + 1/2 (0.99 − 0.772928465)²
Etotal = 0.274811083 + 0.023560026
Etotal = 0.298371109
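The forward pass above can be reproduced in a few lines of code; this sketch simply re-computes the numbers of the worked example:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
T1, T2 = 0.01, 0.99

# Hidden layer: net inputs, then sigmoid activations.
H1 = x1 * w1 + x2 * w2 + b1              # 0.3775
H2 = x1 * w3 + x2 * w4 + b1              # 0.3925
out_H1, out_H2 = sigmoid(H1), sigmoid(H2)

# Output layer: net inputs, then sigmoid activations.
y1 = out_H1 * w5 + out_H2 * w6 + b2      # 1.10590597
y2 = out_H1 * w7 + out_H2 * w8 + b2      # 1.2249214
out_y1, out_y2 = sigmoid(y1), sigmoid(y2)

# Total error against the targets.
E_total = 0.5 * (T1 - out_y1) ** 2 + 0.5 * (T2 - out_y2) ** 2
print(round(E_total, 9))                 # 0.298371109
```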
This yields a weight update rule identical to the BACKPROPAGATION rule, except that each weight is multiplied by the constant (1 − 2γη) upon each iteration. Thus, choosing this definition of E is equivalent to using a weight decay strategy (see Exercise 4.10).
o Adding a term for errors in the slope, or derivative, of the target function.
In some cases, training information may be available regarding desired derivatives of the target function, as
well as desired values. For example, Simard et al. (1992) describe an application to character recognition in
which certain training derivatives are used to constrain the network to learn character recognition functions
that are invariant to translation within the image.
Mitchell and Thrun (1993) describe methods for calculating training derivatives based on the learner's prior
knowledge. In both of these systems (described in Chapter 12), the error function is modified to add a term
measuring the discrepancy between these training derivatives and the actual derivatives of the learned network.
One example of such an error function is
E(w) = 1/2 ∑d∈D ∑k∈outputs [ (tkd − okd)² + μ ∑j∈inputs ( ∂tkd/∂x_jd − ∂okd/∂x_jd )² ]
where tkd and okd are the target and actual values of output unit k for training example d, x_jd is the j-th input of example d, and μ is a constant determining the relative weight placed on fitting the training values versus the training derivatives.
Genetic Algorithm in Machine Learning
A genetic algorithm is an adaptive heuristic search algorithm inspired by "Darwin's
theory of evolution in Nature." It is used to solve optimization problems in machine
learning. It is one of the important algorithms as it helps solve complex problems that
would take a long time to solve.
Genetic Algorithms are being widely used in different real-world applications, for
example, Designing electronic circuits, code-breaking, image processing, and artificial
creativity.
After calculating the fitness of every individual in the population, a selection process is used to determine which of the individuals in the population will get to reproduce and create the offspring that will form the next generation.
So, now we can define a genetic algorithm as a heuristic search algorithm to solve
optimization problems. It is a subset of evolutionary algorithms, which is used in
computing. A genetic algorithm uses genetic and natural selection concepts to solve
optimization problems.
It basically involves five phases to solve the complex optimization problems, which are
given as below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating the set of individuals, which is
called population. Here each individual is the solution for the given problem. An individual
contains or is characterized by a set of parameters called Genes. Genes are combined into
a string and generate chromosomes, which is the solution to the problem. One of the
most popular techniques for initialization is the use of random binary strings.
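A minimal sketch of initialization with random binary strings (the population size and chromosome length are assumed parameters):

```python
import random

POP_SIZE = 6    # assumed population size
CHROM_LEN = 8   # assumed chromosome length (number of genes)

def init_population():
    # Each individual is a chromosome: a random binary string of genes.
    return [[random.randint(0, 1) for _ in range(CHROM_LEN)]
            for _ in range(POP_SIZE)]

population = init_population()
```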
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, i.e., the ability of an individual to compete with other individuals. In every iteration, individuals are evaluated based on their fitness function. The fitness function provides a fitness score to each individual. This score further determines the probability of being selected for reproduction: the higher the fitness score, the more chances of getting selected for reproduction.
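As a toy fitness function for the binary chromosomes above (counting 1-bits, the classic "one-max" problem, is an assumed choice for illustration):

```python
def fitness(chromosome):
    # Toy "one-max" fitness: the more 1-bits, the fitter the individual.
    return sum(chromosome)

scores = [fitness(ind) for ind in population]  # one score per individual
```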
3. Selection
The selection phase involves the selection of individuals for the reproduction of offspring. All the selected individuals are then arranged in pairs of two so that they can reproduce. These individuals then transfer their genes to the next generation.
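One common way to implement this phase is fitness-proportionate (roulette-wheel) selection; this sketch reuses the fitness function and population from above:

```python
def select_pair(population):
    # Roulette-wheel selection: the probability of being picked is
    # proportional to the fitness score (+1 keeps zero-fitness selectable).
    weights = [fitness(ind) + 1 for ind in population]
    return random.choices(population, weights=weights, k=2)

parent1, parent2 = select_pair(population)
```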
4. Reproduction
o Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm. In this process, a crossover point is selected at random within the genes. Then the crossover operator swaps the genetic information of two parents from the current generation to produce a new individual representing the offspring, as sketched after the list below.
The genes of the parents are exchanged among themselves until the crossover point is reached. These newly generated offspring are added to the population. This process is also called mating or crossover. Types of crossover styles available:
o One point crossover
o Two-point crossover
o Uniform crossover
o Whole arithmetic crossover
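A sketch of one-point crossover, where the crossover point is chosen at random:

```python
def one_point_crossover(parent1, parent2):
    # Pick a random crossover point, then swap the gene segments of the
    # two parents to produce two offspring.
    point = random.randint(1, CHROM_LEN - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

offspring = one_point_crossover(parent1, parent2)
```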
o Mutation: The mutation operator inserts random genes into the offspring (new child) to maintain the diversity of the population. It can be done by flipping some bits in the chromosome. Mutation helps solve the issue of premature convergence and enhances diversification.
Types of mutation styles available,
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
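A sketch of flip-bit mutation (the per-gene mutation rate is an assumed parameter):

```python
import random

MUTATION_RATE = 0.05  # assumed per-gene mutation probability

def flip_bit_mutation(chromosome):
    # Flip each bit with a small probability to maintain diversity.
    return [1 - gene if random.random() < MUTATION_RATE else gene
            for gene in chromosome]
```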
5. Termination
After the reproduction phase, a stopping criterion is applied as the basis for termination. The algorithm terminates once a threshold fitness solution is reached, and it identifies the best individual in the final population as the solution.
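Putting the five phases together, a minimal end-to-end sketch that reuses the init_population, fitness, select_pair, one_point_crossover, and flip_bit_mutation functions defined above (the generation limit and fitness threshold are assumed stopping criteria):

```python
def genetic_algorithm(max_generations=50, fitness_threshold=CHROM_LEN):
    population = init_population()                 # 1. Initialization
    for _ in range(max_generations):
        best = max(population, key=fitness)        # 2. Fitness Assignment
        if fitness(best) >= fitness_threshold:     # 5. Termination check
            break
        next_generation = []
        while len(next_generation) < POP_SIZE:
            p1, p2 = select_pair(population)       # 3. Selection
            for child in one_point_crossover(p1, p2):   # 4. Reproduction
                next_generation.append(flip_bit_mutation(child))
        population = next_generation[:POP_SIZE]
    return max(population, key=fitness)            # best final solution

print(genetic_algorithm())  # e.g. [1, 1, 1, 1, 1, 1, 1, 1]
```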