Class Notes Unit 2
Artificial Neural Network Tutorial provides basic and advanced concepts of ANNs. Our Artificial
Neural Network tutorial is developed for beginners as well as professionals. The term "artificial
neural network" refers to a biologically inspired sub-field of artificial intelligence modeled after
the brain. An artificial neural network is a computational network based on the biological
neural networks that make up the structure of the human brain. Just as the human brain has
neurons interconnected with each other, artificial neural networks also have neurons that are linked
to each other in the various layers of the network. These neurons are known as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural network. In
this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing map,
Building blocks, unsupervised learning, Genetic algorithm, etc.
[Figure: typical diagram of a biological neural network]
[Figure: typical diagram of an artificial neural network]
Dendrites from the biological neural network represent inputs in artificial neural networks,
the cell nucleus represents nodes, synapses represent weights, and the axon represents the output.
Biological Neural Network -> Artificial Neural Network
Dendrites -> Inputs
Cell nucleus -> Nodes
Synapse -> Weights
Axon -> Output
There are on the order of 100 billion neurons in the human brain. Each neuron forms connection
points with somewhere between 1,000 and 100,000 other neurons. In the human brain, data is stored
in a distributed manner, and we can extract more than one piece of this data from memory in
parallel when necessary. We can say that the human brain is made up of incredibly powerful
parallel processors.
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we first have to understand what
a neural network consists of: a large number of artificial neurons, termed units, arranged in a
sequence of layers. Let us look at the various types of layers available in an artificial neural
network.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the
calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations in the hidden layers, which finally
results in the output that is conveyed through this layer.
The artificial neural network takes the inputs, computes the weighted sum of the inputs,
and adds a bias. This computation is represented in the form of a transfer function.
Advantages of artificial neural networks:
Parallel processing capability: artificial neural networks can perform more than one task
simultaneously.
Storing data on the entire network: data is stored on the whole network, not in a
database as in traditional programming. The disappearance of a couple of pieces of data
in one place does not prevent the network from working.
Capability to work with incomplete knowledge: after training, an ANN may produce output
even with inadequate data. The loss of performance here depends on the significance of
the missing data.
Fault tolerance: corruption of one or more cells of an ANN does not prevent it from
generating output, and this feature makes the network fault-tolerant.
Disadvantages of artificial neural networks:
Assurance of proper network structure: there is no particular guideline for determining
the structure of artificial neural networks. An appropriate network structure is achieved
through experience and trial and error.
Unrecognized behavior of the network: this is the most significant issue with ANNs. When
an ANN produces a solution, it does not provide insight into why and how it was reached,
which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance
with their structure. The realization of the network therefore depends on suitable hardware.
Difficulty of showing the problem to the network: ANNs can work only with numerical data,
so problems must be converted into numerical values before being introduced to the network.
The representation mechanism chosen here directly impacts the performance of the network
and relies on the user's abilities.
The duration of the network is unknown: the network is trained down to a specific value of
the error, but reaching this value does not guarantee optimal results.
Artificial neural networks, a science that stepped into the world in the mid-20th century, are
developing exponentially. Above, we have reviewed the advantages of artificial neural networks
and the issues encountered in the course of their use. It should not be overlooked that the
drawbacks of ANNs, a flourishing branch of science, are being eliminated one by one, while
their advantages increase day by day. This means that artificial neural networks will
progressively become an irreplaceable part of our lives.
If the weighted sum is equal to zero, a bias is added to make the output non-zero, or to
otherwise scale up the system's response. The bias can be viewed as an extra input with a
fixed value of 1 and its own weight. Here the total of the weighted inputs can lie in the
range from 0 to positive infinity. To keep the response within the limits of the desired
value, a certain maximum value is benchmarked, and the total of the weighted inputs is
passed through the activation function.
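As a minimal sketch of this computation in Python (all input, weight, and bias values below are invented for illustration, and tanh stands in for a generic activation):

    import numpy as np

    def neuron_output(inputs, weights, bias, activation):
        # Weighted sum of inputs plus bias; the bias acts like a weight
        # on an extra input that is fixed at 1.
        net = np.dot(inputs, weights) + bias
        return activation(net)

    x = np.array([0.5, 0.3])        # example inputs (made up)
    w = np.array([0.4, -0.2])       # example weights (made up)
    y = neuron_output(x, w, bias=0.1, activation=np.tanh)
    print(y)                        # tanh(0.2 - 0.06 + 0.1) = tanh(0.24)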
The activation function refers to the set of transfer functions used to achieve the desired
output. There are different kinds of activation functions, primarily divided into linear and
non-linear sets of functions. Some of the commonly used activation functions are the binary,
linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them
in detail:
Binary:
In a binary activation function, the output is either a one or a 0. To accomplish this,
a threshold value is set up. If the net weighted input of the neuron is more than the
threshold, the final output of the activation function is returned as one; otherwise the
output is returned as 0.
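A minimal sketch of this rule, with the threshold as an explicit parameter (an assumption, since the notes do not fix its value):

    def binary_step(net, threshold=0.0):
        # Fire (output 1) only if the net input exceeds the threshold.
        return 1 if net > threshold else 0

    print(binary_step(0.24))    # 1
    print(binary_step(-0.50))   # 0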
Sigmoidal Hyperbolic:
The sigmoidal hyperbolic function is generally seen as an "S"-shaped curve. Here the tan
hyperbolic function is used to approximate the output from the actual net input. The function
is defined as:
f(x) = (e^x - e^(-x)) / (e^x + e^(-x)) = tanh(x)
A Few Common Activation Functions Used In Artificial Neural Networks Are:
#1) Identity Function
It can be defined as f(x) = x for all values of x. This is a linear function where the output
is the same as the input.
#4) Sigmoid Function
It is of two types:
Binary sigmoid function: also called the unipolar sigmoid function or logistic sigmoid
function. The range of this sigmoidal function is 0 to 1.
Bipolar sigmoid: the bipolar sigmoidal function ranges from -1 to +1. It is similar to
the hyperbolic tangent function.
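Both variants can be sketched in a few lines (these are the standard textbook formulas, not spelled out in the notes):

    import numpy as np

    def binary_sigmoid(x):
        # Logistic sigmoid: squashes any input into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def bipolar_sigmoid(x):
        # Bipolar sigmoid: range (-1, 1); equal to tanh(x / 2).
        return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

    print(binary_sigmoid(0.0))    # 0.5, the midpoint of (0, 1)
    print(bipolar_sigmoid(0.0))   # 0.0, the midpoint of (-1, 1)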
#5) Ramp Function
It is defined as f(x) = 0 for x < 0, f(x) = x for 0 <= x <= 1, and f(x) = 1 for x > 1;
that is, a linear function clipped between 0 and 1.
The weighted sum of inputs means the "product of the weight of an input and the value of
that input," summed together over all inputs.
Let I = {I1, I2, I3, ..., In} be the input pattern to the neuron.
Let W = {W1, W2, W3, ..., Wn} be the weights associated with each input to the node.
The net input is then Yin = I1*W1 + I2*W2 + ... + In*Wn.
Feedback ANN:
In this type of ANN, the output returns into the network to achieve the best-evolved
results internally. As per the University of Massachusetts Lowell Centre for Atmospheric
Research, feedback networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an
output layer, and at least one layer of neurons. Through assessment of its output by
reviewing its input, the strength of the network can be judged based on the group behavior
of the associated neurons, and the output is decided. The primary advantage of this
network is that it learns to evaluate and recognize input patterns.
Basic Models Of ANN
The artificial neural network models consist of 3 entities:
Weights or synaptic connections
The learning rule used for adjusting the weights
Activation functions of the neuron
Neural Network Architecture
In an ANN the neurons are interconnected, and the output of each neuron is connected to
the next neuron through weights. The architecture of these interconnections is important
in an ANN. This arrangement takes the form of layers, and the connections between the
layers and within a layer constitute the neural network architecture.
#1) Single-Layer Feed-Forward Network
All the input nodes are connected to each of the output nodes. The term feed-forward
indicates that there is no feedback sent from the output layer back to the input layer.
This forms a single-layer feed-forward network.
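A sketch of one forward pass through such a network (the sizes, weights, and inputs are invented; sigmoid is assumed as the activation):

    import numpy as np

    # 3 input nodes fully connected to 2 output nodes (made-up weights)
    W = np.array([[ 0.2, -0.1],
                  [ 0.4,  0.3],
                  [-0.5,  0.1]])      # shape: (n_inputs, n_outputs)
    b = np.array([0.1, -0.2])         # one bias per output node

    x = np.array([1.0, 0.5, -1.0])    # an input pattern
    net = x @ W + b                   # weighted sums at the output nodes
    y = 1.0 / (1.0 + np.exp(-net))    # sigmoid activation
    print(y)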
#2) Multi-Layer Feed-Forward Network
The Multi-layer network consists of one or more layers between the input and output.
The input layer just receives a signal and buffers it while the output layer shows the
output. The layers between the input and output are called the hidden layers.
The hidden layers are not in contact with the external environment. With a greater number
of hidden layers, the network can produce a more refined output response. The nodes in
each layer are connected to every node in the next layer.
As there is no feedback from the output layer back to the input or hidden layers, this
forms a multi-layer feed-forward network.
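A comparable sketch with one hidden layer (layer sizes and random weights are illustrative assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 4))   # input (3 nodes) -> hidden (4 nodes)
    b1 = np.zeros(4)
    W2 = rng.normal(size=(4, 1))   # hidden (4 nodes) -> output (1 node)
    b2 = np.zeros(1)

    x = np.array([0.3, 0.5, 0.6])  # an input pattern (made up)
    h = sigmoid(x @ W1 + b1)       # hidden-layer activations
    y = sigmoid(h @ W2 + b2)       # final output
    print(y)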
Example: the input layer has 3 neurons X1, X2, and X3, and there is a single output Y.
The weights associated with the inputs are: {0.2, 0.1, -0.3}.
The net input works out to X = X1*0.2 + X2*0.1 + X3*(-0.3) = -0.07 for the given input values.
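A quick check of this arithmetic in Python; the notes do not list the input values, so the ones below are assumed purely so that the stated net input of -0.07 is reproduced:

    inputs  = [0.3, 0.5, 0.6]    # assumed example inputs, not from the notes
    weights = [0.2, 0.1, -0.3]   # weights given in the notes

    net = sum(i * w for i, w in zip(inputs, weights))
    print(net)  # 0.06 + 0.05 - 0.18 = -0.07 (up to floating-point rounding)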
Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:
o Supervised Machine Learning
o Unsupervised Machine Learning
o Semi-Supervised Machine Learning
o Reinforcement Learning
1. Supervised Machine Learning
Let's understand supervised learning with an example. Suppose we have an input dataset
of cat and dog images. First, we provide training to the machine so it understands the
images, learning features such as the shape and size of the tail of a cat and a dog, the
shape of the eyes, colour, and height (dogs are taller, cats are smaller). After completion
of training, we input the picture of a cat and ask the machine to identify the object and
predict the output. Now the machine is well trained, so it will check all the features of
the object, such as height, shape, colour, eyes, ears, and tail, and find that it's a cat.
So it will put it in the Cat category. This is the process of how the machine identifies
objects in supervised learning.
The main goal of the supervised learning technique is to map the input variable (x)
to the output variable (y). Some real-world applications of supervised learning
are risk assessment, fraud detection, spam filtering, etc.
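A hedged sketch of this workflow with scikit-learn (the two numeric features stand in for image features such as height and tail length; every value is invented):

    from sklearn.linear_model import LogisticRegression

    # Toy labelled data: [height_cm, tail_length_cm]; 0 = cat, 1 = dog
    X_train = [[25, 30], [23, 28], [60, 35], [55, 33]]
    y_train = [0, 0, 1, 1]

    model = LogisticRegression()
    model.fit(X_train, y_train)        # learn the input -> label mapping
    print(model.predict([[24, 29]]))   # cat-like features, expected: [0]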
Categories of Supervised Machine Learning
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems, in which the output
variable is categorical, such as "Yes" or "No", Male or Female, Red or Blue, etc.
Classification algorithms predict the categories present in the dataset. Some real-world
examples of classification algorithms are spam detection, email filtering, etc.
b) Regression
Regression algorithms are used to solve regression problems, in which the output variable
is a continuous value (often modelled with a linear relationship between the input and
output variables). These are used to predict continuous output variables, such as market
trends, weather, etc.
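A matching regression sketch (fabricated data, assuming a roughly linear trend):

    from sklearn.linear_model import LinearRegression

    # Toy data: hours of sunshine -> temperature in Celsius (made up)
    X_train = [[1], [2], [3], [4]]
    y_train = [15.0, 17.1, 18.9, 21.0]

    model = LinearRegression()
    model.fit(X_train, y_train)
    print(model.predict([[5]]))  # a continuous prediction, roughly 23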
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact
idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
o These algorithms are not able to solve complex tasks on their own.
o They may predict the wrong output if the test data differs from the training data,
and training requires a lot of computation time.
Applications of Supervised Learning:
o Image Segmentation: supervised algorithms are used for image segmentation,
classifying image data with pre-defined labels.
o Medical Diagnosis: supervised algorithms are also used in the medical field for
diagnosis purposes, using medical images and past labelled data for disease
conditions. With such a process, the machine can identify a disease for new
patients.
2. Unsupervised Machine Learning
In unsupervised learning, the models are trained with data that is neither classified
nor labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the
unsorted dataset according to similarities, patterns, and differences. Machines
are instructed to find the hidden patterns in the input dataset.
Let's take an example to understand it more precisely: suppose there is a basket of fruit
images, and we input it into the machine learning model. The images are totally unknown
to the model, and the task of the machine is to find the patterns and categories of the
objects.
The machine will discover its own patterns and differences, such as colour difference and
shape difference, and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised learning can be further classified into two types of problems, which are given
below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data.
It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of other
groups. An example of the clustering algorithm is grouping the customers by their
purchasing behaviour. Some of the popular clustering algorithms are given below:
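A minimal K-Means sketch with scikit-learn (the 2-D points are invented; think of them as customer features):

    from sklearn.cluster import KMeans

    # Toy unlabelled data: two visually separable blobs (made-up points)
    X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
         [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)  # e.g. [0 0 0 1 1 1]: two discovered groups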
2) Association
Association rule learning finds interesting relationships among variables in a large
dataset, such as items that tend to be purchased together. Some popular algorithms of
association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.
Advantages:
o These algorithms can be used for more complicated tasks than supervised ones,
because they work on unlabelled datasets.
o Unsupervised algorithms are preferable for various tasks, as getting an unlabelled
dataset is easier than getting a labelled one.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate, as the dataset is not
labelled and the algorithm is not trained with the exact output in advance.
o Working with unsupervised learning is more difficult, as it works with an unlabelled
dataset that does not map to a known output.
3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning. It represents the intermediate ground
between Supervised (With Labelled training data) and Unsupervised learning (with no
labelled training data) algorithms and uses the combination of labelled and unlabeled
datasets during the training period.
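The notes do not prescribe an algorithm; one common approach is self-training, sketched here with invented one-dimensional data:

    from sklearn.linear_model import LogisticRegression

    # A small labelled set plus unlabelled examples (all values made up)
    X_lab, y_lab = [[1.0], [1.2], [3.0], [3.3]], [0, 0, 1, 1]
    X_unlab = [[1.1], [2.9], [3.1]]

    model = LogisticRegression().fit(X_lab, y_lab)
    pseudo = model.predict(X_unlab)   # pseudo-label the unlabelled data
    # Retrain on labelled + pseudo-labelled data combined
    model.fit(X_lab + X_unlab, y_lab + list(pseudo))
    print(model.predict([[2.8]]))     # expected: [1]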
Disadvantages:
o Iteration results may not be stable.
o The accuracy tends to be lower than with fully supervised learning, since part of
the training data is unlabelled.
4. Reinforcement Learning
In reinforcement learning, there is no labelled data as in supervised learning; agents
learn from their experiences only.
The reinforcement learning process is similar to that of a human being; for example, a
child learns various things through experience in day-to-day life. An example of
reinforcement learning is playing a game, where the game is the environment, the moves of
the agent at each step define states, and the goal of the agent is to get a high score.
The agent receives feedback in terms of punishments and rewards.
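A tiny tabular Q-learning sketch of this reward-feedback loop (the two-state "game", rewards, and parameters are all invented for illustration):

    import random

    # Hypothetical game: 2 states, 2 actions; action 1 in state 0 pays off
    n_states, n_actions = 2, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma = 0.5, 0.9   # learning rate and discount factor

    def step(state, action):
        # Environment (made up): reward +1 for action 1 in state 0
        reward = 1.0 if (state == 0 and action == 1) else 0.0
        return (state + 1) % n_states, reward

    state = 0
    for _ in range(500):
        action = random.randrange(n_actions)   # explore randomly
        next_state, reward = step(state, action)
        # Update: blend old estimate with reward + discounted future value
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                     - Q[state][action])
        state = next_state

    print(Q[0])  # the agent learns that action 1 is better in state 0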
Due to its way of working, reinforcement learning is employed in different fields such
as game theory, operations research, information theory, and multi-agent systems.
o Video Games:
RL algorithms are very popular in gaming applications, where they are used to achieve
super-human performance. Some popular game-playing systems that use RL algorithms
are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how to use
RL in computers to automatically learn to schedule resources across waiting jobs in
order to minimize average job slowdown.
o Robotics:
RL is widely used in robotics applications. Robots are used in industrial and
manufacturing areas, and these robots are made more capable with reinforcement
learning. Different industries have a vision of building intelligent robots using AI
and machine learning technology.
o Text Mining:
Text mining, one of the great applications of NLP, is now being implemented with
the help of reinforcement learning by Salesforce.
Disadvantage
The curse of dimensionality limits reinforcement learning for real physical systems.
Winner-take-all (computing)
Winner-take-all (WTA) is a computational principle applied in computational models of neural
networks by which neurons compete with each other for activation. In the classical form, only
the neuron with the highest activation stays active while all other neurons shut down; however,
other variations allow more than one neuron to be active, for example the soft winner-take-all,
in which a power function is applied to the neuron activations.
Neural networks
In the theory of artificial neural networks, winner-take-all networks are a case of competitive
learning in recurrent neural networks. Output nodes in the network mutually inhibit each other,
while simultaneously activating themselves through reflexive connections. After some time, only
one node in the output layer will be active, namely the one corresponding to the strongest input.
Thus the network uses nonlinear inhibition to pick out the largest of a set of inputs. Winner-take-
all is a general computational primitive that can be implemented using different types of neural
network models, including both continuous-time and spiking networks.
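A toy simulation of this recurrent dynamic, with the mutual-inhibition strength chosen arbitrarily for illustration:

    import numpy as np

    x = np.array([0.3, 0.9, 0.5])   # initial activations from the inputs
    inhibition = 0.4                # assumed mutual-inhibition strength

    for _ in range(30):
        # Each node keeps its own activation (reflexive connection) and is
        # inhibited by the summed activity of all the other nodes.
        others = x.sum() - x
        x = np.maximum(0.0, x - inhibition * others)

    print(np.round(x, 3))  # only the node with the strongest input survives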
Winner-take-all networks are commonly used in computational models of the brain, particularly
for distributed decision-making or action selection in the cortex. Important examples include
hierarchical models of vision,[3] and models of selective attention and recognition.[4][5] They are also
common in artificial neural networks and neuromorphic analog VLSI circuits. It has been formally
proven that the winner-take-all operation is computationally powerful compared to other nonlinear
operations, such as thresholding.
In many practical cases, there is not only one single neuron which becomes active but there are
exactly k neurons which become active for a fixed number k. This principle is referred to as k-
winners-take-all.
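A compact sketch of hard, soft, and k-winners-take-all selection over a vector of activations (the values and the soft-WTA exponent are illustrative):

    import numpy as np

    a = np.array([0.2, 1.5, 0.7, 1.1])   # example neuron activations

    # Hard WTA: only the single most active neuron stays on
    hard = np.where(a == a.max(), a, 0.0)

    # Soft WTA: a power function sharpens competition but keeps all neurons
    p = 3.0                               # assumed exponent
    soft = a**p / np.sum(a**p)

    # k-WTA: exactly k neurons remain active
    k = 2
    kwta = np.where(a >= np.sort(a)[-k], a, 0.0)

    print(hard)   # [0.  1.5  0.  0. ]
    print(soft)   # normalized weights dominated by the larger activations
    print(kwta)   # [0.  1.5  0.  1.1]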
WTA focus
Following the taxonomy outlined by Scharstein et al. (IJCV 2002), WTA is also used as a local
method for computing disparity in stereo matching algorithms: at each pixel, the disparity
associated with the minimum or maximum cost value is chosen. The winner-take-all idea appears
in economics as well: early dominant firms in the electronic commerce market, such as AOL or
Yahoo!, captured most of the benefits.
Applications
WTA is a theory that encourages neurons to compete for learning opportunities. Following the
competition, only the top neuron will remain active for a given input, and the remainder will
gradually stop responding to that input. The generalization and discrimination powers of WTA
and other related learning approaches merit consideration. Biologically plausible learning
methods can be dense, local, or sparse.
As it only activates the unit that best matches the input pattern and suppresses the others
through fixed inhibitory connections, competitive learning such as WTA is a local learning rule.
The number of discriminable input states severely constrains this kind of "grandmother cell"
representation. It is also difficult to generalize, because the winning unit only activates
when the input is reasonably close to its preferred input. Dense coding, which makes many
units active for each input pattern, might be considered the other extreme: it can encode many
different discriminable input states, but executing the mapping and learning with
straightforward neuron-like units becomes increasingly challenging.
Conclusion
Researchers have shown that WTA is far more powerful than the threshold and sigmoidal gates
frequently utilized in conventional neural networks: a single k-WTA unit can compute any
Boolean function, and any continuous function can be approximated by a single soft WTA stage,
whose outputs are determined by the rank of the associated input in linear order. Another
benefit is that approximate WTA computation of linear size can be carried out relatively
quickly in analog VLSI circuits. Thus, a single competitive WTA stage can replace complicated
feed-forward multi-layered perceptron circuits, resulting in low-power analog VLSI processors.