
SUPERVISED LEARNING

As the name suggests, supervised learning takes place under the
supervision of a teacher, so this learning process is dependent. During
the training of an ANN under supervised learning, the input vector is
presented to the network, which produces an output vector. This output
vector is compared with the desired/target output vector. An error signal
is generated if there is a difference between the actual output and the
desired/target output. On the basis of this error signal, the weights are
adjusted until the actual output matches the desired output.

Perceptron
Developed by Frank Rosenblatt using the McCulloch and Pitts model, the
perceptron is the basic operational unit of artificial neural networks. It
employs a supervised learning rule and is able to classify data into two
classes.

Operational characteristics of the perceptron: It consists of a single
neuron with an arbitrary number of inputs along with adjustable weights,
but the output of the neuron is 1 or 0 depending upon the threshold. It
also has a bias, whose input is always 1. The following figure gives a
schematic representation of the perceptron.

Perceptron thus has the following three basic elements −

 Links − It has a set of connection links, each of which carries a weight,
including a bias link whose input is always 1.

 Adder − It adds the inputs after they are multiplied with their respective
weights.

 Activation function − It limits the output of the neuron. The most basic
activation function is a Heaviside step function, which has two possible outputs.
This function returns 1 if the input is positive, and 0 for any negative input.
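The three elements above can be sketched in a few lines of Python. This is a minimal illustration rather than a canonical implementation: the function names and the AND-gate example are chosen here for demonstration, and the bias is folded in as an extra weight on a constant input of 1.

```python
import numpy as np

def heaviside(v):
    """Heaviside step activation: 1 for positive input, 0 otherwise."""
    return 1 if v > 0 else 0

def train_perceptron(X, t, lr=0.1, epochs=20):
    """Perceptron learning rule: adjust weights on the error signal."""
    # Prepend a constant 1 to each input so w[0] acts as the bias weight.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = heaviside(w @ x)
            w += lr * (target - y) * x   # no change when output matches target
    return w

# Classify the linearly separable AND function into two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
preds = [heaviside(w @ np.r_[1, x]) for x in X]
print(preds)  # [0, 0, 0, 1]
```

The error-driven update is exactly the process described in the introduction: weights move only when the actual output differs from the desired output.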

Adaptive Linear Neuron (Adaline)


Adaline, which stands for Adaptive Linear Neuron, is a network having
a single linear unit. It was developed by Widrow and Hoff in 1960. Some
important points about Adaline are as follows −

 It uses a bipolar activation function.

 It uses the delta rule for training to minimize the Mean Squared Error (MSE)
between the actual output and the desired/target output.

 The weights and the bias are adjustable.

Architecture
The basic structure of Adaline is similar to the perceptron, with an
extra feedback loop with the help of which the actual output is compared
with the desired/target output. After comparison, on the basis of the
training algorithm, the weights and bias are updated.
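The delta-rule training just described can be sketched as follows. This is a hedged illustration (the function name and the bipolar AND example are assumptions, not from the original): the unit stays linear during training, and the MSE is reduced by moving the weights against the error on the net input; the bipolar step is applied only when reading the trained unit out.

```python
import numpy as np

def train_adaline(X, t, lr=0.01, epochs=50):
    """Delta-rule (LMS) training: minimize MSE on the linear output."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias folded into w[0]
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = w @ x                     # linear unit: no threshold here
            w += lr * (target - y) * x    # delta rule update
    return w

# Bipolar AND: inputs and targets in {-1, +1}, matching the bipolar activation.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
t = np.array([-1, -1, -1, 1])
w = train_adaline(X, t)
# Bipolar step applied only at readout of the trained unit.
preds = [1 if w @ np.r_[1, x] >= 0 else -1 for x in X]
print(preds)  # [-1, -1, -1, 1]
```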

Multiple Adaptive Linear Neuron (Madaline)


Madaline, which stands for Multiple Adaptive Linear Neuron, is a
network which consists of many Adalines in parallel. It has a single
output unit. Some important points about Madaline are as follows −

 It is just like a multilayer perceptron, where the Adalines act as hidden units
between the input and the Madaline layer.

 The weights and the bias between the input and Adaline layers, as we see
in the Adaline architecture, are adjustable.

 The Adaline and Madaline layers have fixed weights and a bias of 1.

 Training can be done with the help of the delta rule.

Architecture
The architecture of Madaline consists of “n” neurons in the input
layer, “m” neurons in the Adaline layer, and 1 neuron in the Madaline
layer. The Adaline layer can be considered as the hidden layer, as it lies
between the input layer and the output layer, i.e. the Madaline layer.
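A forward pass through this architecture can be sketched as below. The Adaline weights realizing XOR are hand-picked purely for illustration; the point is the structure described above: adjustable weights into the Adaline layer, and a fixed-weight output unit (weights and bias of 1) combining the Adaline outputs.

```python
import numpy as np

def sign(v):
    """Bipolar activation: +1 for v >= 0, else -1."""
    return np.where(v >= 0, 1, -1)

def madaline_forward(x, W, b):
    """One pass: adjustable Adaline layer, then a fixed-weight output unit."""
    z = sign(W @ x + b)                 # many Adalines in parallel
    return int(sign(np.sum(z) + 1))     # fixed weights and bias of 1 (an OR unit)

# Hand-picked Adaline weights that realize XOR on bipolar inputs (illustrative).
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.array([-0.5, -0.5])
outs = [madaline_forward(np.array(x), W, b) for x in
        [(-1, -1), (-1, 1), (1, -1), (1, 1)]]
print(outs)  # [-1, 1, 1, -1]
```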

Back Propagation Neural Networks


A Back Propagation Network (BPN) is a multilayer neural network
consisting of an input layer, at least one hidden layer, and an output
layer. As its name suggests, back-propagation takes place in this network.
The error, which is calculated at the output layer by comparing the target
output with the actual output, is propagated back towards the input
layer.

Architecture
As shown in the diagram, the architecture of BPN has three
interconnected layers having weights on them. The hidden layer as well
as the output layer also has a bias, whose input is always 1. As is clear
from the diagram, the working of BPN is in two phases. One phase sends
the signal from the input layer to the output layer, and the other phase
back-propagates the error from the output layer to the input layer.

Generalized Delta Learning Rule


The delta rule works only for the output layer. The generalized delta
rule, also called the back-propagation rule, on the other hand, is a way of
creating the desired values for the hidden layer.
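The two phases of BPN, with the delta rule at the output layer and the generalized delta rule for the hidden layer, can be sketched as follows. The tiny 2-3-1 network, the sigmoid units, and the learning rate are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# A tiny 2-3-1 network; each layer's bias is folded in as weight column 0.
W1 = rng.normal(0.0, 0.5, (3, 3))   # hidden layer: 3 units x [bias, x1, x2]
W2 = rng.normal(0.0, 0.5, (1, 4))   # output layer: 1 unit x [bias, h1, h2, h3]

def forward(x):
    """Phase 1: send the signal from the input layer to the output layer."""
    h = sigmoid(W1 @ np.r_[1.0, x])
    y = sigmoid(W2 @ np.r_[1.0, h])
    return h, y

def backward(x, h, y, target, lr=0.1):
    """Phase 2: propagate the error back and update both weight layers."""
    global W1, W2
    delta_out = (y - target) * y * (1 - y)               # delta rule (output layer)
    delta_hid = (W2[:, 1:].T @ delta_out) * h * (1 - h)  # generalized delta rule
    W2 -= lr * np.outer(delta_out, np.r_[1.0, h])
    W1 -= lr * np.outer(delta_hid, np.r_[1.0, x])

x, target = np.array([1.0, 0.0]), np.array([1.0])
h, y = forward(x)
loss_before = float((y - target) ** 2)
backward(x, h, y, target)
loss_after = float((forward(x)[1] - target) ** 2)
print(loss_before, "->", loss_after)  # one backward pass reduces the error
```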
UNSUPERVISED LEARNING

As the name suggests, this type of learning is done without the
supervision of a teacher, so this learning process is independent. During
the training of an ANN under unsupervised learning, input vectors of a
similar type are combined to form clusters. When a new input pattern is
applied, the neural network gives an output response indicating the class
to which the input pattern belongs. There is no feedback from the
environment as to what the desired output should be or whether it is
correct or incorrect. Hence, in this type of learning, the network itself must
discover the patterns and features from the input data, and the
relationship between the input data and the output.

Winner-Takes-All Networks

These kinds of networks are based on the competitive learning rule
and use the strategy of choosing the neuron with the greatest total input
as the winner. The connections between the output neurons represent the
competition between them: one of them will be ‘ON’, meaning it is the
winner, and the others will be ‘OFF’.

Following are some of the networks based on this simple concept
using unsupervised learning.

Hamming Network

In most of the neural networks using unsupervised learning, it is
essential to compute the distance and perform comparisons. One such
network is the Hamming network, in which every given input vector is
clustered into one of several groups. Following are some important
features of Hamming networks −

 Lippmann started working on Hamming networks in 1987.

 It is a single-layer network.

 The inputs can be either binary {0, 1} or bipolar {-1, 1}.

 The weights of the net are calculated from the exemplar vectors.

 It is a fixed-weight network, which means the weights remain
the same even during training.
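A minimal sketch of these features, assuming the common construction in which the fixed weights are the exemplar vectors divided by 2 and the bias is n/2, so that each output equals n minus the Hamming distance to one exemplar:

```python
import numpy as np

# Exemplar vectors in bipolar form; the fixed weights are calculated from
# them and never change during operation.
exemplars = np.array([[ 1,  1,  1, -1],
                      [-1, -1,  1,  1]])
n = exemplars.shape[1]
W = exemplars / 2.0     # fixed weights from the exemplar vectors
b = n / 2.0             # fixed bias

def hamming_scores(x):
    """Each output equals n minus the Hamming distance to one exemplar."""
    return W @ x + b

x = np.array([1, 1, -1, -1])
scores = hamming_scores(x)
print(scores)                  # [3. 0.]: x differs from exemplar 0 in one place
print(int(np.argmax(scores)))  # 0: x is clustered with the first exemplar
```

Selecting the largest score is exactly the job of the Max Net described next, which is why the two networks are usually paired.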

Max Net

This is also a fixed-weight network, which serves as a subnet for
selecting the node having the highest input. All the nodes are fully
interconnected and there exist symmetrical weights in all these weighted
interconnections.
Architecture

It uses an iterative process in which each node receives inhibitory
inputs from all other nodes through connections. The single node whose
value is maximum will be active (the winner), and the activations of all
other nodes will be inactive. Max Net uses the identity activation function
with

f(x) = x if x > 0, and f(x) = 0 if x ≤ 0

The task of this net is accomplished by a self-excitation weight of
+1 and a mutual-inhibition magnitude ε, set so that 0 < ε < 1/m,
where “m” is the total number of nodes.
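The iterative mechanism above can be sketched directly; the particular ε and initial activations are illustrative, with ε chosen inside the required interval 0 < ε < 1/m:

```python
import numpy as np

def maxnet(a, eps=0.15):
    """Iterate until a single node stays active (the winner)."""
    a = a.astype(float).copy()
    while np.count_nonzero(a) > 1:
        # Self-excitation weight +1; every other node inhibits with magnitude eps.
        a = np.maximum(0.0, a - eps * (a.sum() - a))
    return a

a0 = np.array([0.2, 0.4, 0.6, 0.8])   # m = 4 nodes, so eps must satisfy eps < 1/4
winner = maxnet(a0)
print(winner)                  # only the largest initial activation survives
print(int(np.argmax(winner)))  # 3
```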

Competitive Learning in ANN

It is concerned with unsupervised training in which the output nodes
compete with each other to represent the input pattern. To understand
this learning rule, we first have to understand the competitive net, which
is explained as follows −

Basic Concept of Competitive Network

This network is just like a single-layer feed-forward network with
feedback connections between the outputs. The connections between the
outputs are of the inhibitory type, shown by dotted lines, which means
the competitors never support themselves.
Basic Concept of Competitive Learning Rule

As said earlier, there is competition among the output nodes, so the
main concept is this: during training, the output unit that has the
highest activation for a given input pattern is declared the winner.
This rule is also called winner-takes-all because only the winning neuron
is updated and the rest of the neurons are left unchanged.

Mathematical Formulation

Following are the three important factors for the mathematical
formulation of this learning rule −

 Condition to be a winner

Suppose a neuron yk wants to be the winner; then there would be
the following condition −

yk = 1 if vk > vj for all j, j ≠ k; 0 otherwise

It means that if a neuron, say yk, wants to win, then its induced
local field (the output of the summation unit), say vk, must be the
largest among all the other neurons in the network.

 Condition of the sum total of weights

Another constraint over the competitive learning rule is that the sum
total of the weights to a particular output neuron is 1. For example,
if we consider neuron k, then −

∑j wkj = 1 for all k

 Change of weight for the winner

If a neuron does not respond to the input pattern, then no learning
takes place in that neuron. However, if a particular neuron wins,
then the corresponding weights are adjusted as follows −

Δwkj = α(xj − wkj) if neuron k wins; 0 if neuron k loses

Here α is the learning rate.

This clearly shows that we are favoring the winning neuron by
adjusting its weights; if a neuron loses, we need not bother to
re-adjust its weights.
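The three factors above can be combined into a short training loop. Two simplifications are assumed in this sketch: the winner is picked as the unit whose weight vector is nearest to the input (equivalent to the largest induced local field when the weight vectors are normalized), and the weight-sum constraint is left implicit.

```python
import numpy as np

rng = np.random.default_rng(1)

def competitive_train(X, n_units=2, lr=0.3, epochs=30):
    """Winner-takes-all: only the winning unit's weights are adjusted."""
    # Initialize each unit's weight vector from distinct random input samples.
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner
            W[k] += lr * (x - W[k])   # Δw = α(x − w) for the winner only
    return W

# Two obvious clusters, around (0.1, 0.1) and (5.0, 5.0).
X = np.array([[0.1, 0.0], [0.0, 0.2], [0.2, 0.1],
              [5.0, 5.1], [4.9, 5.0], [5.1, 4.9]])
W = competitive_train(X)
print(np.round(W, 1))  # one weight vector settles near each cluster
```

Because losers are never touched, each unit ends up representing the cluster of inputs it keeps winning, which is the bridge to the k-means algorithm below.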

K-means Clustering Algorithm

K-means is one of the most popular clustering algorithms, in which
we use the concept of a partition procedure. We start with an initial
partition and repeatedly move patterns from one cluster to another until
we get a satisfactory result.

Algorithm

Step 1 − Select k points as the initial centroids.

Initialize k prototypes (w1,…,wk), for example by identifying them
with randomly chosen input vectors −

wj = ip, where j ∈ {1,…,k} and p ∈ {1,…,n}

Each cluster Cj is associated with prototype wj.

Step 2 − Repeat steps 3-5 until E no longer decreases, or the cluster
membership no longer changes.

Step 3 − For each input vector ip, where p ∈ {1,…,n}, put ip in the
cluster Cj* with the nearest prototype wj*, i.e. satisfying the relation

|ip − wj*| ≤ |ip − wj|, j ∈ {1,…,k}

Step 4 − For each cluster Cj, where j ∈ {1,…,k}, update the
prototype wj to be the centroid of all samples currently in Cj, so that

wj = ( ∑ip∈Cj ip ) / |Cj|

Step 5 − Compute the total quantization error as follows −

E = ∑j=1..k ∑ip∈Cj |ip − wj|²
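Steps 1-5 can be sketched as below; the two-cluster toy data and the random seed are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    """K-means following steps 1-5: initialize, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)].astype(float)   # Step 1
    for _ in range(max_iter):
        # Step 3: put each input in the cluster with the nearest prototype.
        labels = np.argmin(np.linalg.norm(X[:, None] - W[None], axis=2), axis=1)
        # Step 4: move each prototype to the centroid of its current cluster.
        new_W = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else W[j] for j in range(k)])
        if np.allclose(new_W, W):   # Step 2: stop when membership is stable
            break
        W = new_W
    # Step 5: total quantization error.
    E = sum(np.sum((X[labels == j] - W[j]) ** 2) for j in range(k))
    return W, labels, E

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]])
W, labels, E = kmeans(X, k=2)
print(np.round(W, 2))    # prototypes end at the two cluster centroids
print(labels, round(E, 2))
```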

Neocognitron

It is a multilayer feedforward network, which was developed by
Fukushima in the 1980s. This model is based on supervised learning and
is used for visual pattern recognition, mainly hand-written characters. It
is basically an extension of the Cognitron network, which was also
developed by Fukushima, in 1975.

Architecture

It is a hierarchical network comprising many layers, with a pattern of
local connectivity in those layers.

As we have seen in the above diagram, the neocognitron is divided
into different connected layers, and each layer has two types of cells.
These cells are explained as follows −

S-Cell − It is called a simple cell, which is trained to respond to a
particular pattern or a group of patterns.

C-Cell − It is called a complex cell, which combines the output from
S-cells and simultaneously lessens the number of units in each array. In
another sense, the C-cell displaces the result of the S-cell.
 What is Supervised Learning?

Supervised learning is one of the methods associated with machine
learning; it involves allocating labeled data so that a certain pattern or
function can be deduced from that data. It is worth noting that supervised
learning involves allocating an input object, a vector, while at the same
time anticipating the most desired output value, which is mostly referred
to as the supervisory signal. The bottom-line property of supervised
learning is that the input data is known and labeled appropriately.

 What is Unsupervised Learning?

Unsupervised learning is the second method of machine learning, in
which inferences are drawn from unlabeled input data. The goal of
unsupervised learning is to determine the hidden patterns or groupings
in unlabeled data. It is mostly used in exploratory data analysis. One of
the defining characteristics of unsupervised learning is that neither the
input nor the output is known.

Differences Between Supervised Learning and Unsupervised Learning

1. Input Data in Supervised Learning and Unsupervised Learning

The primary difference between supervised learning and
unsupervised learning is the data used in either method of machine
learning. It is worth noting that both methods of machine learning require
data, which they will analyze to produce certain functions or data groups.
However, the input data used in supervised learning is well known and
labeled. This means that the machine is only tasked with the role of
determining the hidden patterns from already labeled data. The data used
in unsupervised learning, by contrast, is neither known nor labeled. It is
the work of the machine to categorize and label the raw data before
determining the hidden patterns and functions of the input data.

2. Computational Complexity in Supervised Learning and Unsupervised Learning

Machine learning is a complex affair, and any person involved must
be prepared for the task ahead. One of the standout differences between
supervised learning and unsupervised learning is computational
complexity. Supervised learning is said to be a complex method of
learning, while the unsupervised method of learning is less complex. One
of the reasons that makes supervised learning a complex affair is the fact
that one has to understand and label the inputs, while in unsupervised
learning one is not required to understand and label the inputs. This
explains why many people have been preferring unsupervised learning as
compared to the supervised method of machine learning.

3. Accuracy of the Results of Supervised Learning and Unsupervised Learning

The other prevailing difference between supervised learning and
unsupervised learning is the accuracy of the results produced after every
cycle of machine analysis. The results generated from the supervised
method of machine learning are more accurate and reliable as compared
to the results generated from the unsupervised method of machine
learning. One of the factors that explains why the supervised method of
machine learning produces accurate and reliable results is that the input
data is well known and labeled, which means that the machine will only
analyze the hidden patterns. This is unlike the unsupervised method of
learning, where the machine has to define and label the input data before
determining the hidden patterns and functions.


4. Number of Classes in Supervised Learning and Unsupervised Learning

It is also worth noting that there is a significant difference when it
comes to the number of classes. All the classes used in supervised
learning are known, which means that the answers in the analysis are
also likely to be known. The only goal of supervised learning is therefore
to determine the unknown cluster. However, there is no prior knowledge
in the unsupervised method of machine learning. In addition, the number
of classes is not known, which clearly means that no information is known
beforehand and the results generated after the analysis cannot be
ascertained. Moreover, the people involved in the unsupervised method of
learning are not aware of any information concerning the raw data and
the expected results.

5. Real Time Learning in Supervised Learning and Unsupervised Learning

Among other differences is the time at which each method of
learning takes place. It is important to highlight that the supervised
method of learning takes place off-line, while the unsupervised method of
learning takes place in real time. People involved in the preparation and
labeling of the input data do so off-line, while the analysis of the hidden
pattern is done online, which denies the people involved in machine
learning an opportunity to interact with the machine as it analyzes the
discrete data. However, the unsupervised method of machine learning
takes place in real time, such that all the input data is analyzed and
labeled in the presence of learners, which helps them to understand
different methods of learning and classification of raw data. Real-time
data analysis remains the most significant merit of the unsupervised
method of learning.
Diff b/w Supervised and Unsupervised Learning:

                          Supervised Learning              Unsupervised Learning
Input Data                Uses known and labeled           Uses unknown input data
                          input data
Computational Complexity  Very complex in computation      Less computational complexity
Real Time                 Uses off-line analysis of data   Uses real-time analysis of data
Number of Classes         Number of classes is known       Number of classes is not known
Accuracy of Results       Accurate and reliable results    Moderately accurate and
                                                           reliable results

Summary of Supervised Learning and Unsupervised Learning

 Data mining is becoming an essential aspect of the current business
world due to the increased raw data that organizations need to analyze
and process so that they can make sound and reliable decisions.

 This explains why the need for machine learning is growing, thus
requiring people with sufficient knowledge of both supervised machine
learning and unsupervised machine learning.

 It is worth understanding that each method of learning offers its
own advantages and disadvantages. This means that one has to be
conversant with both methods of machine learning before determining
which method one will use to analyze data.
