
SUPERVISED LEARNING

As the name suggests, supervised learning takes place under the
supervision of a teacher, so this learning process is dependent. During
the training of an ANN under supervised learning, the input vector is
presented to the network, which produces an output vector. This output
vector is compared with the desired/target output vector. An error signal
is generated if there is a difference between the actual output and the
desired/target output. On the basis of this error signal, the weights are
adjusted until the actual output matches the desired output.

Perceptron
Developed by Frank Rosenblatt using the McCulloch and Pitts model, the
perceptron is the basic operational unit of artificial neural networks. It
employs a supervised learning rule and is able to classify data into two
classes.

Operational characteristics of the perceptron: It consists of a single
neuron with an arbitrary number of inputs along with adjustable weights,
but the output of the neuron is 1 or 0 depending upon the threshold. It
also has a bias, whose input is always 1. The following figure gives a
schematic representation of the perceptron.

Perceptron thus has the following three basic elements −

 Links − It has a set of connection links, each of which carries a weight,
including a bias link whose input is always 1.

 Adder − It adds the inputs after they are multiplied with their respective
weights.

 Activation function − It limits the output of the neuron. The most basic
activation function is a Heaviside step function, which has two possible outputs.
This function returns 1 if the input is positive, and 0 for any negative input.
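The three elements above can be sketched in a few lines of Python. This is a minimal illustration rather than a canonical implementation: the function names and the AND-gate example are chosen here for demonstration, and the bias is folded in as an extra weight on a constant input of 1.

```python
import numpy as np

def heaviside(v):
    """Heaviside step activation: 1 for positive input, 0 otherwise."""
    return 1 if v > 0 else 0

def train_perceptron(X, t, lr=0.1, epochs=20):
    """Perceptron learning rule: adjust weights on the error signal."""
    # Prepend a constant 1 to each input so w[0] acts as the bias weight.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = heaviside(w @ x)
            w += lr * (target - y) * x   # no change when output matches target
    return w

# Classify the linearly separable AND function into two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w = train_perceptron(X, t)
preds = [heaviside(w @ np.r_[1, x]) for x in X]
print(preds)  # [0, 0, 0, 1]
```

The error-driven update is exactly the process described in the introduction: weights move only when the actual output differs from the desired output.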

Adaptive Linear Neuron (Adaline)


Adaline, which stands for Adaptive Linear Neuron, is a network having
a single linear unit. It was developed by Widrow and Hoff in 1960. Some
important points about Adaline are as follows −

 It uses a bipolar activation function.

 It uses the delta rule for training to minimize the Mean Squared Error (MSE)
between the actual output and the desired/target output.

 The weights and the bias are adjustable.

Architecture
The basic structure of Adaline is similar to the perceptron, with an
extra feedback loop with the help of which the actual output is compared
with the desired/target output. After comparison, on the basis of the
training algorithm, the weights and bias are updated.
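The delta-rule training just described can be sketched as follows. This is a hedged illustration (the function name and the bipolar AND example are assumptions, not from the original): the unit stays linear during training, and the MSE is reduced by moving the weights against the error on the net input; the bipolar step is applied only when reading the trained unit out.

```python
import numpy as np

def train_adaline(X, t, lr=0.01, epochs=50):
    """Delta-rule (LMS) training: minimize MSE on the linear output."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias folded into w[0]
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = w @ x                     # linear unit: no threshold here
            w += lr * (target - y) * x    # delta rule update
    return w

# Bipolar AND: inputs and targets in {-1, +1}, matching the bipolar activation.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
t = np.array([-1, -1, -1, 1])
w = train_adaline(X, t)
# Bipolar step applied only at readout of the trained unit.
preds = [1 if w @ np.r_[1, x] >= 0 else -1 for x in X]
print(preds)  # [-1, -1, -1, 1]
```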

Multiple Adaptive Linear Neuron (Madaline)


Madaline, which stands for Multiple Adaptive Linear Neuron, is a
network which consists of many Adalines in parallel. It has a single
output unit. Some important points about Madaline are as follows −

 It is just like a multilayer perceptron, where the Adalines act as hidden units
between the input and the Madaline layer.

 The weights and the bias between the input and Adaline layers, as we see
in the Adaline architecture, are adjustable.

 The Adaline and Madaline layers have fixed weights and a bias of 1.

 Training can be done with the help of the delta rule.

Architecture
The architecture of Madaline consists of “n” neurons in the input
layer, “m” neurons in the Adaline layer, and 1 neuron in the Madaline
layer. The Adaline layer can be considered as the hidden layer, as it lies
between the input layer and the output layer, i.e. the Madaline layer.
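A forward pass through this architecture can be sketched as below. The Adaline weights realizing XOR are hand-picked purely for illustration; the point is the structure described above: adjustable weights into the Adaline layer, and a fixed-weight output unit (weights and bias of 1) combining the Adaline outputs.

```python
import numpy as np

def sign(v):
    """Bipolar activation: +1 for v >= 0, else -1."""
    return np.where(v >= 0, 1, -1)

def madaline_forward(x, W, b):
    """One pass: adjustable Adaline layer, then a fixed-weight output unit."""
    z = sign(W @ x + b)                 # many Adalines in parallel
    return int(sign(np.sum(z) + 1))     # fixed weights and bias of 1 (an OR unit)

# Hand-picked Adaline weights that realize XOR on bipolar inputs (illustrative).
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.array([-0.5, -0.5])
outs = [madaline_forward(np.array(x), W, b) for x in
        [(-1, -1), (-1, 1), (1, -1), (1, 1)]]
print(outs)  # [-1, 1, 1, -1]
```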

Back Propagation Neural Networks


A Back Propagation Network (BPN) is a multilayer neural network
consisting of an input layer, at least one hidden layer, and an output
layer. As its name suggests, back-propagation takes place in this network.
The error, which is calculated at the output layer by comparing the target
output with the actual output, is propagated back towards the input
layer.

Architecture
As shown in the diagram, the architecture of BPN has three
interconnected layers having weights on them. The hidden layer as well
as the output layer also has a bias, whose input is always 1. As is clear
from the diagram, the working of BPN is in two phases. One phase sends
the signal from the input layer to the output layer, and the other phase
back-propagates the error from the output layer to the input layer.

Generalized Delta Learning Rule


The delta rule works only for the output layer. The generalized delta
rule, also called the back-propagation rule, on the other hand, is a way of
creating the desired values for the hidden layer.
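The two phases of BPN, with the delta rule at the output layer and the generalized delta rule for the hidden layer, can be sketched as follows. The tiny 2-3-1 network, the sigmoid units, and the learning rate are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# A tiny 2-3-1 network; each layer's bias is folded in as weight column 0.
W1 = rng.normal(0.0, 0.5, (3, 3))   # hidden layer: 3 units x [bias, x1, x2]
W2 = rng.normal(0.0, 0.5, (1, 4))   # output layer: 1 unit x [bias, h1, h2, h3]

def forward(x):
    """Phase 1: send the signal from the input layer to the output layer."""
    h = sigmoid(W1 @ np.r_[1.0, x])
    y = sigmoid(W2 @ np.r_[1.0, h])
    return h, y

def backward(x, h, y, target, lr=0.1):
    """Phase 2: propagate the error back and update both weight layers."""
    global W1, W2
    delta_out = (y - target) * y * (1 - y)               # delta rule (output layer)
    delta_hid = (W2[:, 1:].T @ delta_out) * h * (1 - h)  # generalized delta rule
    W2 -= lr * np.outer(delta_out, np.r_[1.0, h])
    W1 -= lr * np.outer(delta_hid, np.r_[1.0, x])

x, target = np.array([1.0, 0.0]), np.array([1.0])
h, y = forward(x)
loss_before = float((y - target) ** 2)
backward(x, h, y, target)
loss_after = float((forward(x)[1] - target) ** 2)
print(loss_before, "->", loss_after)  # one backward pass reduces the error
```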
UNSUPERVISED LEARNING

As the name suggests, this type of learning is done without the
supervision of a teacher, so this learning process is independent. During
the training of an ANN under unsupervised learning, input vectors of a
similar type are combined to form clusters. When a new input pattern is
applied, the neural network gives an output response indicating the class
to which the input pattern belongs. There is no feedback from the
environment as to what the desired output should be or whether it is
correct or incorrect. Hence, in this type of learning, the network itself must
discover the patterns and features from the input data, and the
relationship between the input data and the output.

Winner-Takes-All Networks

These kinds of networks are based on the competitive learning rule
and use the strategy of choosing the neuron with the greatest total input
as the winner. The connections between the output neurons represent the
competition between them: one of them will be ‘ON’, meaning it is the
winner, and the others will be ‘OFF’.

Following are some of the networks based on this simple concept
using unsupervised learning.

Hamming Network

In most of the neural networks using unsupervised learning, it is
essential to compute the distance and perform comparisons. One such
network is the Hamming network, in which every given input vector is
clustered into one of several groups. Following are some important
features of Hamming networks −

 Lippmann started working on Hamming networks in 1987.

 It is a single-layer network.

 The inputs can be either binary {0, 1} or bipolar {-1, 1}.

 The weights of the net are calculated from the exemplar vectors.

 It is a fixed-weight network, which means the weights remain
the same even during training.
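A minimal sketch of these features, assuming the common construction in which the fixed weights are the exemplar vectors divided by 2 and the bias is n/2, so that each output equals n minus the Hamming distance to one exemplar:

```python
import numpy as np

# Exemplar vectors in bipolar form; the fixed weights are calculated from
# them and never change during operation.
exemplars = np.array([[ 1,  1,  1, -1],
                      [-1, -1,  1,  1]])
n = exemplars.shape[1]
W = exemplars / 2.0     # fixed weights from the exemplar vectors
b = n / 2.0             # fixed bias

def hamming_scores(x):
    """Each output equals n minus the Hamming distance to one exemplar."""
    return W @ x + b

x = np.array([1, 1, -1, -1])
scores = hamming_scores(x)
print(scores)                  # [3. 0.]: x differs from exemplar 0 in one place
print(int(np.argmax(scores)))  # 0: x is clustered with the first exemplar
```

Selecting the largest score is exactly the job of the Max Net described next, which is why the two networks are usually paired.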

Max Net

This is also a fixed-weight network, which serves as a subnet for
selecting the node having the highest input. All the nodes are fully
interconnected and there exist symmetrical weights in all these weighted
interconnections.
Architecture

It uses an iterative process in which each node receives inhibitory
inputs from all other nodes through connections. The single node whose
value is maximum will be active (the winner), and the activations of all
other nodes will be inactive. Max Net uses the identity activation function
with

f(x) = x if x > 0, and f(x) = 0 if x ≤ 0

The task of this net is accomplished by a self-excitation weight of
+1 and a mutual-inhibition magnitude ε, set so that 0 < ε < 1/m,
where “m” is the total number of nodes.
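The iterative mechanism above can be sketched directly; the particular ε and initial activations are illustrative, with ε chosen inside the required interval 0 < ε < 1/m:

```python
import numpy as np

def maxnet(a, eps=0.15):
    """Iterate until a single node stays active (the winner)."""
    a = a.astype(float).copy()
    while np.count_nonzero(a) > 1:
        # Self-excitation weight +1; every other node inhibits with magnitude eps.
        a = np.maximum(0.0, a - eps * (a.sum() - a))
    return a

a0 = np.array([0.2, 0.4, 0.6, 0.8])   # m = 4 nodes, so eps must satisfy eps < 1/4
winner = maxnet(a0)
print(winner)                  # only the largest initial activation survives
print(int(np.argmax(winner)))  # 3
```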

Competitive Learning in ANN

It is concerned with unsupervised training in which the output nodes
compete with each other to represent the input pattern. To understand
this learning rule, we first have to understand the competitive net, which
is explained as follows −

Basic Concept of Competitive Network

This network is just like a single-layer feed-forward network with
feedback connections between the outputs. The connections between the
outputs are of the inhibitory type, shown by dotted lines, which means
the competitors never support themselves.
Basic Concept of Competitive Learning Rule

As said earlier, there is competition among the output nodes, so the
main concept is this: during training, the output unit that has the
highest activation for a given input pattern is declared the winner.
This rule is also called winner-takes-all because only the winning neuron
is updated and the rest of the neurons are left unchanged.

Mathematical Formulation

Following are the three important factors for the mathematical
formulation of this learning rule −

 Condition to be a winner

Suppose a neuron yk wants to be the winner; then there would be
the following condition −

yk = 1 if vk > vj for all j, j ≠ k; 0 otherwise

It means that if a neuron, say yk, wants to win, then its induced
local field (the output of the summation unit), say vk, must be the
largest among all the other neurons in the network.

 Condition of the sum total of weights

Another constraint over the competitive learning rule is that the sum
total of the weights to a particular output neuron is 1. For example,
if we consider neuron k, then −

∑j wkj = 1 for all k

 Change of weight for the winner

If a neuron does not respond to the input pattern, then no learning
takes place in that neuron. However, if a particular neuron wins,
then the corresponding weights are adjusted as follows −

Δwkj = α(xj − wkj) if neuron k wins; 0 if neuron k loses

Here α is the learning rate.

This clearly shows that we are favoring the winning neuron by
adjusting its weights; if a neuron loses, we need not bother to
re-adjust its weights.
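The three factors above can be combined into a short training loop. Two simplifications are assumed in this sketch: the winner is picked as the unit whose weight vector is nearest to the input (equivalent to the largest induced local field when the weight vectors are normalized), and the weight-sum constraint is left implicit.

```python
import numpy as np

rng = np.random.default_rng(1)

def competitive_train(X, n_units=2, lr=0.3, epochs=30):
    """Winner-takes-all: only the winning unit's weights are adjusted."""
    # Initialize each unit's weight vector from distinct random input samples.
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner
            W[k] += lr * (x - W[k])   # Δw = α(x − w) for the winner only
    return W

# Two obvious clusters, around (0.1, 0.1) and (5.0, 5.0).
X = np.array([[0.1, 0.0], [0.0, 0.2], [0.2, 0.1],
              [5.0, 5.1], [4.9, 5.0], [5.1, 4.9]])
W = competitive_train(X)
print(np.round(W, 1))  # one weight vector settles near each cluster
```

Because losers are never touched, each unit ends up representing the cluster of inputs it keeps winning, which is the bridge to the k-means algorithm below.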

K-means Clustering Algorithm

K-means is one of the most popular clustering algorithms, in which
we use the concept of a partition procedure. We start with an initial
partition and repeatedly move patterns from one cluster to another until
we get a satisfactory result.

Algorithm

Step 1 − Select k points as the initial centroids.

Initialize k prototypes (w1,…,wk), for example by identifying them
with randomly chosen input vectors −

wj = ip, where j ∈ {1,…,k} and p ∈ {1,…,n}

Each cluster Cj is associated with prototype wj.

Step 2 − Repeat steps 3-5 until E no longer decreases, or the cluster
membership no longer changes.

Step 3 − For each input vector ip, where p ∈ {1,…,n}, put ip in the
cluster Cj* with the nearest prototype wj*, i.e. satisfying the relation

|ip − wj*| ≤ |ip − wj|, j ∈ {1,…,k}

Step 4 − For each cluster Cj, where j ∈ {1,…,k}, update the
prototype wj to be the centroid of all samples currently in Cj, so that

wj = ( ∑ip∈Cj ip ) / |Cj|

Step 5 − Compute the total quantization error as follows −

E = ∑j=1..k ∑ip∈Cj |ip − wj|²
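Steps 1-5 can be sketched as below; the two-cluster toy data and the random seed are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    """K-means following steps 1-5: initialize, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)].astype(float)   # Step 1
    for _ in range(max_iter):
        # Step 3: put each input in the cluster with the nearest prototype.
        labels = np.argmin(np.linalg.norm(X[:, None] - W[None], axis=2), axis=1)
        # Step 4: move each prototype to the centroid of its current cluster.
        new_W = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else W[j] for j in range(k)])
        if np.allclose(new_W, W):   # Step 2: stop when membership is stable
            break
        W = new_W
    # Step 5: total quantization error.
    E = sum(np.sum((X[labels == j] - W[j]) ** 2) for j in range(k))
    return W, labels, E

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]])
W, labels, E = kmeans(X, k=2)
print(np.round(W, 2))    # prototypes end at the two cluster centroids
print(labels, round(E, 2))
```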

Neocognitron

It is a multilayer feedforward network, which was developed by
Fukushima in the 1980s. This model is based on supervised learning and
is used for visual pattern recognition, mainly hand-written characters. It
is basically an extension of the Cognitron network, which was also
developed by Fukushima, in 1975.

Architecture

It is a hierarchical network comprising many layers, with a pattern of
local connectivity in those layers.

As we have seen in the above diagram, the neocognitron is divided
into different connected layers, and each layer has two types of cells.
These cells are explained as follows −

S-Cell − It is called a simple cell, which is trained to respond to a
particular pattern or a group of patterns.

C-Cell − It is called a complex cell, which combines the output from
S-cells and simultaneously lessens the number of units in each array. In
another sense, the C-cell displaces the result of the S-cell.
 What is Supervised Learning?

Supervised learning is one of the methods associated with machine
learning; it involves allocating labeled data so that a certain pattern or
function can be deduced from that data. It is worth noting that supervised
learning involves allocating an input object, a vector, while at the same
time anticipating the most desired output value, which is mostly referred
to as the supervisory signal. The bottom-line property of supervised
learning is that the input data is known and labeled appropriately.

 What is Unsupervised Learning?

Unsupervised learning is the second method of machine learning, in
which inferences are drawn from unlabeled input data. The goal of
unsupervised learning is to determine the hidden patterns or groupings
in unlabeled data. It is mostly used in exploratory data analysis. One of
the defining characteristics of unsupervised learning is that neither the
input nor the output is known.

Differences Between Supervised Learning and Unsupervised Learning

1. Input Data in Supervised Learning and Unsupervised Learning

The primary difference between supervised learning and
unsupervised learning is the data used in either method of machine
learning. It is worth noting that both methods of machine learning require
data, which they will analyze to produce certain functions or data groups.
However, the input data used in supervised learning is well known and
labeled. This means that the machine is only tasked with the role of
determining the hidden patterns from already labeled data. The data used
in unsupervised learning, by contrast, is neither known nor labeled. It is
the work of the machine to categorize and label the raw data before
determining the hidden patterns and functions of the input data.

2. Computational Complexity in Supervised Learning and Unsupervised Learning

Machine learning is a complex affair, and any person involved must
be prepared for the task ahead. One of the standout differences between
supervised learning and unsupervised learning is computational
complexity. Supervised learning is said to be a complex method of
learning, while the unsupervised method of learning is less complex. One
of the reasons that makes supervised learning a complex affair is the fact
that one has to understand and label the inputs, while in unsupervised
learning one is not required to understand and label the inputs. This
explains why many people have been preferring unsupervised learning as
compared to the supervised method of machine learning.

3. Accuracy of the Results of Supervised Learning and Unsupervised Learning

The other prevailing difference between supervised learning and
unsupervised learning is the accuracy of the results produced after every
cycle of machine analysis. The results generated from the supervised
method of machine learning are more accurate and reliable as compared
to the results generated from the unsupervised method of machine
learning. One of the factors that explains why the supervised method of
machine learning produces accurate and reliable results is that the input
data is well known and labeled, which means that the machine will only
analyze the hidden patterns. This is unlike the unsupervised method of
learning, where the machine has to define and label the input data before
determining the hidden patterns and functions.


4. Number of Classes in Supervised Learning and Unsupervised Learning

It is also worth noting that there is a significant difference when it
comes to the number of classes. All the classes used in supervised
learning are known, which means that the answers in the analysis are
also likely to be known. The only goal of supervised learning is therefore
to determine the unknown cluster. However, there is no prior knowledge
in the unsupervised method of machine learning. In addition, the number
of classes is not known, which clearly means that no information is known
beforehand and the results generated after the analysis cannot be
ascertained. Moreover, the people involved in the unsupervised method of
learning are not aware of any information concerning the raw data and
the expected results.

5. Real Time Learning in Supervised Learning and Unsupervised Learning

Among other differences is the time at which each method of
learning takes place. It is important to highlight that the supervised
method of learning takes place off-line, while the unsupervised method of
learning takes place in real time. People involved in the preparation and
labeling of the input data do so off-line, while the analysis of the hidden
pattern is done online, which denies the people involved in machine
learning an opportunity to interact with the machine as it analyzes the
discrete data. However, the unsupervised method of machine learning
takes place in real time, such that all the input data is analyzed and
labeled in the presence of learners, which helps them to understand
different methods of learning and classification of raw data. Real-time
data analysis remains the most significant merit of the unsupervised
method of learning.
Diff b/w Supervised and Unsupervised Learning:

                          Supervised Learning              Unsupervised Learning
Input Data                Uses known and labeled           Uses unknown input data
                          input data
Computational Complexity  Very complex in computation      Less computational complexity
Real Time                 Uses off-line analysis of data   Uses real-time analysis of data
Number of Classes         Number of classes is known       Number of classes is not known
Accuracy of Results       Accurate and reliable results    Moderately accurate and
                                                           reliable results

Summary of Supervised Learning and Unsupervised Learning

 Data mining is becoming an essential aspect of the current business
world due to the increased raw data that organizations need to analyze
and process so that they can make sound and reliable decisions.

 This explains why the need for machine learning is growing, thus
requiring people with sufficient knowledge of both supervised machine
learning and unsupervised machine learning.

 It is worth understanding that each method of learning offers its
own advantages and disadvantages. This means that one has to be
conversant with both methods of machine learning before determining
which method one will use to analyze data.
