DSA Presentation Group 6
UNSUPERVISED
LEARNING
HIMANI KHANDELWAL 201091070
RIJAB FATIMA 201091005
TANIA SHARMA 201091002
PARINA JAIN 201091066
AYUSHI WAKODE 201091034
PRATIKSHA NAIK 201091069
Introduction to Unsupervised Learning
Unsupervised learning is the training of a machine
using data that is neither classified nor
labeled, allowing the algorithm to act on that
data without guidance. The task of the
machine is to group unsorted data
according to similarities, patterns, and differences
without any prior training on labeled examples.
Unlike supervised learning, no teacher is provided,
which means the machine receives no labeled training examples.
The machine is therefore left to find the
hidden structure in unlabeled data by itself.
EXAMPLE OF UNSUPERVISED LEARNING
For instance, suppose the machine is given a set of images
containing both dogs and cats, none of which it has
seen before. The machine has no idea about the
features of dogs and cats, so it cannot label the
images as 'dogs' or 'cats'. But it can group them
according to their similarities, patterns, and
differences, i.e., it can split the
collection into two parts. The first may
contain all pictures with dogs in them and the
second may contain all pictures with cats in
them. The machine has learned nothing beforehand,
which means there is no training data and there are no labeled examples.
Unsupervised learning allows the model to work on its own to
discover patterns and information that were
previously undetected. It mainly deals with
unlabeled data.
BLOCK DIAGRAM OF UNSUPERVISED LEARNING
Importance of unsupervised learning
Annotating large datasets is very costly and hence we can label
only a few examples manually. Example: Speech Recognition
1. Clustering
2. Association
Association Rule Learning
Association rule learning is an unsupervised learning technique
that tests for the dependence of one data item on another
and maps those dependencies so that they can be used more
cost-effectively.
It tries to discover interesting relations or associations between
the variables of a dataset, using rules to express the
relations it finds between variables in the database.
Association rule learning is one of the important approaches in
machine learning, and it is employed in market basket analysis, web
usage mining, continuous production, etc.
How Association Rule works
There are a few key terms that we need to be familiar with to understand
how association rules work.
Apriori: the algorithm used for building the rules.
Itemset: a collection of items. An n-itemset is a set of n
items. Put simply, it is the set of items purchased by a customer.
Support: the percentage of all transactions in which X and Y occur
together.
Confidence: the percentage of the transactions containing X that also
contain Y.
Lift: how many times more often X and Y occur together
than would be expected if they were statistically independent of each other.
Minlen: the minimum number of items in the rule.
Maxlen: the maximum number of items in the rule.
Target: indicates the type of association mined.
Frequent Itemsets Generation: find the most frequent itemsets in the data
based on a predetermined minimum support and the minimum and maximum number of items.
Rule Generation:
LHS => RHS: each rule has a left-hand side and a right-hand side, which are used to understand
how often item A and item B occur together.
Association rule learning works on the concept of if-then statements, such as
"if A then B".
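The support, confidence, and lift definitions above can be illustrated with a small sketch. The toy transactions and the function names (`support`, `confidence`, `lift`, `frequent_itemsets`) are assumptions made for this example, not part of the original slides. The itemset search is a plain brute-force enumeration over item combinations; it does not implement Apriori's candidate pruning.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of items one customer bought.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y.
    Assumes X occurs at least once, otherwise this divides by zero."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Confidence of X => Y divided by the support of Y.
    A value of 1 corresponds to statistical independence."""
    return confidence(x, y, transactions) / support(y, transactions)

def frequent_itemsets(transactions, min_support, minlen=1, maxlen=2):
    """Brute-force search: keep every itemset of size minlen..maxlen
    whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(minlen, maxlen + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

# Rule "if bread then milk" (LHS = {bread}, RHS = {milk})
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
frequent = frequent_itemsets(transactions, min_support=0.5)
```

In this toy data the lift of "if bread then milk" comes out below 1, meaning bread and milk appear together slightly less often than independence would predict.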
Clustering
Clustering can be considered the most
important unsupervised learning problem. Like
every other problem of this kind, it deals
with finding structure in a collection of
unlabeled data. Clustering is a method of
grouping objects into clusters such that
the objects with the most similarities remain in one
group and have few or no similarities with the
objects of another group.
Cluster analysis finds the commonalities between the data objects and
categorizes them according to the presence or absence of those commonalities.
The Goals of Clustering
The goal of clustering is to determine the intrinsic grouping in a set of
unlabeled data. But how do we decide what constitutes a good
clustering?
After clustering, each cluster is assigned a number called a cluster ID. Now, you can
condense the entire feature set for an example into its cluster ID. Representing a
complex example by a simple cluster ID makes clustering powerful. Extending the
idea, clustering data can simplify large datasets.
Clustering Methods
Partitioning Clustering
Density-Based Clustering
Distribution Model-Based Clustering
K-means Clustering
Hierarchical Clustering
Competitive Learning
Partitioning Clustering
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each
category.
Step-5: Assign the new data point to the category that has the largest
number of neighbors.
Hierarchical Clustering
Step-1: Treat each data point as a single cluster. If there are N data
points, the number of clusters will also be N.
Step-2: Take the two closest data points or clusters and merge them to form one
cluster. There will now be N-1 clusters.
Step-3: Again, take the two closest clusters and merge them to form
one cluster. There will be N-2 clusters.
Step-4: Repeat Step 3 until only one cluster is left. Consider the
figures below.
Step-5: Once all the clusters are combined into one big cluster, build the
dendrogram and cut it to divide the clusters as the problem requires.
[Figures: Step 1, Step 2, Step 3, Step 4]
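The merge steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `agglomerative`, the `target_clusters` parameter, the sample points, and the choice of single linkage (distance between the closest pair of members) are all assumptions, since the slides do not say how the distance between two clusters is measured. Stopping at `target_clusters` plays the role of cutting the dendrogram.

```python
import math

def agglomerative(points, target_clusters=1):
    """Repeatedly merge the two closest clusters.
    Returns (clusters, history); history lists each merge as (a, b, distance)."""
    clusters = [[p] for p in points]  # Step 1: every point is its own cluster
    history = []

    def linkage(a, b):
        # Single linkage (an assumption): distance of the closest pair of members
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > target_clusters:
        best = None
        # Steps 2 and 3: find the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, history

# Hypothetical sample data
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
two_clusters, _ = agglomerative(points, target_clusters=2)
one_cluster, full_history = agglomerative(points)
```

Running to a single cluster always takes N-1 merges, which matches the cluster counts in the steps. The recorded `history` holds the information a dendrogram would draw: which clusters merged and at what distance.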
Competitive Learning
Competitive learning is a form of
unsupervised learning in artificial
neural networks, in which nodes
compete for the right to respond to a
subset of the input data. A variant of
Hebbian learning, competitive
learning works by increasing the
specialization of each node in the
network. It is well suited to finding
clusters within data.
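One common way to realize this idea is a winner-take-all update, sketched below under stated assumptions: each node holds a weight vector, the node closest to an input wins, and only the winner moves toward that input. The function name `competitive_learning`, the learning rate, the epoch count, the initial weights, and the sample inputs are all hypothetical choices for illustration.

```python
import math

def competitive_learning(inputs, weights, learning_rate=0.5, epochs=10):
    """Winner-take-all sketch: for each input, the closest node wins
    and its weight vector is moved a fraction of the way toward the input."""
    weights = [list(w) for w in weights]  # copy so the caller's data is untouched
    for _ in range(epochs):
        for x in inputs:
            # Competition: the node with the smallest distance to x wins
            winner = min(range(len(weights)),
                         key=lambda k: math.dist(x, weights[k]))
            # Only the winner is updated, which makes it more specialized
            weights[winner] = [w + learning_rate * (xi - w)
                               for w, xi in zip(weights[winner], x)]
    return weights

# Hypothetical data: two loose groups of points and two nodes
inputs = [(0, 0), (0, 1), (10, 10), (10, 11)]
initial_weights = [(1, 1), (9, 9)]
trained = competitive_learning(inputs, initial_weights)
```

After training, each node's weight vector sits inside one of the two groups, so each node effectively represents one cluster.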
Competitive Learning Algorithm
Step 1: Assign the first data item, P1, to cluster C1. This data item will be the leader of
cluster C1.
Step 2: Move to the next data item, say P2, and calculate its distance from the
leader P1. If the distance between P2 and leader P1 is less than a user-specified
threshold (t), then P2 is assigned to this cluster (cluster C1). If the distance
between leader P1 and data item P2 is more than the user-specified threshold t, then
form a new cluster C2 and assign P2 to this new cluster. P2 will be the leader of
cluster C2.
Step 3: For each remaining data item, calculate the distance between the data item and the leader of
every cluster. If the distance between the data item and any of the leaders is
less than the user-specified threshold, the data item is assigned to that leader's cluster. However, if the
distance between the data item and every cluster's leader is more than the user-specified
threshold, a new cluster is created, and that data item is assigned to the new
cluster and becomes its leader.
Step 4: Repeat Step 3 until all the data items are assigned to clusters.
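The four steps above can be sketched as follows. The function name `leader_clustering` and the sample points are assumptions for this example. The steps do not say what happens when a data item is within the threshold of several leaders, or exactly at the threshold; this sketch assumes the nearest leader wins and treats a distance equal to the threshold as "not less than", so a new cluster is formed.

```python
import math

def leader_clustering(points, threshold):
    """Single pass over the data: join the nearest leader's cluster if it is
    closer than threshold, otherwise start a new cluster led by this point."""
    leaders = []   # leaders[i] is the leader of clusters[i]
    clusters = []
    for p in points:
        best_idx, best_d = None, None
        # Distance from p to every existing leader
        for idx, leader in enumerate(leaders):
            d = math.dist(p, leader)
            if best_d is None or d < best_d:
                best_idx, best_d = idx, d
        if best_d is not None and best_d < threshold:
            # Assumption: when several leaders qualify, the nearest one wins
            clusters[best_idx].append(p)
        else:
            # No leader is close enough (or no leader exists yet)
            leaders.append(p)
            clusters.append([p])
    return leaders, clusters

# Hypothetical sample data
points = [(0, 0), (1, 0), (10, 10), (10, 11), (0.5, 0.5)]
leaders, clusters = leader_clustering(points, threshold=3)
```

Because the data is scanned once in order, the result depends on the order of the data items and on the threshold: a different ordering can produce different leaders.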
Applications of unsupervised
learning
News Sections: Google News uses unsupervised learning to categorize
articles on the same story from various online news outlets. For
example, the results of a presidential election could be categorized
under their label for “US” news.
Computer vision: Unsupervised learning algorithms are used for visual
perception tasks, such as object recognition.
Medical imaging: Unsupervised machine learning provides essential
features to medical imaging devices, such as image detection,
classification and segmentation, used in radiology and pathology to
diagnose patients quickly and accurately.
Anomaly detection: Unsupervised learning models can comb through
large amounts of data and discover atypical data points within a
dataset. These anomalies can raise awareness of faulty equipment,
human error, or breaches in security.
Customer personas: Defining customer personas makes it easier to
understand common traits and business clients' purchasing habits.
Unsupervised learning allows businesses to build better buyer persona
profiles, enabling organizations to align their product messaging more
appropriately.
Recommendation Engines: Using past purchase behavior data,
unsupervised learning can help to discover data trends that can be used
to develop more effective cross-selling strategies. This is used to make
relevant add-on recommendations to customers during the checkout
process for online retailers.