
Machine Learning and Network Analysis
MA4207
Unsupervised Learning
◼ Parametric unsupervised learning
–Equivalent to density estimation with a mixture of (Gaussian) components
–Through the use of algorithms like EM, the identity of the component that generated each data point is treated as a missing feature
◼ Non-parametric unsupervised learning
–No density functions are considered in these methods
–Instead, we are concerned with finding natural groupings (clusters) in a dataset
◼ Non-parametric clustering involves three steps
–Defining a measure of (dis)similarity between examples
–Defining a criterion function for clustering
–Defining an algorithm to minimize (or maximize) the criterion function
Statistical Clustering
◼ Similarity measures
◼ Criterion functions
◼ Cluster validity
◼ Flat clustering algorithms
–k-means
–ISODATA
◼ Hierarchical clustering algorithms
–Divisive
–Agglomerative
Similarity Measure
Definition of metric
A measuring rule 𝑑(𝑥,𝑦) for the distance between two vectors 𝑥 and 𝑦 is considered a
metric if it satisfies the following properties
𝑑(𝑥,𝑦)≥0
𝑑(𝑥,𝑦)=0 iff 𝑥=𝑦
𝑑(𝑥,𝑦)=𝑑(𝑦,𝑥)
𝑑(𝑥,𝑦)≤𝑑(𝑥,𝑧)+ 𝑑(𝑧,𝑦)
If the metric has the property 𝑑(𝑎𝑥,𝑎𝑦)=|𝑎|𝑑(𝑥,𝑦) then it is called a norm and
denoted 𝑑(𝑥,𝑦)=||𝑥−𝑦||
The most general form of distance metric is the power norm
𝑑(𝑥,𝑦) = (Σ𝑘 |𝑥𝑘 − 𝑦𝑘|^𝑝)^(1/𝑟)
◼ 𝑝 controls the weight placed on each dimension's dissimilarity, whereas 𝑟 controls the distance growth of patterns that are further apart
–Notice that the definition of norm must be relaxed, allowing a power factor for |𝑎|
Distance Metrics
◼ Minkowski metric (𝐿𝑘 norm): 𝑑(𝑥,𝑦) = (Σ𝑖 |𝑥𝑖 − 𝑦𝑖|^𝑘)^(1/𝑘)
◼ The choice of an appropriate value of 𝑘 depends on the amount of emphasis that you would like to
give to the larger differences between dimensions
◼ Manhattan or city-block distance (𝐿1 norm): 𝑑(𝑥,𝑦) = Σ𝑖 |𝑥𝑖 − 𝑦𝑖|
◼ When used with binary vectors, the L1 norm is known as the Hamming distance
◼ Euclidean norm (𝐿2 norm): 𝑑(𝑥,𝑦) = (Σ𝑖 (𝑥𝑖 − 𝑦𝑖)²)^(1/2)
◼ Chebyshev distance (𝐿∞ norm): 𝑑(𝑥,𝑦) = max𝑖 |𝑥𝑖 − 𝑦𝑖|

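As a concrete illustration of the Minkowski family above, here is a minimal NumPy sketch (the example vectors are arbitrary):

```python
import numpy as np

def minkowski(x, y, k):
    # L_k norm: larger k puts more emphasis on the largest per-dimension differences
    return np.sum(np.abs(x - y) ** k) ** (1.0 / k)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

print(minkowski(x, y, 1))      # Manhattan / city-block distance (L1) = 5.0
print(minkowski(x, y, 2))      # Euclidean distance (L2) ≈ 3.61
print(np.max(np.abs(x - y)))   # Chebyshev distance (L_inf), the limit as k grows = 3.0
```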
Distance Metrics
◼ Quadratic distance: 𝑑(𝑥,𝑦) = (𝑥 − 𝑦)ᵀ𝐵(𝑥 − 𝑦), where 𝐵 is a symmetric positive-definite matrix
The Mahalanobis distance is a particular case of this distance, obtained when 𝐵 is the inverse of the covariance matrix


◼ Canberra metric (for non-negative features): 𝑑(𝑥,𝑦) = Σ𝑖 |𝑥𝑖 − 𝑦𝑖| / (𝑥𝑖 + 𝑦𝑖)
◼ Non-linear distance: 𝑑𝑁(𝑥,𝑦) = 0 if 𝑑𝐸(𝑥,𝑦) < 𝑇, and 𝐻 otherwise
where 𝑇 is a threshold and 𝐻 is a distance


An appropriate choice for 𝐻 and 𝑇 for feature selection is that they should satisfy

and that 𝑇 satisfies the unbiasedness and consistency conditions of the Parzen estimator: 𝑇^𝑃·𝑁 → ∞ and 𝑇 → 0 as 𝑁 → ∞
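A short sketch of the quadratic and Canberra distances; the matrix 𝐵 here is assumed to be the inverse covariance of some toy data, which makes the quadratic form a (squared) Mahalanobis distance:

```python
import numpy as np

def quadratic(x, y, B):
    # d(x,y) = (x - y)^T B (x - y); Mahalanobis when B is the inverse covariance matrix
    d = x - y
    return float(d @ B @ d)

def canberra(x, y):
    # sum of |x_i - y_i| / (x_i + y_i), defined for non-negative features
    return float(np.sum(np.abs(x - y) / (x + y)))

X = np.abs(np.random.default_rng(0).normal(size=(100, 3)))   # toy non-negative data
B = np.linalg.inv(np.cov(X, rowvar=False))
print(quadratic(X[0], X[1], B))   # squared Mahalanobis distance between two samples
print(canberra(X[0], X[1]))
```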
Distance Metrics
The distance metrics above are measures of dissimilarity; some measures of similarity
also exist
◼ Inner product: 𝑠(𝑥,𝑦) = 𝑥ᵀ𝑦
The inner product is used when the vectors 𝑥 and 𝑦 are normalized, so that they have the same length

◼ Correlation coefficient: 𝑟(𝑥,𝑦) = Σ𝑖 (𝑥𝑖 − 𝑥̄)(𝑦𝑖 − 𝑦̄) / (Σ𝑖 (𝑥𝑖 − 𝑥̄)² · Σ𝑖 (𝑦𝑖 − 𝑦̄)²)^(1/2)
◼ Tanimoto measure (for binary-valued vectors): 𝑠(𝑥,𝑦) = 𝑥ᵀ𝑦 / (𝑥ᵀ𝑥 + 𝑦ᵀ𝑦 − 𝑥ᵀ𝑦)
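A minimal NumPy sketch of these similarity measures (the binary vectors in the usage example are arbitrary):

```python
import numpy as np

def inner_product(x, y):
    # meaningful as a similarity when x and y are normalized to the same length
    return float(x @ y)

def correlation(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def tanimoto(x, y):
    # for binary vectors: shared components over the total number of distinct components
    xy = float(x @ y)
    return xy / (float(x @ x) + float(y @ y) - xy)

a = np.array([1, 0, 1, 1], dtype=float)
b = np.array([1, 1, 1, 0], dtype=float)
print(tanimoto(a, b))   # 2 / (3 + 3 - 2) = 0.5
```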

Criterion Function
After computing a (dis)similarity measure, a criterion function needs to be optimized
The most widely used clustering criterion is the sum-of-square-error
𝐽𝑀𝑆𝐸 = Σ𝑖=1..𝐶 Σ𝑥∈𝜔𝑖 ||𝑥 − 𝜇𝑖||², where 𝜇𝑖 is the sample mean of cluster 𝜔𝑖
This criterion measures how well the data set 𝑋={𝑥1…𝑥𝑁} is represented by the
cluster centers 𝜇={𝜇1…𝜇𝐶} (𝐶<𝑁)
Clustering methods that use this criterion are called minimum-variance methods
Other criterion functions exist, based on the scatter matrices used in Linear
Discriminant Analysis
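A small NumPy sketch of the sum-of-square-error criterion, assuming the data matrix, the cluster labels, and the cluster centers are given:

```python
import numpy as np

def sum_of_square_error(X, labels, centers):
    # J_MSE: sum over clusters of the squared distances to each cluster's center
    return sum(np.sum((X[labels == i] - c) ** 2) for i, c in enumerate(centers))

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([X[labels == i].mean(axis=0) for i in range(2)])
print(sum_of_square_error(X, labels, centers))   # 0.5 + 0.5 = 1.0
```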
Cluster Validity
◼ The validity of the final cluster solution is highly subjective
◼ This is in contrast with supervised training, where a clear objective function is known: Bayes risk
◼ Note that the choice of (dis)similarity measure and criterion function will have a major impact on the
final clustering produced by the algorithms
◼ Example
◼ Which are the meaningful clusters in these cases?
◼ How many clusters should be considered?

A number of quantitative methods for cluster validity are proposed in [Theodoridis and Koutroumbas, 1999]
Optimization
Find the partition of the data set that minimizes the criterion function
◼ Exhaustive enumeration of all partitions, which guarantees the optimal solution, is infeasible
For example, a problem with 5 clusters and 100 examples yields on the order of 10^67 partitions
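As a quick sanity check of this count: the number of ways to partition 𝑁 examples into 𝐶 clusters (a Stirling number of the second kind) is well approximated by 𝐶^𝑁/𝐶! for large 𝑁, and a couple of lines of Python confirm the order of magnitude:

```python
from math import factorial

C, N = 5, 100
approx_partitions = C ** N // factorial(C)   # ~ 5^100 / 5!
print(f"{approx_partitions:.2e}")            # about 6.6e+67 partitions
```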
The common approach is to proceed in an iterative fashion
◼ Find some reasonable initial partition
◼ Move samples from one cluster to another in order to reduce the criterion function
These iterative methods produce sub-optimal solutions but are computationally
tractable
◼ Flat clustering algorithms
◼ These algorithms produce a set of disjoint clusters
◼ Two algorithms are widely used: k-means and ISODATA
◼ Hierarchical clustering algorithms:
◼ The result is a hierarchy of nested clusters
◼ These algorithms can be broadly divided into agglomerative and divisive approaches
K-Means Algorithm
k-means is a simple clustering procedure that attempts to minimize the criterion
function 𝐽𝑀𝑆𝐸 in an iterative fashion

1. Define the number of clusters


2. Initialize clusters by
◼ an arbitrary assignment of examples to clusters or
◼ an arbitrary set of cluster centers (some examples used as centers)
3. Compute the sample mean of each cluster
4. Reassign each example to the cluster with the nearest mean
5. If the classification of all samples has not changed, stop, else go to step 3

k-means is a particular case of the EM algorithm for mixture models
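A minimal NumPy sketch of these steps (illustrative only, not an optimized implementation; it initializes the centers with randomly chosen examples):

```python
import numpy as np

def kmeans(X, n_clusters, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]   # step 2: init
    labels = None
    for _ in range(max_iter):
        # step 4: assign each example to the cluster with the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                                     # step 5: converged
        labels = new_labels
        # step 3: recompute the sample mean of each cluster (keep old center if empty)
        centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                            else centers[i] for i in range(n_clusters)])
    return labels, centers

X = np.vstack([np.random.default_rng(1).normal(loc=m, size=(50, 2)) for m in (0, 5)])
labels, centers = kmeans(X, n_clusters=2)
print(centers)   # typically recovers means near (0, 0) and (5, 5)
```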


K-Means Demo
Vector Quantization
◼ An application of k-means to signal
processing and communication
◼ Univariate signal values are usually
quantized into a number of levels
◼ Typically a power of 2 so the signal can be
transmitted in binary format
◼ Same idea can be extended for multiple
channels
◼ We could quantize each separate channel
◼ Instead, we can obtain a more efficient coding
if we quantize the overall multidimensional
vector by finding a number of multidimensional
prototypes (cluster centers)
◼ The set of cluster centers is called a
codebook, and the problem of finding
this codebook is normally solved using
the k-means algorithm
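A hedged sketch of the idea using SciPy's vector-quantization utilities: a 16-entry codebook (4 bits per transmitted sample) is learned with k-means from a hypothetical 2-channel signal, and each sample is then replaced by the index of its nearest prototype:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
signal = rng.normal(size=(10_000, 2))      # hypothetical 2-channel signal samples

# learn a codebook of 16 multidimensional prototypes with k-means
codebook, _ = kmeans2(signal, 16, minit='++', seed=0)

# quantize: each sample is encoded as the index of its nearest prototype
indices, _ = vq(signal, codebook)
reconstructed = codebook[indices]
print(np.mean(np.sum((signal - reconstructed) ** 2, axis=1)))   # mean quantization error
```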
ISODATA
◼ Iterative Self-Organizing Data Analysis (ISODATA)
◼ An extension to the k-means algorithm with some heuristics to automatically select the
number of clusters
◼ ISODATA requires the user to select a number of parameters
◼ N𝑀𝐼𝑁_𝐸𝑋 minimum number of examples per cluster
◼ 𝑁𝐷 desired (approximate) number of clusters
◼ 𝜎𝑆² maximum spread parameter for splitting
◼ 𝐷𝑀𝐸𝑅𝐺𝐸 maximum distance separation for merging
◼ 𝑁𝑀𝐸𝑅𝐺𝐸 maximum number of clusters that can be merged
◼ The algorithm works in an iterative fashion
1. Perform k-means clustering
2. Split any clusters whose samples are sufficiently dissimilar
3. Merge any two clusters sufficiently close
4. Go to 1
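A deliberately simplified sketch of this loop (not the full ISODATA heuristic: it merges at most one pair of clusters per pass, simply discards clusters that are too small, and omits several of the original safeguards; the parameter names mirror the list above):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def isodata_sketch(X, n_desired, sigma_split, d_merge, n_min_ex, n_iter=5, seed=0):
    centers, _ = kmeans2(X, n_desired, minit='++', seed=seed)   # step 1: k-means
    for _ in range(n_iter):
        labels, _ = vq(X, centers)
        new_centers = []
        for i in range(len(centers)):
            pts = X[labels == i]
            if len(pts) < n_min_ex:                    # drop clusters with too few examples
                continue
            if pts.std(axis=0).max() > sigma_split:    # step 2: split clusters too spread out
                sub_centers, _ = kmeans2(pts, 2, minit='++', seed=seed)
                new_centers.extend(sub_centers)
            else:
                new_centers.append(pts.mean(axis=0))
        centers = np.array(new_centers)
        # step 3: merge the closest pair of centers if they are nearer than d_merge
        if len(centers) > 1:
            d = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
            np.fill_diagonal(d, np.inf)
            i, j = np.unravel_index(d.argmin(), d.shape)
            if d[i, j] < d_merge:
                merged = (centers[i] + centers[j]) / 2
                centers = np.vstack([np.delete(centers, [i, j], axis=0), merged])
    labels, _ = vq(X, centers)
    return labels, centers
```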
ISODATA
ISODATA has been shown to be an extremely powerful heuristic
◼ Advantages are
◼ Self-organizing capabilities
◼ Flexibility in eliminating clusters that have very few examples
◼ Ability to divide clusters that are too dissimilar
◼ Ability to merge clusters that are sufficiently similar
◼ Limitations
◼ Data must be linearly separable (long narrow or curved clusters are not handled properly)
◼ It is difficult to know a priori the “optimal” parameters
◼ Performance is highly dependent on these parameters
◼ For large datasets and a large number of clusters, ISODATA is less efficient than other linear methods
◼ Convergence is unknown, although it appears to work well for non-overlapping clusters
In practice, ISODATA is run multiple times with different values of the parameters
and the clustering with minimum SSE is selected
Hierarchical Clustering
k-means and ISODATA create disjoint clusters, resulting in a flat
data representation
Often a hierarchical representation of the data, with clusters and
sub-clusters arranged in a tree-structured fashion, is required
Hierarchical representations are commonly used in the sciences
(e.g., biological taxonomy)
◼ Hierarchical clustering methods can be grouped in two
general classes
◼ Agglomerative
◼ Also known as bottom-up or merging
◼ Starting with N singleton clusters, successively merge clusters until one cluster is left

◼ Divisive
◼ Also known as top-down or splitting
◼ Starting with a unique cluster, successively split the clusters until N singleton examples are left
Dendrograms
◼ A binary tree that shows the structure of the clusters
◼ Dendrograms are the preferred representation for hierarchical clusters
◼ In addition to the binary tree, the dendrogram provides the similarity measure between clusters (the
vertical axis)
◼ An alternative representation is based on sets
◼ {{𝑥1,{𝑥2,𝑥3}},{{{𝑥4,𝑥5},{𝑥6,𝑥7}},𝑥8}}
◼ However, unlike the dendrogram, sets cannot express quantitative information
Divisive Clustering
◼ Define
◼ 𝑁𝐶 - Number of clusters
◼ 𝑁𝐸𝑋 - Number of examples
1. Start with one large cluster
2. Find “worst” cluster
3. Split it
4. If 𝑁𝐶 < 𝑁𝐸𝑋 go to 2
◼ How to choose the “worst” cluster
◼ Largest number of examples
◼ Largest variance
◼ Largest sum-squared-error…
◼ How to split clusters
◼ Mean-median in one feature direction
◼ Perpendicular to the direction of largest variance…
◼ The computations required by divisive clustering are more intensive than for
agglomerative clustering methods
◼ For this reason, agglomerative approaches are more popular
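A brief sketch of divisive clustering under one set of these choices: the “worst” cluster is taken to be the one with the largest sum-squared-error, and it is split with a 2-means run (only one of the possible splitting rules listed above):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def divisive(X, n_clusters, seed=0):
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < n_clusters:
        # find the "worst" cluster: the one with the largest sum-squared-error
        sse = [np.sum((X[labels == i] - X[labels == i].mean(axis=0)) ** 2)
               for i in range(labels.max() + 1)]
        worst = int(np.argmax(sse))
        idx = np.flatnonzero(labels == worst)
        _, sub = kmeans2(X[idx], 2, minit='++', seed=seed)   # split it with 2-means
        labels[idx[sub == 1]] = labels.max() + 1             # one half becomes a new cluster
    return labels

X = np.vstack([np.random.default_rng(i).normal(loc=3 * i, size=(30, 2)) for i in range(3)])
print(np.bincount(divisive(X, 3)))   # typically three clusters of roughly 30 examples each
```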
Agglomerative Clustering
◼ Define
◼ 𝑁𝐶 - Number of clusters
◼ 𝑁𝐸𝑋 - Number of examples
1. Start with 𝑁𝐸𝑋 singleton clusters
2. Find nearest clusters
3. Merge them
4. If 𝑁𝐶>1 go to 2

◼ How to find the “nearest” pair of clusters


◼ Minimum distance: 𝑑𝑚𝑖𝑛(𝜔𝑖,𝜔𝑗) = min ||𝑥−𝑦||, 𝑥∈𝜔𝑖, 𝑦∈𝜔𝑗
◼ Maximum distance: 𝑑𝑚𝑎𝑥(𝜔𝑖,𝜔𝑗) = max ||𝑥−𝑦||, 𝑥∈𝜔𝑖, 𝑦∈𝜔𝑗
◼ Average distance: 𝑑𝑎𝑣𝑔(𝜔𝑖,𝜔𝑗) = (1/(𝑁𝑖𝑁𝑗)) Σ𝑥∈𝜔𝑖 Σ𝑦∈𝜔𝑗 ||𝑥−𝑦||
◼ Mean distance: 𝑑𝑚𝑒𝑎𝑛(𝜔𝑖,𝜔𝑗) = ||𝑚𝑖 − 𝑚𝑗||, where 𝑚𝑖 is the mean of cluster 𝜔𝑖

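A compact sketch of these four inter-cluster distances, assuming each cluster is given as a NumPy array of points:

```python
import numpy as np
from scipy.spatial.distance import cdist

def d_min(A, B):  return cdist(A, B).min()    # single linkage (nearest neighbor)
def d_max(A, B):  return cdist(A, B).max()    # complete linkage (farthest neighbor)
def d_avg(A, B):  return cdist(A, B).mean()   # average of all N_i * N_j pairwise distances
def d_mean(A, B): return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])
print(d_min(A, B), d_max(A, B), d_avg(A, B), d_mean(A, B))   # 3.0 6.0 4.5 4.5
```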
Agglomerative Clustering
◼ Minimum distance
◼ When 𝑑𝑚𝑖𝑛 is used to measure distance between clusters, the algorithm is called the nearest-
neighbor or single-linkage clustering algorithm
◼ If the algorithm is allowed to run until only one cluster remains, the result is a minimum spanning
tree (MST)
◼ This algorithm favors elongated classes
◼ Maximum distance
◼ When 𝑑𝑚𝑎𝑥 is used to measure distance between clusters, the algorithm is called the farthest-
neighbor or complete-linkage clustering algorithm
◼ From a graph-theoretic point of view, each cluster constitutes a complete sub-graph
◼ This algorithm favors compact classes
◼ Average and mean distance
◼ 𝑑𝑚𝑖𝑛 and 𝑑𝑚𝑎𝑥 are extremely sensitive to outliers since their measurement of between-cluster
distance involves minima or maxima
◼ 𝑑𝑎𝑣𝑔 and 𝑑𝑚𝑒𝑎𝑛 are more robust to outliers
◼ Of the two, 𝑑𝑚𝑒𝑎𝑛 is more attractive computationally
Notice that 𝑑𝑎𝑣𝑔 involves the computation of 𝑁𝑖𝑁𝑗 pairwise distances
Agglomerative Clustering Example
◼ Perform agglomerative clustering on 𝑋 using the single-linkage metric
𝑋 = {1,3,4,9,10,13,21,23,28,29}
In case of ties, always merge the pair of clusters with the largest mean
Indicate the order in which the merging operations occur

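One hedged way to check this exercise is with SciPy's hierarchical clustering routines; note that SciPy applies its own tie-breaking rule, which need not coincide with the "largest mean" convention requested above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([1, 3, 4, 9, 10, 13, 21, 23, 28, 29], dtype=float).reshape(-1, 1)

# single-linkage (nearest-neighbor) agglomerative clustering
Z = linkage(X, method='single')
print(Z[:, 2])     # merge distances, listed in the order the merges occur
# dendrogram(Z)    # would draw the binary tree, with merge distance on the vertical axis
```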
