What Is Cluster Analysis?
Clustering is very subjective
• Cluster the following animals:
– Sheep, lizard, cat, dog, sparrow, blue shark, viper, seagull, goldfish,
frog, red mullet
[Figure: example sequences TATATAAGTTCCA and TATATAAGGCTAAT, with the gapped form TATATAAGT-TCCA]
• Interval-scaled variables:
• Binary variables:
• Nominal, ordinal, and ratio variables:
• Variables of mixed types:
Interval-valued variables
• Standardize data
– Calculate the mean absolute deviation:
$$s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\right)$$
where $m_f$ is the mean of variable $f$
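The mean absolute deviation can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the standardized measurement is then computed as z_if = (x_if - m_f) / s_f, and the function names and sample data are hypothetical.

```python
def mean_absolute_deviation(values):
    """Return (m_f, s_f): the mean and mean absolute deviation of one variable."""
    n = len(values)
    m_f = sum(values) / n
    s_f = sum(abs(x - m_f) for x in values) / n
    return m_f, s_f


def standardize(values):
    """Return z_if = (x_if - m_f) / s_f for each value (assumed standardization)."""
    m_f, s_f = mean_absolute_deviation(values)
    if s_f == 0:
        # All values are identical: there is no spread, so every score is 0.
        return [0.0 for _ in values]
    return [(x - m_f) / s_f for x in values]


# Made-up measurements of one variable f for four objects.
heights_cm = [150.0, 160.0, 170.0, 180.0]
m_f, s_f = mean_absolute_deviation(heights_cm)
z = standardize(heights_cm)
```

After standardization the values no longer depend on the measurement unit, so variables recorded in different units can contribute comparably to a distance.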
$$d(i, j) = \frac{p - m}{p}$$
– apply a logarithmic transformation to ratio-scaled values: $y_{if} = \log(x_{if})$
– or treat them as continuous ordinal data and treat their ranks as
interval-scaled
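Both treatments can be sketched briefly in Python. This is a minimal sketch assuming strictly positive measurements on a roughly exponential scale; the helper names and the sample data are hypothetical.

```python
import math


def log_transform(values):
    """Apply y_if = log(x_if); every value must be strictly positive."""
    return [math.log(x) for x in values]


def ranks(values):
    """Rank values starting at 1 for the smallest; equal values share a rank."""
    ordered = sorted(set(values))
    position = {v: r for r, v in enumerate(ordered, start=1)}
    return [position[v] for v in values]


# Made-up measurements that grow by a constant factor.
counts = [1.0, 10.0, 100.0, 1000.0]
y = log_transform(counts)   # evenly spaced after the transformation
r = ranks(counts)           # ranks can then be handled as interval-scaled
```

The log transform turns constant ratios into constant differences, which is why the transformed values can be handled like interval-scaled data.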
Variables of Mixed Types
• A database may contain all six types of variables:
– symmetric binary, asymmetric binary, nominal, ordinal, interval,
and ratio
• One may use a weighted formula to combine their effects:
$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$
– f is binary or nominal:
$d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$, and $d_{ij}^{(f)} = 1$ otherwise
– f is interval-based: use the normalized distance
– f is ordinal or ratio-scaled:
• compute ranks $r_{if}$ and $z_{if} = \frac{r_{if} - 1}{M_f - 1}$
• and treat $z_{if}$ as interval-scaled
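The weighted combination above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the per-variable weight is taken to be 1 when both values are present and 0 when either is missing (represented as `None`), the "normalized distance" for an interval variable is assumed to be the absolute difference divided by the variable's range, and the `schema` format and function names are hypothetical.

```python
def ordinal_to_interval(rank, num_states):
    """Map a rank r_if in 1..M_f onto [0, 1] via z_if = (r_if - 1) / (M_f - 1)."""
    return (rank - 1) / (num_states - 1)


def mixed_dissimilarity(obj_i, obj_j, schema):
    """Weighted dissimilarity d(i, j) over variables of mixed types.

    schema holds one (kind, info) pair per variable. kind is "nominal",
    "interval", or "ordinal". For "interval", info is the (min, max) range
    of the variable; for "ordinal", info is the number of states M_f.
    """
    numerator = 0.0
    weight_sum = 0.0
    for x_i, x_j, (kind, info) in zip(obj_i, obj_j, schema):
        if x_i is None or x_j is None:
            continue  # assumed weight 0: a missing value contributes nothing
        if kind == "nominal":
            d = 0.0 if x_i == x_j else 1.0
        elif kind == "interval":
            low, high = info
            d = abs(x_i - x_j) / (high - low)  # assumed range normalization
        elif kind == "ordinal":
            d = abs(ordinal_to_interval(x_i, info) - ordinal_to_interval(x_j, info))
        else:
            raise ValueError(f"unknown variable kind: {kind}")
        numerator += d
        weight_sum += 1.0
    if weight_sum == 0:
        raise ValueError("no variable is present in both objects")
    return numerator / weight_sum


# Made-up objects: (colour, score on a 0..100 scale, grade with 3 ordered states).
schema = [("nominal", None), ("interval", (0.0, 100.0)), ("ordinal", 3)]
a = ("red", 20.0, 1)
b = ("blue", 70.0, 3)
c = ("red", None, 2)
d_ab = mixed_dissimilarity(a, b, schema)  # (1 + 0.5 + 1) / 3
d_ac = mixed_dissimilarity(a, c, schema)  # (0 + 0.5) / 2, score is skipped
```

Because every per-variable term lies in [0, 1], the combined value also lies in [0, 1], whatever mix of variable types is present.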
Cluster Analysis
• What is Cluster Analysis?
• Types of Data in Cluster Analysis
• A Categorization of Major Clustering Methods
• Partitioning Methods
• Hierarchical Methods
• Density-Based Methods
• Grid-Based Methods
• Model-Based Clustering Methods
• Outlier Analysis
• Summary
Major Clustering Approaches
• Partitioning algorithms: Construct various partitions and then
evaluate them by some criterion
• Hierarchical algorithms: Create a hierarchical decomposition of the set
of data (or objects) using some criterion
• Density-based: based on connectivity and density functions
• Grid-based: based on a multiple-level granularity structure
• Model-based: A model is hypothesized for each of the clusters and
the idea is to find the best fit of the data to that model
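To make the partitioning idea concrete, the sketch below scores two candidate partitions of the same points and keeps the better one. It assumes the within-cluster sum of squared errors as the criterion, which is only one possible choice; the function name and the data are hypothetical.

```python
def sum_squared_error(clusters):
    """Sum, over all clusters, of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        mean = sum(points) / len(points)
        total += sum((x - mean) ** 2 for x in points)
    return total


# Two candidate partitions of the same six one-dimensional points.
partition_a = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
partition_b = [[1.0, 2.0, 10.0], [3.0, 11.0, 12.0]]

sse_a = sum_squared_error(partition_a)
sse_b = sum_squared_error(partition_b)

# A lower score means tighter clusters, so the lower-scoring partition is kept.
best = min([partition_a, partition_b], key=sum_squared_error)
```

A partitioning algorithm such as k-means searches over partitions like these instead of enumerating them, but it evaluates them by the same kind of criterion.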