DM Cluster Analysis
http://www.tutorialspoint.com/data_mining/dm_cluster_analysis.htm
Copyright tutorialspoint.com
A cluster is a group of objects that belong to the same class. In other words, similar objects are
grouped in one cluster, and dissimilar objects are placed in different clusters.
What is Clustering?
Clustering is the process of grouping a set of abstract objects into classes of similar objects.
Points to Remember
A cluster of data objects can be treated as one group.
In cluster analysis, we first partition the data set into groups based on data similarity and
then assign labels to the groups.
The main advantage of clustering over classification is that it is adaptable to changes and
helps single out useful features that distinguish different groups.
Clustering Methods
Clustering methods can be classified into the following categories:
Partitioning Method
Hierarchical Method
Density-based Method
Grid-based Method
Model-Based Method
Constraint-based Method
Partitioning Method
Suppose we are given a database of n objects, and the partitioning method constructs k partitions
of the data. Each partition represents a cluster, and k ≤ n. That is, the method classifies the
data into k groups that satisfy the following requirements:
Each group contains at least one object.
Each object must belong to exactly one group.
Points to Remember
For a given number of partitions (say k), the partitioning method first creates an initial
partitioning.
It then uses an iterative relocation technique to improve the partitioning by moving objects
from one group to another.
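The two steps above can be sketched with a k-means style procedure. This is a minimal illustration, not the method the tutorial prescribes: the function name kmeans, the use of the first k points as the initial partitioning, the squared Euclidean distance, and the sample points are all assumptions made for the example.

```python
def kmeans(points, k, max_iter=100):
    """Partition 2-D points into k groups by iterative relocation."""
    # Initial partitioning (assumption): use the first k points as group centers.
    centers = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Relocation step: move every object to the group with the nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # no object changed group, so the partitioning is stable
        labels = new_labels
        # Update step: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a group became empty
                centers[c] = (
                    sum(px for px, _ in members) / len(members),
                    sum(py for _, py in members) / len(members),
                )
    return labels, centers


points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Each object ends up in exactly one group, which matches the second requirement listed above. The result of this kind of procedure generally depends on the initial partitioning, so other starting centers may give a different grouping.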
Hierarchical Method
This method creates a hierarchical decomposition of the given set of data objects. Hierarchical
methods can be classified by how the hierarchical decomposition is formed. There are two
approaches:
Agglomerative Approach
Divisive Approach
Agglomerative Approach
This approach is also known as the bottom-up approach. We start with each object forming a
separate group. The method keeps merging the objects or groups that are close to one another,
until all of the groups are merged into one or until the termination condition holds.
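The bottom-up merging can be sketched as follows. This is a simplified illustration under stated assumptions: the objects are 1-D numbers, closeness is measured with single linkage (the distance between the closest pair of members), and the termination condition is a target number of clusters. The name agglomerative and the parameter num_clusters are hypothetical.

```python
def agglomerative(values, num_clusters):
    """Bottom-up clustering of 1-D values using single linkage."""
    # Start with each object forming a separate group.
    clusters = [[v] for v in values]

    def distance(a, b):
        # Single linkage (assumption): distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    # Keep merging the two closest groups until the termination condition holds.
    while len(clusters) > max(1, num_clusters):
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


result = agglomerative([1.0, 1.5, 2.0, 10.0, 10.5, 20.0], num_clusters=3)
print(result)  # [[1.0, 1.5, 2.0], [10.0, 10.5], [20.0]]
```

Calling it with num_clusters=1 merges everything into a single group, which corresponds to the first stopping case described above.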
Divisive Approach
This approach is also known as the top-down approach. We start with all of the objects in the
same cluster. In each successive iteration, a cluster is split into smaller clusters. This is done
until each object is in its own cluster or the termination condition holds. The hierarchical
method is rigid, i.e., once a merge or split is done, it can never be undone.
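A top-down split can be sketched in the same spirit. The splitting rule here is an assumption chosen for brevity, not a rule given in the text: the objects are 1-D numbers, the cluster with the widest spread is split next, and the split is made at the largest gap between neighboring values. The name divisive and the parameter num_clusters are hypothetical.

```python
def divisive(values, num_clusters):
    """Top-down clustering of 1-D values by repeated splitting."""
    # Start with all of the objects in the same cluster.
    clusters = [sorted(values)]
    while len(clusters) < num_clusters:
        # Only clusters with more than one object can still be split.
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # every object is already in its own cluster
        # Assumption: split the cluster with the widest spread first.
        target = max(splittable, key=lambda c: c[-1] - c[0])
        # Assumption: cut at the largest gap between neighboring values.
        cut = max(range(1, len(target)), key=lambda i: target[i] - target[i - 1])
        clusters.remove(target)
        clusters.extend([target[:cut], target[cut:]])
    return sorted(clusters)


result = divisive([1.0, 1.5, 2.0, 10.0, 10.5, 20.0], num_clusters=3)
print(result)  # [[1.0, 1.5, 2.0], [10.0, 10.5], [20.0]]
```

Note that a split is never revisited once it is made, which is the rigidity described above.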
Density-based Method
This method is based on the notion of density. The basic idea is to continue growing a given
cluster as long as the density in the neighborhood exceeds some threshold, i.e., for each data
point within a given cluster, the neighborhood of a given radius has to contain at least a
minimum number of points.
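The idea of growing a cluster while the neighborhood stays dense can be sketched in the style of the well-known DBSCAN algorithm. The text does not name a specific algorithm, so this choice is an assumption, as are the names density_cluster, eps (the neighborhood radius), and min_pts (the minimum number of points), and the use of -1 to mark noise.

```python
NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]


def density_cluster(points, eps, min_pts):
    """Label 2-D points with a cluster id, or NOISE, in a DBSCAN-like way."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        # Keep growing the cluster while dense neighborhoods are found.
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = density_cluster(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) has too few neighbors within the radius, so it is reported as noise rather than forced into a cluster.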
Grid-based Method
In this method, the object space is quantized into a finite number of cells that form a grid
structure. Clustering operations are then performed on the cells rather than on the individual
objects.
Advantage
The major advantage of this method is its fast processing time.
The processing time depends only on the number of cells in each dimension of the quantized
space, not on the number of data objects.
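A rough sketch of the grid idea follows. It is an illustration under assumptions, not a specific published algorithm: points are 2-D, a cell counts as dense when it holds at least min_count points, and adjacent dense cells (including diagonal neighbors) are joined into one cluster. The names grid_cluster, cell_size, and min_count are hypothetical.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, min_count):
    """Cluster 2-D points by joining adjacent dense grid cells."""
    # Quantize the object space: map every point to an integer cell index.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # Assumption: a cell is dense if it holds at least min_count points.
    dense = {cell for cell, members in cells.items() if len(members) >= min_count}

    labels = [-1] * len(points)  # -1 marks points in sparse cells
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Flood fill over neighboring dense cells; this step works on cells only.
        stack = [start]
        seen.add(start)
        while stack:
            cx, cy = stack.pop()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        cluster_id += 1
    return labels


points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.5), (1.7, 0.1),
          (5.5, 5.5), (5.1, 5.9), (9.0, 0.5)]
labels = grid_cluster(points, cell_size=1.0, min_count=2)
print(labels)  # [0, 0, 0, 0, 1, 1, -1]
```

After the single pass that assigns points to cells, the remaining work touches only the occupied cells, which is where the speed advantage comes from.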
Model-based Method
In this method, a model is hypothesized for each cluster, and the method finds the best fit of the
data to the given model. It locates the clusters by constructing a density function that reflects
the spatial distribution of the data points.
This method also provides a way to automatically determine the number of clusters based on
standard statistics, taking outliers or noise into account. It therefore yields robust clustering
methods.
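One common way to fit a model per cluster is a Gaussian mixture estimated with expectation-maximization (EM). The text does not name a particular model, so this choice is an assumption. The sketch below is deliberately small: it works on 1-D data, fixes the number of components at two rather than determining it automatically, and uses hypothetical names (fit_two_gaussians, gaussian_pdf) and a simple initialization at the minimum and maximum values.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    # Assumed initialization: means at the extremes, shared variance, equal weights.
    means = [min(data), max(data)]
    center = sum(data) / len(data)
    spread = sum((x - center) ** 2 for x in data) / len(data)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: how strongly does each model (cluster) explain each point?
        resp = []
        for x in data:
            scores = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                      for k in range(2)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: refit each model to the points it is responsible for.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor to avoid a collapsing variance
            )
            weights[k] = nk / len(data)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, labels = fit_two_gaussians(data)
print([round(m, 2) for m in means])  # [1.0, 5.0]
print(labels)                        # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Choosing the number of components automatically, as the text mentions, would require comparing fits for different component counts with a statistical criterion; that step is not shown here.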
Constraint-based Method
In this method, clustering is performed by incorporating user-oriented or application-oriented
constraints. A constraint expresses a user expectation or a desired property of the clustering
results. Constraints provide an interactive way of communicating with the clustering
process. They can be specified by the user or by the application requirements.
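As one illustration of how constraints might enter the process, the sketch below assigns 1-D values to the nearest cluster center while honoring pairwise constraints. The constraint types (must-link and cannot-link pairs of object indices), the greedy assignment order, and the names violates and constrained_assign are all assumptions made for this example; the text does not specify a constraint format.

```python
def violates(index, cluster, labels, must_link, cannot_link):
    """True if placing object `index` in `cluster` breaks a constraint,
    considering only the objects that are already labeled."""
    for a, b in must_link:
        if index in (a, b):
            other = b if index == a else a
            if labels.get(other) is not None and labels[other] != cluster:
                return True
    for a, b in cannot_link:
        if index in (a, b):
            other = b if index == a else a
            if labels.get(other) == cluster:
                return True
    return False


def constrained_assign(values, centers, must_link=(), cannot_link=()):
    """Assign each value to the nearest center that satisfies the constraints."""
    labels = {}
    for i, v in enumerate(values):
        # Try the centers from nearest to farthest.
        for c in sorted(range(len(centers)), key=lambda c: abs(v - centers[c])):
            if not violates(i, c, labels, must_link, cannot_link):
                labels[i] = c
                break
        else:
            raise ValueError(f"no cluster satisfies the constraints for object {i}")
    return [labels[i] for i in range(len(values))]


values = [1.0, 1.2, 4.9, 5.1]
centers = [1.0, 5.0]
unconstrained = constrained_assign(values, centers)
with_cannot_link = constrained_assign(values, centers, cannot_link=[(0, 1)])
with_must_link = constrained_assign(values, centers, must_link=[(1, 2)])
print(unconstrained)     # [0, 0, 1, 1]
print(with_cannot_link)  # [0, 1, 1, 1]
print(with_must_link)    # [0, 0, 0, 1]
```

The same data yields different groupings depending on the constraints supplied, which is the sense in which constraints let the user steer the clustering result.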