A New Hierarchical Clustering Algorithm
Zahra Nazari, Dongshik Kang & M. Reza Asharif
Graduate School of Engineering & Science
University of the Ryukyus
Okinawa, Japan
zahra.amin.nazari@gmail.com, {kang, asharif}@ie.u-ryukyu.ac.jp

Yulwan Sung & Seiji Ogawa
Kansei Fukushi Research Center
Tohoku Fukushi University
Sendai, Japan
{sung, ogawa-s}@tfu-mail.tfu.ac.jp
Abstract—The purpose of a data clustering algorithm is to form clusters (groups) of data points such that there is high intra-cluster and low inter-cluster similarity. There are different types of clustering methods, such as hierarchical, partitioning, grid and density based. Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. A hierarchical clustering method can be thought of as a set of ordinary (flat) clustering methods organized in a tree structure. These methods construct the clusters by recursively partitioning the objects in either a top-down or bottom-up fashion. In this paper we present a new hierarchical clustering algorithm using Euclidean distance. To validate this method we have performed experiments with a low-dimensional artificial dataset and a high-dimensional fMRI dataset. Finally, the result of our method is compared with some existing clustering methods.

Keywords—clustering algorithm; hierarchical clustering algorithm; agglomerative hierarchical clustering; fMRI data clustering

I. INTRODUCTION

Clustering is a fundamental problem that has been the focus of considerable research in machine learning, pattern recognition and statistics. Clustering is an example of unsupervised learning, which means there are no training samples from which to learn. Clustering automatically forms clusters of samples that are all closely related; therefore, the similarities between samples assigned to the same cluster tend to be greater than the similarities between samples in different clusters. It is also called unsupervised classification, because it produces the same kind of result as classification algorithms, but without predefined classes. In its simplest form, the goal of a clustering algorithm is to take a dataset and find the distinct clusters that exist within it. Clustering is widely used in different areas such as psychology, business and retail, computational biology, and social media network analysis. There are many types of clustering methods, such as hierarchical, partitioning, grid and density based, each of which uses a different induction principle. Briefly, the hierarchical method produces a sequence of clusterings in which each clustering is nested into the next clustering in the sequence. The partitioning method splits the dataset into k partitions, where each partition represents a cluster. Density based clustering finds clusters of data that are dense regions of the dataset; these clusters are separated from each other by low-density regions. The grid based clustering method has the fastest processing time, which typically depends on the size of the grid instead of the size of the dataset; this method uses a single uniform grid mesh to partition the entire problem domain into cells. However, according to Fraley and Raftery (1998), clustering methods are mainly divided into only two groups: hierarchical and partitioning [1, 2, 3].

Hierarchical clustering methods have primarily been obtained using agglomerative algorithms (Sneath and Sokal, 1973; King, 1967; Guha et al., 1998, 1999; Karypis et al., 1999), in which each object initially represents a cluster of its own, and then clusters are iteratively merged until the whole tree is formed. However, divisive algorithms (MacQueen, 1967; Jain and Dubes, 1988; Ng and Han, 1994; Cheeseman and Stutz, 1996; Zahn, 1971; Han et al., 1998; Strehl and Ghosh, 2000; Boley, 1998; Ding et al., 2001) can also be used to obtain hierarchical clusterings via a sequence of repeated splittings. In recent years, various researchers have recognized that divisive clustering algorithms are well suited to clustering large document datasets due to their relatively low computational cost (Cutting et al., 1992; Larsen and Aone, 1999; Aggarwal et al., 1999; Steinbach et al., 2000). Nevertheless, there is a common belief that, in terms of clustering quality, agglomerative algorithms are better and more effective than their divisive counterparts [4].

II. CLUSTERING

A general definition of clustering is "organizing a group of objects that share similar characteristics." Clustering is an inductive learning task. It differs from classification by the lack of a predetermined target value to be predicted, which means that the resulting clusters are not known before the execution of the clustering algorithm. Clustering can thus be thought of as classification with autonomously discovered rather than predefined classes, based on similarity patterns identified in the data. The purpose of clustering is to organize data into clusters such that there is high intra-cluster and low inter-cluster similarity, where the similarity of a cluster can be expressed by a distance function. Scalability, the ability to deal with noisy data, the ability to handle dynamic data, insensitivity to the order of input records, and support for high dimensionality are some of the requirements for a clustering algorithm. The efficiency of a clustering algorithm is measured by its ability to find some or all of the hidden patterns in the data, and a good clustering algorithm should be able to identify clusters irrespective of their shapes. Some typical reasons to perform clustering are: finding the internal structure of the data, e.g. gene clustering; partitioning the data, e.g. market segmentation;
knowledge discovery in data, e.g. underlying rules and topics; and sometimes pre-processing of data [1, 2].

Fig. 1. Stages of clustering.

A. Hierarchical Clustering

Hierarchical clustering is a well-known clustering method that can be thought of as a set of flat clustering methods organized in a tree structure. These methods construct the clusters by recursively partitioning the data in either a top-down or bottom-up fashion and are applicable to many different domains. Hierarchical methods are commonly used for clustering in data mining problems. Among hierarchical algorithms, bottom-up approaches tend to be more accurate, but have a higher computational cost than top-down approaches. However, this increased computational complexity does not coincide with increased conceptual or algorithmic complexity, since the process of cluster hierarchy formation can be organized as a sequence of basic cluster merging or partitioning operations [5, 6, 7]. Hierarchical methods can be subdivided as follows:

a) Agglomerative hierarchical clustering: A bottom-up approach in which each object initially represents a cluster of its own, and similar clusters are then iteratively merged until the desired cluster structure is obtained. For N samples, this algorithm begins with N clusters, each containing a single sample. Afterwards, the two closest (most similar) clusters are repeatedly merged until the number of clusters becomes one or reaches a number specified by the user. The criteria used in this algorithm are min distance, max distance, average distance, and center distance [1, 6].

b) Divisive hierarchical clustering: A top-down approach in which all objects initially belong to a single root cluster, and existing clusters are iteratively partitioned into sub-clusters. This method was introduced in Kaufmann and Rousseeuw (1990) and is the inverse of the agglomerative method [1].

Hierarchical algorithms are of great interest for a number of application domains and are thought to produce better quality clusters. Embedded flexibility with regard to the level of granularity, ease of handling any form of similarity or distance, and applicability to any attribute type are some of the advantages of hierarchical clustering methods. Meanwhile, these methods have two disadvantages: most hierarchical algorithms do not revisit clusters once they are constructed with the purpose of improving them, and the termination criteria are vague. While most hierarchical clustering algorithms involve joining two sub-clusters or splitting a cluster into two sub-clusters in one step, some hierarchical algorithms join more than two sub-clusters or split a cluster into more than two sub-clusters. In hierarchical clustering, one can represent the clustering of objects as a tree called a dendrogram, in which the nodes represent subsets of the input dataset. The leaf nodes correspond to the individual elements of the dataset, and the root corresponds to the entire dataset. Each edge in the dendrogram represents an inclusion relationship (Fig. 2) [2, 8].

Fig. 2. The dendrogram for an agglomerative hierarchical clustering (bottom-up).
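As an illustration of the agglomerative process just described, the short sketch below builds a hierarchy for a handful of two-dimensional points with SciPy and cuts it into two flat clusters. The sample points, the choice of average linkage (one of the distance criteria listed above), and the cut level are illustrative assumptions rather than details taken from this paper.

# Sketch: agglomerative hierarchical clustering on a tiny 2-D dataset (illustrative only).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# A small, hypothetical 2-D dataset.
X = np.array([[2, 1], [4, 2], [5, 2], [3, 7], [6, 7], [11, 13], [15, 13]])

# Bottom-up merging; 'average' corresponds to the average-distance criterion.
Z = linkage(X, method='average', metric='euclidean')

# Cutting the hierarchy gives flat clusters; scipy.cluster.hierarchy.dendrogram(Z)
# would draw the tree itself.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)

Each row of Z records one merge (the two clusters joined, their distance, and the size of the new cluster), which is the inclusion structure that a dendrogram such as Fig. 2 visualizes.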
B. Partitioning Clustering

The simplest form of clustering is partitioning clustering, which splits a given dataset into k (an arbitrary number of) partitions, where each partition represents a cluster. Partitioning algorithms create one-level (un-nested) partitions of the data points. If k is the desired number of clusters, then partitioning algorithms find all k clusters at once, unlike the traditional hierarchical approaches, which bisect a cluster to get two sub-clusters or merge two sub-clusters to get one cluster. Partitioning methods use several greedy heuristic schemes in the form of iterative optimization, that is, different relocation schemes that iteratively reassign points between the k clusters. Relocation algorithms gradually improve the clustering result. In this method, each cluster should exhibit two properties: (a) each cluster must contain at least one object; (b) each object must belong to exactly one cluster. There are many partitioning clustering methods, such as k-means, bisecting k-means, PAM (Partitioning Around Medoids), CLARA, and probabilistic clustering. Among them, k-means is one of the most popular and widely used clustering algorithms and is explained in the following subsection [2, 3, 9].

a) K-means: K-means clustering is one of the most well-known partitioning methods and is used in scientific and industrial applications. The k-means algorithm discovers k clusters by finding k centroids (central points), also called cluster representatives, in the dataset, and then assigns each data point to the cluster with the nearest centroid; each data point can be assigned to only one of the k clusters (a cluster centroid is the mean or median of the points in its cluster, and nearness is defined by a distance or similarity function). The k-means algorithm is applicable only when the number of clusters or means is defined; that is, it requires the value of k in advance, which is very difficult to choose. The k-means algorithm is a local optimization procedure whose result depends on the initial placement of the centroids [10].
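To make the assignment and update steps concrete, here is a bare-bones k-means loop in Python; the random initialization, the toy points, and the function name are our own illustrative choices, not an implementation used in this paper.

# Sketch: minimal k-means (assign each point to the nearest centroid, then
# recompute each centroid as the mean of its points, until nothing changes).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # initial centroids
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged to a local optimum
        centroids = new_centroids
    return labels, centroids

X = np.array([[2.0, 1], [4, 2], [5, 2], [3, 7], [6, 7], [11, 13], [15, 13]])
print(kmeans(X, k=3)[0])

Because the loop only refines an initial guess, different seeds can converge to different local optima, which is the sensitivity to initialization mentioned above.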
III. NEW HIERARCHICAL CLUSTERING ALGORITHM

This new hierarchical clustering algorithm is a bottom-up, agglomerative hierarchical clustering approach. Suppose that a set of points S = {p₁, ..., pₙ} in ℝᵈ is given and we want to cluster them. The first step is to find the nearest neighbor of each data point to form pairs, and then to look for pairs that have a point in common in order to form primary clusters. The next step is to calculate the mean value of each primary cluster and then to measure the distance between the mean and all data points of the cluster in order to find the largest such distance (D) in each cluster. The final step is to measure the distances between data points of different clusters: if there are two data points from different clusters whose distance is <= (D of cluster 1) or (D of cluster 2), then these two clusters are merged.

A. Algorithm:

1) Find the nearest neighbor of each data point to form pairs, by calculating the Euclidean distance between all data points (Fig. 4),

d(p, q) = √((q₁ - p₁)² + (q₂ - p₂)²).   (1)

2) Merge pairs that share a common data point to form the primary clusters.
3) Calculate the mean value of each primary cluster.
4) Calculate the distance (d) between the mean and every data point of its cluster.
5) Find the maximum value of (d) in each cluster and name it D.
6) Calculate the distance between the data points of different clusters; if there are two points (point 1 from cluster 1 and point 2 from cluster 2) from different clusters whose distance is <= (D of cluster 1) or (D of cluster 2), then merge these two clusters (Fig. 5).
7) Stop if the current result is the same as the previous result (no change).

After searching for all pairs, we will have the primary clusters below (Fig. 5(a)):

Cluster 1 = {(2, 1) (4, 2) (5, 2) (8, 4) (9, 4) (10, 8)}
Cluster 2 = {(3, 7) (6, 7)}
Cluster 3 = {(11, 13) (15, 13) (17, 11)}
Cluster 4 = {(12, 14) (16, 12) (18, 16) (19, 16)}

After constructing the primary clusters, we perform steps 3 to 6 to construct the final clusters of the data; these steps are repeated until there is no change in any cluster.

Result: Cluster 1 = {(2, 1) (4, 2) (5, 2) (8, 4) (9, 4) (10, 8)}
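To make the procedure concrete, the sketch below implements one reading of steps 1 to 7 in Python. The helper names, the tie-breaking in the nearest-neighbor search, and the toy points are illustrative assumptions, so it is not guaranteed to reproduce the exact groupings shown in Fig. 5.

# Sketch of the bottom-up procedure (steps 1-7), written for clarity rather than speed.
import numpy as np
from itertools import combinations

def connected_components(n, edges):
    # Group indices 0..n-1 into components linked by `edges` (simple union-find).
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def new_hierarchical_clustering(points):
    X = np.asarray(points, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    # Steps 1-2: pair each point with its nearest neighbor, then merge pairs
    # sharing a point (connected components) to obtain the primary clusters.
    pairs = [(i, int(dist[i].argmin())) for i in range(n)]
    clusters = connected_components(n, pairs)
    while True:
        # Steps 3-5: per-cluster mean and the largest point-to-mean distance D.
        D = [np.linalg.norm(X[c] - X[c].mean(axis=0), axis=1).max() for c in clusters]
        # Step 6: link two clusters if some cross-cluster pair is within D of either.
        links = []
        for a, b in combinations(range(len(clusters)), 2):
            if dist[np.ix_(clusters[a], clusters[b])].min() <= max(D[a], D[b]):
                links.append((a, b))
        merged = [sorted(sum((clusters[i] for i in comp), []))
                  for comp in connected_components(len(clusters), links)]
        # Step 7: stop when a pass produces no change.
        if len(merged) == len(clusters):
            return clusters
        clusters = merged

toy = [(1, 1), (2, 1), (2, 2), (8, 8), (9, 8), (9, 9)]
print(new_hierarchical_clustering(toy))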
Fig. 5. (a) The primary clusters, shown in different colors, after merging pairs of data points with a common point. (b) The final clusters after merging the primary clusters.
IV. EXPERIMENTS
To test our algorithm, we applied it to two clustering problems in the MATLAB environment. First we used a two-dimensional artificial dataset containing fifty data points in three categories. We used k-means, the agglomerative clustering method, and our new method to cluster these data and then compared their results. The result of the artificial dataset clustering is illustrated in Fig. 7. For the second experiment we used a high-dimensional fMRI dataset. These data were acquired by a Verio system (Siemens, Germany) with a standard 12-channel head matrix coil operating at 3 Tesla at the Kansei Fukushi Research Institute, Tohoku Fukushi University, Sendai, Japan (see [11] for more information). This dataset contains resting-state fMRI data of twenty-six subjects; thirteen (half) of them are in class one and the others are in class two. Before clustering the data, we first reduced the dimensionality of the data from 13456 to 10 using Principal Component Analysis (PCA). Afterwards we applied k-means, the agglomerative clustering method, and also our proposed method to cluster them. The result of the fMRI data clustering is illustrated in Fig. 8. Computational time and clustering accuracy are the two factors considered in the above experiments, and both are reported in Fig. 7 and Fig. 8.
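The experiments above were run in MATLAB; as a rough illustration of the evaluation setup only, the sketch below reproduces the same pipeline shape (PCA to 10 components, clustering into two groups, timing and label-matched accuracy) in Python with scikit-learn. The randomly generated matrix merely stands in for the real 26-subject fMRI features, and only the two baseline methods are shown, not our proposed algorithm.

# Sketch of the evaluation pipeline: reduce 13456 features to 10 with PCA,
# then cluster into two groups and report time and (label-matched) accuracy.
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(26, 13456))      # stand-in for 26 subjects x 13456 features
y = np.array([0] * 13 + [1] * 13)     # two classes of thirteen subjects each

X10 = PCA(n_components=10).fit_transform(X)

def two_class_accuracy(labels, truth):
    # Cluster labels are arbitrary, so take the better of the two assignments.
    acc = float((labels == truth).mean())
    return max(acc, 1.0 - acc)

for name, model in [("k-means", KMeans(n_clusters=2, n_init=10, random_state=0)),
                    ("agglomerative", AgglomerativeClustering(n_clusters=2))]:
    start = time.perf_counter()
    labels = model.fit_predict(X10)
    print(name, "time: %.3fs" % (time.perf_counter() - start),
          "accuracy: %.2f" % two_class_accuracy(labels, y))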
Fig. 8. (a) Computational time. (b) Clustering accuracy of fMRI data.

V. DISCUSSION

This new method ensures that all nearby points end up in the same cluster. According to the results of k-means, agglomerative clustering, and the new clustering method, we can claim that the accuracy of our proposed method is higher than that of the others. However, its computational time is greater, especially for the higher-dimensional dataset.

Future work includes reducing the computational time of this algorithm to make it more suitable for high-dimensional datasets, as well as testing it on more clustering problems and comparing its performance with other clustering methods.
ACKNOWLEDGMENT

This work was supported by JSPS KAKEN Grant Number 26350995.

REFERENCES

[1] P. Cichosz, Data Mining Algorithms Explained Using R, pp. 349-362, John Wiley & Sons, Ltd., 2015.
[2] A. K. Mann & N. Kaur, "Review paper on clustering techniques," Global Journal of Computer Science and Technology Software & Data Engineering, vol. 13, issue 5, version 1.0, pp. 43-46, 2013.
[3] M. Kaur & U. Kaur, "Comparison between k-means and hierarchical algorithm using query redirection," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, issue 7, pp. 1454-1455, July 2013.
[4] Y. Zhao & G. Karypis, "Hierarchical clustering algorithms for document datasets," Data Mining and Knowledge Discovery, vol. 10, pp. 141-168, Springer Science+Business Media, Inc., 2005.
[5] O. Maimon & L. Rokach, The Data Mining and Knowledge Discovery Handbook, pp. 321-340, Springer Science+Business Media, Inc., 2005.
[6] E. Alpaydin, Introduction to Machine Learning, pp. 143-158, The MIT Press, 2010.
[7] E. Masciari, G. M. Mazzeo & C. Zaniolo, "A new, fast and accurate algorithm for hierarchical clustering on Euclidean distances," pp. 111-114, Springer-Verlag Berlin Heidelberg, Part II, LNAI 7819, 2013.
[8] T. Segaran, Programming Collective Intelligence, pp. 29-38, O'Reilly Media, Inc., 2007.
[9] A. Gupta, A. Gupta & A. Mishra, "Research paper on clustering techniques of data variations," International Journal of Advance Technology & Engineering Research (IJATER), vol. 1, issue 1, pp. 39-42, November 2011.
[10] A. Likas, N. Vlassis & J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition (The Journal of the Pattern Recognition Society), pp. 451-452, PII: S0031-3203(02)00060-2, 2003.
[11] Y. Sung & S. Ogawa, "Cross-modal connectivity of the secondary auditory cortex with higher visual area in the congenitally deaf - A case study," J. Biomedical Science and Engineering, vol. 6, pp. 314-318, 2013.