An Efficient Incremental Clustering Algorithm
WCSIT 3 (5), 97-99, 2013
R. L. Ujjwal
USICT, GGSIPU Delhi, India
Abstract- Clustering is the process of grouping data objects into distinct clusters so that objects in the same cluster are similar. The most widely used clustering algorithm is k-means, a partitioning algorithm. Unsupervised techniques such as clustering may also be used for fault prediction in software modules. This paper describes the standard k-means algorithm, analyzes its shortcomings, and proposes an incremental clustering algorithm. Experimental results show that the proposed algorithm produces clusters in less computation time.

Keywords- Clustering; Incremental Clustering; K-means; Unsupervised; Partitioning; Data Objects.
I. INTRODUCTION
Clustering is the task of organizing data into groups (known as clusters) such that data objects that are similar (or close) to each other are placed in the same cluster. Clustering is a form of unsupervised learning in which no class labels are provided. K-means is a popular clustering algorithm based on partitioning the data. However, it has some disadvantages; for example, the number of clusters must be specified beforehand. The proposed algorithm overcomes this shortcoming of the k-means algorithm. The rest of the paper is organized as follows. Section II describes the standard k-means algorithm and its limitations. Section III reviews related work. Section IV presents the method proposed in this paper. Experimental results are given in Section V. Finally, conclusions are drawn in Section VI.
II. THE STANDARD K-MEANS ALGORITHM

A. Limitations of the K-means Algorithm
1. The number of clusters (K) must be determined beforehand.
2. The algorithm is sensitive to the initial seed selection.
3. It is sensitive to outliers.
4. The number of iterations is not known in advance.
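For reference, the standard k-means procedure whose limitations are listed above can be sketched as follows (a minimal Python/NumPy illustration under the usual assumptions of random initialization and Euclidean distance; this is not the implementation benchmarked later in the paper):

import numpy as np

def kmeans(data, k, max_iter=100):
    # Pick k distinct objects at random as initial centroids
    # (limitation 2: the result depends on this seed selection).
    rng = np.random.default_rng(0)
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iter):  # limitation 4: iterations until convergence are unknown
        # Assignment step: each object goes to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its members;
        # an emptied cluster keeps its old centroid.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

Note that every iteration recomputes the distance from every object to every centroid, which is the cost the related work below tries to reduce.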
III. RELATED WORK
Fang Yuan, Zeng-Hui Meng, Hong-Xia Zhang and Chun-Ru Dong [11] proposed a systematic method for finding the initial centroids; the centroids obtained by this method are consistent with the distribution of the data. Fahim A. M. et al. [10] proposed an efficient method for assigning data points to clusters. The original k-means algorithm is computationally expensive because each iteration computes the distances between every data point and all the centroids; Fahim's approach uses two distance functions for this purpose, one similar to that of k-means and the other based on a heuristic. Abdul Nazeer and Sebastian [9] proposed an algorithm comprising separate methods for accomplishing the two phases of clustering. Mushfeq-Us-Saleheen Shameem and Raihana Ferdous [6] proposed a modified algorithm that uses the Jaccard distance measure to choose the k most dissimilar documents and take them as the k initial cluster centroids; their results show that the sum of squares in the modified k-means is nearly half that of the traditional k-means.

IV. PROPOSED ALGORITHM
In this paper an incremental clustering approach is used. The basic idea of the algorithm is as follows. Let Tth denote a threshold of dissimilarity between data objects. We first fix a value of Tth, choose an object at random from the given data set, and make it the center of a cluster. We then choose another object from the data set and compute the distances between it and the existing cluster centers. If the minimum such distance is larger than Tth, a new cluster is formed with the selected object as its center; otherwise the object is grouped into the nearest existing cluster and that cluster's centroid is updated. Objects are chosen from the data set in this way until all objects are clustered.

Clustering Steps
1. Take n data objects as input.
2. Assign a randomly chosen data object to the first cluster.
3. Select the next random object.
4. Determine the distances between the selected object and the centroids of the existing clusters.
5. Compare the minimum distance with the threshold: group the object into the nearest existing cluster, or form a new cluster with that object.
6. Repeat steps 3 to 5 until all objects have been selected.

Input: D = {d1, d2, d3, ..., dn} // set of n objects to cluster
Output: K = {k1, k2, k3, ..., kk}, C = {c1, c2, c3, ..., ck} // K is the set of subsets of D forming the final clusters and C is the set of centroids of those clusters

Algorithm: Proposed Algorithm (D)
1. let k = 1
2. di = RAND() // choose a random object di from D
3. kk = {di}
4. K = {kk}
5. ck = di
6. assign some constant value to Tth
7. for i = 2 to n do
8. determine the minimum distance m between di and the centroids cj of the clusters kj in K (1 <= j <= k)
9. if (m <= Tth) then // Tth is the threshold limit for the maximum distance allowed
10. kj = kj U {di}
11. calculate the new mean (centroid cj) for cluster kj
12. else k = k + 1
13. kk = {di}
14. K = K U {kk}
15. ck = di
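The procedure above can be sketched in Python as follows (a minimal NumPy illustration, assuming the objects of D arrive in random order as rows of a 2-D array; the function name incremental_cluster and the parameter t_th, standing for Tth, are ours):

import numpy as np

def incremental_cluster(data, t_th):
    # Step 2: the first object seeds the first cluster and is its centroid.
    centroids = [data[0].astype(float)]
    members = [[0]]
    for i in range(1, len(data)):  # one pass over the remaining objects
        # Step 4: distance from object i to every existing centroid.
        dists = [float(np.linalg.norm(data[i] - c)) for c in centroids]
        j = int(np.argmin(dists))
        if dists[j] <= t_th:
            # Step 5a: within the threshold, join the nearest cluster
            # and update that cluster's mean.
            members[j].append(i)
            centroids[j] = data[members[j]].mean(axis=0)
        else:
            # Step 5b: farther than Tth from every centroid, start a new cluster.
            members.append([i])
            centroids.append(data[i].astype(float))
    return members, centroids

The number of clusters is thus a by-product of the threshold rather than an input: a smaller Tth yields more, tighter clusters, while a larger Tth yields fewer, coarser ones. Each object is also visited only once, rather than once per iteration as in k-means.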
V. EXPERIMENTAL RESULTS
A synthetic data set containing 600 data points, each with 4 attributes, is used. The same data set is given as input to the standard k-means algorithm and to the proposed algorithm. We first run the proposed algorithm and note the number of clusters formed for different values of the threshold. For each resulting number of clusters we then run the k-means algorithm with K set equal to that number. The experiments compare the two algorithms in terms of total execution time. The results are tabulated in Table 1.

TABLE 1: COMPARISON OF THE K-MEANS AND THE PROPOSED ALGORITHM ON A SYNTHETIC DATA SET

Number of    K-means        Threshold    Proposed
Clusters     Time (s)       (Tth)        Time (s)
9            0.769231       15           0.219780
8            1.043956       17           0.274725
7            0.769231       18           0.549451
6            0.879121       19           0.467033
5            0.659341       20           0.549451
4            0.549451       22           0.219780
3            0.384615       25           0.164835
2            0.274725       35           0.219780
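A comparison of this kind can be reproduced along the following lines (a hypothetical harness reusing the two sketches above; the random data and the threshold value are illustrative, not the authors' actual data set):

import time
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((600, 4)) * 100  # synthetic set: 600 objects, 4 attributes

start = time.perf_counter()
members, _ = incremental_cluster(data, t_th=60.0)
proposed_time = time.perf_counter() - start

# Run k-means with K equal to the cluster count the proposed algorithm found.
start = time.perf_counter()
kmeans(data, k=len(members))
kmeans_time = time.perf_counter() - start

print(f"proposed: {len(members)} clusters, {proposed_time:.6f} s")
print(f"k-means : K={len(members)}, {kmeans_time:.6f} s")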
VI. CONCLUSION
In this paper we propose a new clustering algorithm that removes some of the disadvantages of the k-means algorithm. In the proposed algorithm we do not need to specify the value of K, i.e., the number of clusters required, in advance. The experimental results show that the proposed algorithm takes less time than the k-means algorithm. From these results we conclude that the proposed algorithm outperforms the k-means algorithm on this data set.

VII. REFERENCES
[1] Shi Na, Liu Xumin, Guan Yong, "Research on k-means Clustering Algorithm," Third International Symposium on Intelligent Information Technology and Security Informatics.
[2] K. A. Abdul Nazeer, S. D. Madhu Kumar, M. P. Sebastian, "Enhancing the k-means clustering algorithm by using a O(n log n) heuristic method for finding better initial centroids," Second International Conference on Emerging Applications of Information Technology, IEEE, 978-0-7695-4329-1, 2011.
[3] Juntao Wang, Xiaolong Su, "An improved K-means clustering algorithm," IEEE, 978-1-61284-486-2, 2011.
[4] Baolin Yi, Haiquan Qiao, Fan Yang, Chenwei Xu, "An Improved Initialization Center Algorithm for K-means Clustering," IEEE, 2010.
[5] Abdul Nazeer K. A., Sebastian M. P., "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm," Proceedings of the International Conference on Data Mining and Knowledge Engineering, London, UK, 2009.
[6] Mushfeq-Us-Saleheen Shameem, Raihana Ferdous, "An Efficient K-Means Algorithm integrated with Jaccard Distance Measure," IEEE, 978-1-4244-4570-7, 2009.
[7] Jirong Gu, Jieming Zhou, Xianwei Chen, "An Enhancement of K-means Clustering Algorithm," International Conference on Business Intelligence and Financial Engineering, 978-0-7695-3705-4, 2009.
[8] Xiaoping Qing, Shijue Zheng, "A new method for initializing the K-means clustering algorithm," Second International Symposium on Knowledge Acquisition and Modeling, IEEE, 978-0-7695-3888-4, 2009.
[9] K. A. Abdul Nazeer, M. P. Sebastian, "A O(n log n) clustering algorithm using heuristic partitioning," Technical Report, Department of Computer Science and Engineering, NIT Calicut, March 2008.
[10] Fahim A. M., Salem A. M., Torkey A., Ramadan M. A., "An efficient enhanced k-means clustering algorithm," Journal of Zhejiang University, 10(7):1626-1633, 2006.
[11] Fang Yuan, Zeng-Hui Meng, H. X. Zhang, C. R. Dong, "A New Algorithm to Get the Initial Centroids," Proc. of the 3rd International Conference on Machine Learning and Cybernetics, pp. 26-29, August 2004.