Clustering can be defined as a method of unsupervised classification in which data points are grouped into clusters based on their similarity. K-means is an efficient and widely used algorithm for partitional clustering, and it produces effective clusters in many real-time practical applications. This work proposes a modified parallel K-means algorithm that overcomes two limitations of the original: the fixed number of input clusters and serial execution. The original K-means algorithm accepts a fixed number of clusters as input, but in real-world scenarios it is very difficult to fix the number of clusters in advance and still obtain optimal outcomes. Much research has already been done on improving K-means clustering, but further investigation is still required to answer important questions such as 'is it possible to find the optimal number of clusters dynamically while ignoring empty clusters?' or 'does parallel execution of a clustering algorithm really improve its performance in terms of speedup?'. This research presents an improved K-means algorithm that calculates the number of clusters dynamically using a Dunn's index approach and then executes the algorithm in parallel using Microsoft's Task Parallel Library. The original K-means and the improved parallel K-means algorithms were run on two-dimensional raw data sets containing different numbers of records. The results show that the improved K-means performed better in all tested scenarios, whether the number of clusters was increased or the number of records in the raw data was changed.
For the same number of input clusters and different data sets, the modified parallel K-means performed 18 to 46 percent better than the original K-means in terms of execution time and speedup.
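The idea of choosing the number of clusters with Dunn's index and evaluating candidates in parallel can be illustrated with a minimal sketch. It is not the paper's implementation: it is written in Python rather than against Microsoft's Task Parallel Library, and it uses `concurrent.futures.ThreadPoolExecutor` only as a stand-in for parallel task execution. The function names (`init_centroids`, `kmeans`, `dunn_index`, `choose_k`), the farthest-point initialisation, and the choice to parallelise over candidate values of k are all assumptions made for illustration; the abstract does not specify them. Dunn's index is taken here in its common form, the smallest distance between points of different clusters divided by the largest cluster diameter.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption, chosen to be deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means on 2-D tuples; returns only the non-empty clusters."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]  # keep the old centroid of an empty cluster
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return [c for c in clusters if c]  # empty clusters are ignored


def dunn_index(clusters):
    """Minimum inter-cluster distance divided by maximum cluster diameter."""
    if len(clusters) < 2:
        return 0.0
    min_separation = min(
        math.dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return min_separation / max_diameter if max_diameter > 0 else float("inf")


def choose_k(points, k_values, workers=4):
    """Run K-means for each candidate k concurrently; keep the best Dunn's index."""
    def score(k):
        clusters = kmeans(points, k)
        return dunn_index(clusters), clusters

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(score, k_values))
    best_score, best_clusters = max(results, key=lambda r: r[0])
    return len(best_clusters), best_score, best_clusters
```

Calling `choose_k(points, range(2, 6))` on a list of `(x, y)` tuples returns the number of non-empty clusters, the corresponding Dunn's index, and the clusters themselves. In CPython, threads do not speed up CPU-bound pure-Python code because of the global interpreter lock, so this sketch shows the structure of the parallel search rather than the 18 to 46 percent improvement reported above; a process pool or a runtime with true thread parallelism would be needed to measure speedup.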
Uploads
[www.ijecs.in] Vol 5 - Issue 3 by Hitesh Yadav