Welcome To International Journal of Engineering Research and Development (IJERD)
I. INTRODUCTION
Clustering algorithms can be categorized by the clustering method they use. A cluster is a set of points such that each point in the cluster is closer to one or more other points in the cluster than to any point outside it. A good clustering method produces high-quality clusters in which intra-cluster similarity is high and inter-cluster similarity is low. In image datasets, for example, it is difficult to know in advance how many clusters are present.
Image clustering, which gives a high-level description of image content, is an important image-processing technology that has been actively researched for a long time. Recently, growing interest in unsupervised methods has led to improved ways of representing image sets. Grayscale images are now very important for analyzing image content, with applications ranging from satellite images to medical images, and such analysis can become very complex. Using a mathematical approach [1], grayscale images can be clustered with the optimal number of clusters. The clustering process separates the data into a number of segments in an n-dimensional space, and a specific function applied to the segmented data helps to model the data distribution [8]. Because the approach is based on intra-cluster and inter-cluster distance measures, the number of clusters can be determined automatically. The number of clusters grows with the size of the database.
On the other hand, some existing work concentrates on reducing the number of iterations of the K-Means method [2] during clustering, in order to obtain an optimized cluster output. These algorithms use optimization techniques such as Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) to reduce the number of iterations. The new unsupervised K-Means clustering algorithm [2] can be applied to any type of dataset, such as images or documents.
This clustering process depends on the size of the database and concentrates on reducing the number of iterations in the K-Means method [1] to obtain an optimized cluster output.
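The intra-cluster and inter-cluster distance measures mentioned above can be illustrated with a small sketch. The source does not give the formula used in [1], so the ratio below (mean squared point-to-centroid distance divided by the smallest squared centroid-to-centroid distance) is an assumption chosen for illustration, and the function names are hypothetical. Under this assumption, a smaller ratio indicates tighter and better separated clusters, so candidate cluster counts could be compared by this value.

```python
from math import dist


def intra_cluster_distance(points, labels, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    total = sum(dist(p, centroids[j]) ** 2 for p, j in zip(points, labels))
    return total / len(points)


def inter_cluster_distance(centroids):
    """Smallest squared distance between any two distinct centroids."""
    return min(
        dist(a, b) ** 2
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )


def validity(points, labels, centroids):
    """Assumed quality measure: intra / inter (smaller is better)."""
    intra = intra_cluster_distance(points, labels, centroids)
    inter = inter_cluster_distance(centroids)
    return intra / inter


# Two tight groups that lie far apart give a small ratio.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]
score = validity(points, labels, centroids)
```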
II. EXISTING WORK
Clustering can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. When cluster analysis is applied to images, the growing interest in unsupervised methods has led to improved ways of representing image sets. There are
III. K-MEANS ALGORITHM
Now, using the K-Means clustering algorithm, we can cluster an image to obtain segments. To run this algorithm, we need to provide the value of K, which is the number of cluster centers.
3.1 Enhanced Approach
The algorithm for Enhanced K-Means is as follows.
Input:
  D = {d1, d2, d3, ..., dn} // set of n data points
  C = {c1, c2, ..., ck} // set of k centroids
Output: A set of k clusters
Steps:
1. Compute the distance of each data point di (1<=i<=n) to all the centroids cj (1<=j<=k) as d(di, cj).
2. For each data point di, find the closest centroid cj and assign di to cluster j.
3. Set Cluster_id[i] = j; // j: id of the closest cluster
4. Set Nearest_Dist[i] = d(di, cj);
5. For each cluster j (1<=j<=k), recalculate the centroid;
6. Repeat
7. For each data point di:
   7.1 Compute its distance from the centroid of the present nearest cluster;
   7.2 If this distance is less than or equal to the present nearest distance, the data point stays in this cluster;
       Else
       7.2.1 For every centroid cj (1<=j<=k), compute the distance d(di, cj);
             End for;
       7.2.2 Assign the data point di to the cluster with the nearest centroid cj;
       7.2.3 Set Cluster_id[i] = j;
       7.2.4 Set Nearest_Dist[i] = d(di, cj);
   End for;
8. For each cluster j (1<=j<=k), recalculate the centroids;
Until the convergence criterion is met.
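The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes Euclidean distance, takes "convergence" to mean that the centroids stop changing, keeps the old centroid when a cluster becomes empty, and adds a `max_iter` safety cap. None of these choices is stated in the source, and the function names are hypothetical.

```python
from math import dist


def recompute_centroids(points, cluster_id, centroids):
    """Steps 5 and 8: mean of each cluster (old centroid kept if empty)."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, c in zip(points, cluster_id) if c == j]
        if members:
            mean = tuple(sum(axis) / len(members) for axis in zip(*members))
            new_centroids.append(mean)
        else:
            new_centroids.append(old)
    return new_centroids


def enhanced_kmeans(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    cluster_id, nearest_dist = [], []
    # Steps 1-4: one full assignment pass over all centroids.
    for p in points:
        distances = [dist(p, c) for c in centroids]
        j = distances.index(min(distances))
        cluster_id.append(j)
        nearest_dist.append(distances[j])
    # Step 5: first centroid update.
    centroids = recompute_centroids(points, cluster_id, centroids)
    # Steps 6-8: repeat until the centroids no longer move.
    for _ in range(max_iter):
        for i, p in enumerate(points):
            # Step 7.1: distance to the centroid of the current cluster only.
            d_own = dist(p, centroids[cluster_id[i]])
            if d_own <= nearest_dist[i]:
                continue  # Step 7.2: the point stays, no full search needed.
            # Steps 7.2.1-7.2.4: full search over all centroids.
            distances = [dist(p, c) for c in centroids]
            j = distances.index(min(distances))
            cluster_id[i] = j
            nearest_dist[i] = distances[j]
        new_centroids = recompute_centroids(points, cluster_id, centroids)
        if new_centroids == centroids:  # assumed convergence criterion
            break
        centroids = new_centroids
    return cluster_id, centroids


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
ids, centers = enhanced_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The saving comes from step 7.2: a point whose distance to its own centroid has not grown skips the comparison against all k centroids.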
IV. DLCT
DLCT (Double Link Cluster Tree) is used as an enhancement of the Enhanced K-Means algorithm. Some datasets call for an algorithm that can cluster the data without any initialization, i.e., without specifying NO_OF_CLUSTERS.
Fig 1: The diagrammatical representation of the DLCT
DLCT is an algorithm that works as a looping frame for the Enhanced K-Means algorithm, making Enhanced K-Means cluster the dataset several times.
V. PROPOSED WORK
The traditional K-Means algorithm is limited in that the number of cluster centers [4] must be supplied by the user. This makes K-Means difficult to use when the datasets are unpredictable. To solve this problem we propose a method named DLCT (Double Link Cluster Tree). Some datasets call for an algorithm that can cluster the data without any initialization. During the clustering process, two situations are handled:
1. Sometimes the cluster centers of two tree nodes are the same. The algorithm merges the two clusters that share the same cluster center.
2. Due to insufficient data, some clusters do not contain any cluster element. Such a cluster space is terminated.
In the standard algorithm, K-Means [2] is used to cluster the dataset only once, and hence the number of clusters is given manually. The K-Means Enhanced Approach Algorithm with Double Link Cluster Tree (DLCT) focuses on clustering documents and images efficiently without initializing the number of clusters. In DLCT, the K-Means algorithm is used as a library to perform the clustering. Among all database types, images always contain an unpredictable number of clusters. Fig 2 and Fig 3 show the input image before clustering and the output image after clustering.
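The two clean-up situations listed in this section (clusters with the same center, and clusters without elements) can be sketched as a single pass over the clusters of one tree level. This is an illustrative sketch only. The function name `prune_clusters`, the parallel-list data layout, and the use of exact equality to decide that two centers are "the same" are assumptions, not details given in the source.

```python
def prune_clusters(centers, members):
    """Merge clusters with identical centers and drop empty clusters.

    centers: list of center tuples.
    members: list of point lists, parallel to centers.
    Returns the surviving (centers, members) in first-seen order.
    """
    merged = {}
    for center, points in zip(centers, members):
        if not points:
            continue  # situation 2: an empty cluster is terminated
        # situation 1: clusters with the same center share one entry
        merged.setdefault(tuple(center), []).extend(points)
    return list(merged.keys()), list(merged.values())


centers = [(1, 1), (5, 5), (1, 1), (9, 9)]
members = [[(1, 0)], [(5, 5)], [(1, 2)], []]
kept_centers, kept_members = prune_clusters(centers, members)
```

In this example the first and third clusters are merged because their centers coincide, and the fourth is dropped because it has no elements.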
VI. CONCLUSION
Clustering datasets with the Enhanced K-Means algorithm [2] and DLCT makes clustering efficient, because the number of clusters does not have to be initialized even for an unpredictable database. The Double Link Cluster Tree (DLCT) algorithm is designed so that it solves the problems of previously implemented unsupervised methods. When the proposed algorithm is compared with various existing clustering algorithms, the analysis of the clustering process is based on the resulting cluster centers. The proposed algorithm is also an automatic optimization process, since the clusters are separated again at every level of the process. Cluster centers should differ by about 60%; clusters whose centers differ by less than 60% are merged and considered a single cluster. This prevents clusters separated by only a small distance from being treated as separate clusters.
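The 60% merging rule can be sketched as follows. The source does not define how the "difference" between two cluster centers is measured, so this sketch makes several assumptions: centers are scalar values such as grayscale intensities, the difference is the relative difference |a - b| / max(|a|, |b|), and two merged centers are replaced by their unweighted mean. The function names are hypothetical.

```python
def relative_difference(a, b):
    """Assumed measure: |a - b| relative to the larger magnitude."""
    largest = max(abs(a), abs(b))
    return 0.0 if largest == 0 else abs(a - b) / largest


def merge_close_centers(centers, threshold=0.60):
    """Merge neighbouring centers whose relative difference is below threshold."""
    merged = []
    for c in sorted(centers):
        if merged and relative_difference(merged[-1], c) < threshold:
            # Below 60% difference: treat both as a single cluster.
            merged[-1] = (merged[-1] + c) / 2
        else:
            merged.append(float(c))
    return merged


result = merge_close_centers([10, 12, 100, 255])
```

With these assumptions, the centers 10 and 12 differ by about 17% and are merged, while 100 and 255 differ by about 61% and stay separate.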
REFERENCES
[1]. Debashis Ganguly, "A Novel Approach for Determination of Optimal Number of Cluster," Computer Science and Engineering Department, Heritage Institute of Technology, Anandapur, Kolkata 700107, India.
[2]. K. A. Abdul Nazeer and M. P. Sebastian, "Improving the Accuracy and Efficiency of K-Mean Clustering Algorithm," Proceedings of the World Congress on Engineering 2009, Vol. I, WCE 2009, July 1-3, 2009, London, U.K.
[3]. Raghu Krishnapuram and Jongwoo Kim, "Clustering Algorithms Based on Volume Criteria," IEEE Transactions on Fuzzy Systems, vol. 8, no. 2, April 2000.
[4]. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation," IEEE, vol. 24, no. 7, July 2002.
[11].