Dr. J. S. Kanchana¹, Dr. N. Rajathi²
¹ Department of Information Technology, K.L.N. College of Engineering, Pottapalayam, Tamil Nadu, India
² Department of Information Technology, Kumaraguru College of Technology, Coimbatore, Tamil Nadu, India
Data mining is the procedure of mining knowledge from data. The information or knowledge extracted in this way can be used for market analysis and customer retention. As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster. Clustering is the process of grouping a set of abstract objects into classes of similar objects. An existing system uses an iterative optimization algorithm for clustering data objects with drifting concepts, using a cluster validity function to evaluate the effectiveness of the clustering model as each new input data subset flows in. The proposed system uses a Differential Evolutionary Particle Swarm Optimization (DEPSO) model for effectively clustering several real data sets with drifting concepts.

Keywords: Cluster analysis, Differential evolutionary, Particle Swarm Optimization, Drifting Concepts

1. INTRODUCTION

In today's data mining, the extraction of hidden predictive information from large databases is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-based decisions. Data mining provides automated, prospective tools of the kind typical of decision support systems. These tools can answer business questions that traditionally were too time-consuming to resolve. The techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.

The objective of clustering is to group similar data items into clusters so that items in the same cluster have high similarity but are dissimilar to items in other clusters. Clustering the entire dataset without considering drifting concepts not only decreases the quality of the clusters but also disregards users' expectation that they need only the most recent clustering result. Moreover, the volume of a data stream is huge, so storing and processing the entire stream is expensive. Therefore, a well-defined technique is needed for clustering datasets with drifting concepts effectively and efficiently. In this paper, we use the differential evolutionary particle swarm optimization algorithm for clustering datasets with drifting concepts.

2. RELATED WORK

Liang Bai [1], in "An optimization model for clustering categorical data streams with drifting concepts," observed that there is a lack of a cluster validity function and an optimization strategy to find clusters and catch the evolution trend of cluster structures on a categorical data stream. That work therefore presents an optimization model for clustering categorical data streams, and an iterative optimization algorithm is proposed to find an optimal solution of the objective function under some constraints.

Hung-Leng Chen [7], in "Catching the trend: A framework for clustering concept-drifting categorical data," noted that sampling has been recognized as an important technique to improve the efficiency of clustering. A mechanism named Maximal Resemblance Data Labeling (MARDL) is proposed to allocate each unlabeled data point to the appropriate cluster based on a novel categorical clustering representative, the N-Node set Importance Representative (NNIR), which represents clusters by the importance of combinations of attribute values.

Fuyuan Cao [5], in "A framework for clustering categorical time-evolving data," observed that clustering categorical time-evolving data remains a challenging problem, and proposed a generalized clustering framework for such data. The time-complexity analysis indicates that the proposed algorithms are effective for large datasets.

Daniel Barbara, in "COOLCAT: An entropy-based algorithm for categorical clustering," showed that COOLCAT is well equipped to cluster data streams (continuously arriving streams of data points), since it is an incremental algorithm capable of clustering new points without having to look at every point that has been clustered so far.

Fuyuan Cao, in "A new initialization method for categorical data clustering," proposed a novel initialization method for categorical data in which both the distance between objects and the density of each object are considered. The proposed initialization method is also applied to the k-modes and fuzzy k-modes algorithms.

3. PROPOSED WORK

3.1. BLOCK DIAGRAM

The block diagram of the proposed system is presented in Figure 1.

Fig. 1. Working of the Proposed System

3.2. CLUSTERING FRAMEWORK

In order to cluster categorical data streams and detect drifting concepts, we apply a clustering framework based on the sliding window technique. The sliding window technique, which has been used in several previous works on clustering time-evolving data, conveniently eliminates outdated records and saves only the clustering models. Based on this technique, we can cluster the latest data objects in the current window and catch the evolution trend of cluster structures on the data stream. Because different partition results may lead to different detection results of drifting concepts, we need an optimization model to find the optimal partition of each new input data subset.

3.3. DEPSO ALGORITHM

DE-PSO starts like the usual DE algorithm up to the point where the trial vector is generated. If the trial vector satisfies the conditions, it is included in the population; otherwise, the algorithm enters the PSO phase and generates a new candidate solution. The method is repeated iteratively until the optimum value is reached. The inclusion of the PSO phase creates a perturbation in the population, which in turn helps maintain the diversity of the population and produce a good optimal solution.

3.4. THE DRIFTING CONCEPT DETECTION

After clustering a new input data subset, we need to analyze the change between the new and the last clustering models in order to determine whether drifting concepts have occurred. If the concepts have drifted, the last clustering model does not participate in the construction of the new clustering model; in that case, we need to re-cluster the new input data subset. We consider two factors to detect drifting concepts: distribution variation and certainty variation. The distribution variation may reduce or enhance the uncertainty of the clustering model for cluster representatives. If the uncertainty is reduced, the concepts are considered not to have drifted, even if the distribution variation is large.

3.5. CLUSTERING ACCURACY EVALUATION

To evaluate the performance of the clustering algorithms in the experiment, we consider three validity measures:
1) accuracy (AC)
2) precision (PE)
3) recall (RE)

4. EXPERIMENTAL SETUP

We use the KDD-CUP’99 data stream for our experiment. Table 1 shows the computational times against the numbers of clusters. According to the table, the DEPSO algorithm requires more computational time than the iterative algorithm.

No. of Iterations    Iterative Algorithm    DEPSO Algorithm
5                    169.12                 172.7
10                   256.88                 262.34
15                   371.69                 385.52
20                   458.69                 467.21

Table 1. Computational Times (Seconds) of Clustering Algorithms for Different Numbers of Clusters

Figure 2 shows the computational time of the iterative algorithm and the DEPSO algorithm for different numbers of clusters. The DEPSO algorithm shows a higher computational time than the iterative algorithm.

Fig. 2. Computational Times (Seconds) of Clustering Algorithms for Different Numbers of Clusters

5. CONCLUSION

In this paper, a differential evolutionary particle swarm optimization algorithm for effectively clustering data with drifting concepts was presented. The performance of the proposed algorithm was tested, and the experimental results show that the proposed algorithm is effective in clustering data streams and that the detection results based on the proposed method are reliable. This work has wide scope for future extension: it can be implemented with other new heuristic algorithms, and it can be updated as requirements arise, since it is very flexible in terms of efficiency.

REFERENCES

[1] L. Bai, X. Cheng, J. Liang, and H. Shen, "An optimization model for clustering categorical data streams with drifting concepts," IEEE Trans. Knowl. Data Eng., vol. 28, no. 11, Nov. 2016.
[2] L. Bai, J. Liang, C. Dang, and F. Cao, "The impact of cluster representatives on the convergence of the K-modes type clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1509–1522, Jun. 2013.
[3] L. Bai, J. Liang, and C. Dang, "An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data," Knowl.-Based Syst., vol. 24, no. 6, pp. 785–795, 2011.
[4] S. Ho and H. Wechsler, "A martingale framework for detecting changes in data streams by testing exchangeability," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 20, pp. 2113–2127, Dec. 2010.
[5] F. Cao, J. Liang, and L. Bai, "A framework for clustering categorical time-evolving data," IEEE Trans. Fuzzy Syst., vol. 18, no. 5, pp. 872–885, Oct. 2010.
[6] K. Chen and L. Liu, "HE-tree: A framework for detecting changes in clustering structure for categorical data streams," Int. J. Very Large Data Bases, vol. 18, no. 5, pp. 1241–1260, 2009.
[7] H. Chen, M. Chen, and S. Lin, "Catching the trend: A framework for clustering concept-drifting categorical data," IEEE Trans. Knowl. Data Eng., vol. 21, no. 5, pp. 652–665, May 2009.
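To make the DE-PSO loop of Section 3.3 concrete, here is a minimal Python sketch. It does not reproduce the authors' implementation. It assumes the common DE/rand/1/bin variant, takes "satisfies the conditions" to mean "the trial vector improves on its target", works on real-valued vectors with a generic objective rather than the paper's categorical clustering objective, and uses illustrative parameter values (F, CR, w, c). The function name `depso` is hypothetical.

```python
import random


def depso(objective, dim, bounds, pop_size=20, generations=200,
          F=0.5, CR=0.9, w=0.7, c=1.5, seed=0):
    """Hypothetical DE-PSO sketch: minimise `objective` over [lo, hi]^dim.

    Each target gets a DE trial vector; if the trial does not improve on
    the target, a PSO velocity step proposes another candidate instead.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(max(v, lo), hi)

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    g = min(range(pop_size), key=fit.__getitem__)
    gbest, gbest_fit = pop[g][:], fit[g]

    for _ in range(generations):
        for i in range(pop_size):
            # DE phase (DE/rand/1/bin): mutate three distinct other members,
            # then mix with the target using binomial crossover.
            a, b, d = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [
                clip(pop[a][k] + F * (pop[b][k] - pop[d][k]))
                if rng.random() < CR or k == j_rand else pop[i][k]
                for k in range(dim)
            ]
            f_trial = objective(trial)
            if f_trial >= fit[i]:
                # PSO phase: the trial failed, so move the particle with a
                # velocity update. Greedy selection makes pop[i] its own
                # personal best, so only the social (gbest) term is non-zero.
                for k in range(dim):
                    vel[i][k] = (w * vel[i][k]
                                 + c * rng.random() * (gbest[k] - pop[i][k]))
                trial = [clip(pop[i][k] + vel[i][k]) for k in range(dim)]
                f_trial = objective(trial)
            # Greedy selection: keep the candidate only if it improved.
            if f_trial < fit[i]:
                pop[i], fit[i] = trial, f_trial
                if f_trial < gbest_fit:
                    gbest, gbest_fit = trial[:], f_trial
    return gbest, gbest_fit
```

For example, `depso(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))` minimises a 3-D sphere function. Applying the same loop to clustering would additionally require an encoding of partitions (for instance, as cluster centres) and a clustering objective, neither of which is specified here.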
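The drift rule in Section 3.4 combines a distribution variation with a certainty (uncertainty) variation. The sketch below shows one hypothetical way to encode that rule for categorical records. The concrete measures are assumptions, not the authors': distribution variation is taken as the total variation distance between the cluster-size proportions of two consecutive windows, with clusters assumed to be aligned by index. Uncertainty is taken as the size-weighted mean, over clusters, of the summed per-attribute entropy. The threshold 0.2 and all function names are hypothetical.

```python
import math
from collections import Counter


def cluster_distribution(labels, k):
    """Fraction of window objects assigned to each of k clusters."""
    counts = Counter(labels)
    return [counts.get(j, 0) / len(labels) for j in range(k)]


def distribution_variation(p, q):
    """Hypothetical measure: total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def model_uncertainty(records, labels, k):
    """Hypothetical measure: size-weighted mean over clusters of the
    summed per-attribute entropy (in bits) of the categorical values."""
    n = len(records)
    total = 0.0
    for j in range(k):
        members = [r for r, lab in zip(records, labels) if lab == j]
        if not members:
            continue
        m = len(members)
        h = 0.0
        for attr in range(len(members[0])):
            counts = Counter(r[attr] for r in members)
            h -= sum((cnt / m) * math.log2(cnt / m) for cnt in counts.values())
        total += (m / n) * h
    return total


def concept_drifted(last, new, k, dist_threshold=0.2):
    """Flag drift when the cluster distribution moves by more than the
    threshold and the uncertainty does not decrease.

    `last` and `new` are (records, labels) pairs for consecutive windows.
    """
    (last_rec, last_lab), (new_rec, new_lab) = last, new
    dv = distribution_variation(cluster_distribution(last_lab, k),
                                cluster_distribution(new_lab, k))
    du = (model_uncertainty(new_rec, new_lab, k)
          - model_uncertainty(last_rec, last_lab, k))
    # Section 3.4: reduced uncertainty means no drift, even if dv is large.
    return dv > dist_threshold and du >= 0
```

In a sliding-window setting (Section 3.2), `concept_drifted` would be called once per new window against the stored last model. When it returns `True`, the last model is discarded and the new window is re-clustered from scratch.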
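Section 3.5 names accuracy (AC), precision (PE) and recall (RE). The following Python sketch assumes the definitions commonly used in categorical-clustering work, which may differ from the authors' exact formulas. Each cluster i is matched to its majority class. Let a_i be the objects correctly assigned to cluster i, b_i the objects wrongly assigned to it, and c_i the objects of that class placed in other clusters. Then, over k clusters and n objects, AC = (Σ a_i) / n, PE = (1/k) Σ a_i / (a_i + b_i), and RE = (1/k) Σ a_i / (a_i + c_i). The function name `clustering_scores` is hypothetical.

```python
from collections import Counter


def clustering_scores(true_labels, cluster_ids):
    """Return (AC, PE, RE) under the assumed majority-class definitions."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    clusters = {}
    for t, c in zip(true_labels, cluster_ids):
        clusters.setdefault(c, []).append(t)
    k = len(clusters)
    correct = 0
    precision_sum = recall_sum = 0.0
    for members in clusters.values():
        # Match the cluster to its majority class.
        majority, a = Counter(members).most_common(1)[0]
        b = len(members) - a            # wrongly placed in this cluster
        c = class_sizes[majority] - a   # same class, but in other clusters
        correct += a
        precision_sum += a / (a + b)
        recall_sum += a / (a + c)
    return correct / n, precision_sum / k, recall_sum / k
```

For true labels `[0, 0, 0, 1, 1, 1]` and cluster assignments `[0, 0, 1, 1, 1, 1]`, this gives AC = 5/6, PE = 0.875 and RE = 5/6.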