Clustering Algorithms: Conventional and Recent
All content following this page was uploaded by Rahul Joshi on 13 March 2019.
Distance and Similarity Indices
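The indices shown on this slide are not preserved in the text, so the sketch below is only an illustration. It assumes three widely used measures (Euclidean distance, Manhattan distance, and cosine similarity); the function names are hypothetical and not taken from the slides.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Distances grow as objects become less alike, while similarities grow as they become more alike, so a clustering algorithm has to be told which kind of index it is given.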
Clustering Evaluation Indices
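The specific evaluation indices on this slide are not preserved in the text. As one illustrative example, and purely as an assumption, the sketch below computes the Rand index, an external index that compares a clustering against reference labels. The function name is hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two labelings agree.

    A pair agrees when both labelings put the two objects in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The index ignores the label names themselves, so relabeling clusters (for example swapping 0 and 1) does not change the score.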
Cluster A: Conventional Clustering Algorithms
Suitable for clustering large-scale data. Dependent on the number of clusters. Sensitive to parameters. Only slightly sensitive to noise and
outliers. Moderately sensitive to the order of the input data. High time complexity.
Cluster B: Conventional Clustering Algorithms
High scalability, suitable for arbitrary data, and highly sensitive to the order of the input data.
Cluster A: Recent Clustering Algorithms
Use functions for clustering, high time complexity, only slightly sensitive to noise or outliers, suitable for arbitrary data.
Cluster B: Recent Clustering Algorithms
High time complexity, not suitable for large-scale or high-dimensional data, suitable for arbitrary data.
Cluster C: Recent Clustering Algorithms
Share large-scale characteristics, only slightly sensitive to noise and outliers, scalable to large-scale data.
Cluster A: Parallel and Distributed Clustering Algorithms
Conventional clustering algorithms, highly suitable for large-scale, high-dimensional data.
Cluster B: Parallel and Distributed Clustering Algorithms
Conventional and recent clustering algorithms, moderate scalability to large-scale and high-dimensional data.
Learning from Mind Maps
Literature Review Contd... (Key strategies to be adopted for Distributed Incremental Processing)
• Iterative processing
• Incremental processing
• Task scheduling
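To make the idea of incremental processing concrete, here is a minimal sketch of a threshold-based incremental clustering step. It is an assumption for illustration, not the authors' algorithm: the function name, the cluster representation, and the use of Euclidean distance with a running-mean centroid are all hypothetical choices.

```python
import math


def incremental_cluster(points, threshold, clusters=None):
    """Fold a batch of points into an existing set of clusters.

    Each cluster is a dict {"centroid": [...], "count": int}. A point joins
    the nearest cluster if its centroid is within `threshold`; otherwise it
    starts a new cluster. Earlier batches never need to be reprocessed.
    """
    if clusters is None:
        clusters = []
    for p in points:
        best, best_dist = None, math.inf
        for c in clusters:
            d = math.dist(p, c["centroid"])  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            n = best["count"]
            # Update the centroid as a running mean of its members.
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], p)
            ]
            best["count"] = n + 1
        else:
            clusters.append({"centroid": list(p), "count": 1})
    return clusters


# First batch builds the clusters; the second batch only updates them.
state = incremental_cluster([(0, 0), (0, 2), (10, 10)], threshold=3)
state = incremental_cluster([(10, 12)], threshold=3, clusters=state)
```

Because points are absorbed one at a time, the result of a sketch like this depends on the order in which data arrives, which is the input-order sensitivity noted for some of the algorithm groups earlier in the deck.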
Literature Review Contd... (Research Questions to be taken into account for Distributed Incremental Processing)
Literature Review Contd... (Major Challenges in Distributed System)
(Source: http://www.ejbtutorial.com/distributed-systems/challenges-for-a-distributed-system)
Literature Review Contd... (Chronic disease datasets)
• Diabetes 130-US hospitals for years 1999-2008 data set (55 attributes, 100000 instances)
• Diabetic retinopathy detection data set (20 attributes, 1151 instances)
• Pima Indians Diabetes data set (8 attributes, 768 instances)
• Heart disease data set (75 attributes, 303 instances)
• Breast cancer Wisconsin (Diagnostic) data set (32 attributes, 569 instances)
• U.S. Chronic Disease Indicators (CDI) data set (34 attributes, 523K instances)
• Chronic_Kidney_Disease data set (25 attributes, 400 instances)
• Thyroid disease data set (21 attributes, 7200 instances)
• Liver disorders data set (7 attributes, 345 instances)
• ILPD (Indian Liver Patient Dataset) data set (10 attributes, 583 instances)
• Parkinsons Telemonitoring data set (26 attributes, 5875 instances)
• Cervical cancer (Risk Factors) data set (36 attributes, 858 instances)
Outlook