Abstract
Clustering is a useful technique for creating groups of objects on the basis of their nature: objects in the same group are similar to one another and differ from the objects of other groups. Clustering has proved its importance in various fields such as information retrieval, bioinformatics and image processing. In this paper, the particle swarm optimization (PSO) technique is combined with K-harmonic means (KHM) for clustering. PSO overcomes limitations of KHM such as the local optimum problem. Fuzzy logic is also employed to make PSO adaptive in nature by controlling its various parameters. The performance of the proposed approach is validated on five benchmark datasets in terms of inter-cluster distance, intra-cluster distance, F-measure and fitness value. The results of the proposed approach are compared with well-known conventional clustering techniques such as K-means, KHM and fuzzy C-means, along with different state-of-the-art clustering approaches. Two text-based benchmark datasets, CACM and CISI, are also used to test the performance of all clustering approaches. As both the experimental and statistical analyses make clear, the proposed clustering approach gives better results than the other clustering approaches.
References
Abraham A, Das S, Konar A (2006) Document clustering using differential evolution. In: Proceedings of the 2006 IEEE congress on evolutionary computation (CEC 2006), Vancouver, pp 1784–1791
Alguliev R, Aliguliyev R (2005) Fast genetic algorithm for clustering of text documents. Artif Intell 3:698–707
Aliguliyev R (2006) A clustering method for document collections and algorithm for estimation the optimal number of clusters. Artif Intell 4:651–659
Aupetit S, Monmarché N, Slimane M (2007) Hidden Markov models training by a particle swarm optimization algorithm. J Math Model Algorithms 6:175–193
Azzag H, Venturini G, Oliver A, Guinot C (2007) A hierarchical ant based clustering algorithm and its use in three real-world applications. Eur J Oper Res 179:906–922
Bergh F, Engelbrecht A (2001) Effect of swarm size on cooperative particle swarm optimizers. In: Proceedings of genetic evolutionary computation conference (GECCO-2001), San Francisco, pp 892–899
Bezdek J (1974) Fuzzy mathematics in pattern classification. PhD thesis, Cornell University, Ithaca
Chang P, Liu C, Fan C (2009) Data clustering and fuzzy neural network for sales forecasting: a case study in printed circuit board industry. Knowl-Based Syst 22(5):344–355
Cui X, Potok T, Palathingal P (2005) Document clustering using particle swarm optimization. In: Proceedings of the 2005 IEEE swarm intelligence symposium, Pasadena, pp 186–191
Das S, Abraham A, Konar A (2008a) Automatic clustering with a multi-elitist particle swarm optimization algorithm. Pattern Recogn Lett 29:688–699
Das S, Abraham A, Konar A (2008b) Automatic clustering using an improved differential evolution algorithm. IEEE Trans Syst Man Cybern Part A Syst Hum 38:218–237
ElAlami M (2011) Supporting image retrieval framework with rule base system. Knowl-Based Syst 24(2):331–340
Fraley C, Raftery A (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631
Garai G, Chaudhuri B (2004) A novel genetic algorithm for automatic clustering. Pattern Recogn Lett 25:173–187
Gath I, Geva G (1989) Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 11:773–781
Güngör Z, Ünler A (2008) K-harmonic means data clustering with tabu search method. Appl Math Model 32:1115–1125
Gupta Y, Saini A (2015) An efficient clustering approach based on hybridization of PSO, fuzzy logic and K-harmonic means. In: IEEE workshop on computational intelligence: theories, applications and future directions (WCI). IIT Kanpur
Hadavandi E, Shavandi H, Ghanbari A (2010) Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting. Knowl-Based Syst 23(8):800–808
Hammerly G, Elkan C (2002) Alternatives to the k-means algorithm that find better clusterings. In: Proceedings of the 11th international conference on information and knowledge management, pp 600–607
Han J, Kamber M, Pei P (2006) Data mining: concepts and techniques. Morgan Kaufmann, Los Altos
Hartmann V (2005) Ant colony optimization and swarm intelligence: evolving agent swarms for clustering and sorting. In: Proceedings of the 2005 conference on genetic and evolutionary computation (GECCO’05), Washington, DC, pp 217–224
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31:264–323
Kalyani S, Swarup K (2011) Particle swarm optimization based K-means clustering approach for security assessment in power systems. Expert Syst Appl 38(9):10839–10846
Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. J Comput 32(8):68–75
Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis, vol 39. Wiley, London
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the 1995 IEEE international conference on neural networks, Englewood Cliffs, pp 1942–1948
Khan M, Khor S (2004) Web document clustering using a hybrid neural network. Appl Soft Comput 4:423–432
Khy S, Ishikawa Y, Kitagawa H (2008) A novelty-based clustering method for on-line documents. World Wide Web 11:1–37
Laszlo M, Mukherjee S (2006) A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering. IEEE Trans Pattern Anal Mach Intell 28:533–543
Laszlo M, Mukherjee S (2007) A genetic algorithm that exchanges neighboring centers for k-means clustering. Pattern Recogn Lett 28:2359–2366
Li Y, Chung S, Holt J (2008) Text document clustering based on frequent word meaning sequences. Data Knowl Eng 64(1):381–404
Liao C, Tseng C, Luarn P (2007) A discrete version of particle swarm optimization for flowshop scheduling problems. Comput Oper Res 34:3099–3111
Lin H, Yang F, Kao Y (2005) An efficient GA-based clustering technique. Tamkang J Sci Eng 8(2):113–122
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: The 5th Berkeley symposium mathematical, statistic and probability, Berkeley
Martin-Guerrero J, Palomares A, Balaguer-Ballester E, Soria-Olivas E, Gomez-Sanchis J, Soriano-Asensi A (2006) Studying the feasibility of a recommender in a citizen web portal based on user modeling and clustering algorithms. Expert Syst Appl 30(2):299–312
Nock R, Nielsen F (2006) On weighting clustering. IEEE Trans Pattern Anal Mach Intell 28:1223–1235
Ponomarenko J, Merkulova T, Orlova G, Fokin O, Gorshkov E, Ponomarenko M (2002) Mining DNA sequences to predict sites which mutations cause genetic diseases. Knowl-Based Syst 15(4):225–233
Sander J, Ester M, Kriegel M, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Disc 2(2):169–194
Sebiskveradze D, Vrabie V, Gobinet C, Durlach A, Bernard P, Ly E, Manfait M, Jeannesson P, Piot O (2011) Automation of an algorithm based on fuzzy clustering for analyzing tumoral heterogeneity in human skin carcinoma tissue sections. Lab Invest 91(5):799–811
Shi J, Luo Z (2010) Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples. Comput Biol Med 40(8):723–732
Subramanyam V, Sett S (2008) Knowledge-based image retrieval system. Knowl-Based Syst 21(2):89–100
Suganthan P (1999) Particle swarm optimizer with neighborhood operator. In: Proceedings of IEEE international conference on evolutionary computation, vol 3, pp 1958–1962
Thakare A, Hanchate R (2014) Introducing hybrid model for data clustering using K-harmonic means and Gravitational search algorithms. Int J Comput Appl 88(17):18–22
Verma N, Roy A (2014) Self-optimal clustering technique using optimized threshold function. IEEE Syst J 8(4):1213–1226
Vesanto W, Alhoniemi E (2000) Clustering of the self-organizing map. IEEE Trans Neural Netw 11(3):586–600
Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of 23rd international conference on very large databases, Greece, pp 186–195
Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
Yang F, Sun T, Zhang C (2009) Efficient hybrid data clustering method based on K-harmonic means and particle swarm optimization. Expert Syst Appl 36(6):9847–9852
Zadeh L (1965) Fuzzy sets. Inf Control 8:338–353
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. In: ACM SIGMOD conference of management of data, Canada, pp 103–114
Zhang B, Hsu M, Dayal U (1999) K-harmonic means—a data clustering algorithm. Technical Report HPL-1999-124, Hewlett-Packard Laboratories
Zhang B, Hsu M, Dayal U (2000) K-harmonic means. In: International workshop on temporal, spatial and spatio-temporal data mining. TSDM 2000, Lyon
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Appendices
Appendix A: K-harmonic means clustering algorithm
This algorithm was proposed by Zhang et al. (1999, 2000), and variants of KHM were later proposed by Hammerly and Elkan (2002). KHM gives a dynamic weight to each data point by averaging the harmonic means of the distances from each data point to all centers. The harmonic average assigns a large weight to a data point that is not close to any center and a small weight to a data point that is close to one or more centers. Therefore, KHM is less sensitive to initialization than K-means. Before discussing the algorithm, the notations used in KHM are described in Table 12.
The detail of the KHM clustering algorithm is given as follows:
1.
Initially, select the k centers randomly.
2.
Determine the objective function value by (A.1) as defined below
$$ \text{KHM}\,\left( {X, C} \right) = \sum\limits_{i = 1}^{n} {\frac{k}{{\sum\nolimits_{j = 1}^{k} {\frac{1}{{\left\| {x_{i} - c_{j} } \right\|^{p} }}} }}} $$(A.1)where p is an input parameter, typically p ≥ 2.
3.
Compute the membership value m(cj/xi) of each data point xi in each center cj by (A.2) as defined below
$$ m\,\left( {c_{j} /x_{i} } \right) = \frac{{\left\| {x_{i} - c_{j} } \right\|^{ - p - 2} }}{{\sum\nolimits_{j = 1}^{k} {\left\| {x_{i} - c_{j} } \right\|^{ - p - 2} } }}. $$(A.2)
4.
Compute the weight w(xi) of each data point xi by (A.3) as follows
$$ w\,\left( {x_{i} } \right) = \frac{{\sum\nolimits_{j = 1}^{k} {\left\| {x_{i} - c_{j} } \right\|^{ - p - 2} } }}{{\left( {\sum\nolimits_{j = 1}^{k} {\left\| {x_{i} - c_{j} } \right\|^{ - p} } } \right)^{2} }}. $$(A.3)
5.
Re-compute the location of each center cj from all the data points xi according to their memberships and weights as defined by (A.4).
$$ c_{j} = \frac{{\sum\nolimits_{i = 1}^{n} {m\left( {c_{j} /x_{i} } \right)\, w\left( {x_{i} } \right)\, x_{i} } }}{{\sum\nolimits_{i = 1}^{n} {m\left( {c_{j} /x_{i} } \right)\, w\left( {x_{i} } \right) } }} $$(A.4)
6.
Repeat steps 2–5 for a predefined number of iterations or until KHM(X, C) does not change significantly.
7.
Assign each data point xi to the cluster j with the largest m(cj/xi).
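The steps above can be sketched in NumPy as follows. This is a minimal illustration of the KHM iteration, not the authors' implementation; the function name, defaults and the small epsilon guard against zero distances are assumptions added for the sketch.

```python
import numpy as np

def khm(X, k, p=3.5, iters=100, seed=0):
    """K-harmonic means sketch. X: (n, d) data; returns (centers, labels, objective)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points as initial centers.
    C = X[rng.choice(len(X), k, replace=False)].astype(float)
    eps = 1e-12  # guard against a zero distance when a point coincides with a center
    for _ in range(iters):
        # Pairwise distances ||x_i - c_j||, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps
        dm = d ** (-p - 2)
        # Step 3: memberships m(c_j / x_i), Eq. (A.2).
        m = dm / dm.sum(axis=1, keepdims=True)
        # Step 4: weights w(x_i), Eq. (A.3).
        w = dm.sum(axis=1) / (d ** -p).sum(axis=1) ** 2
        # Step 5: re-compute centers as membership- and weight-scaled means, Eq. (A.4).
        mw = m * w[:, None]                      # (n, k)
        C = (mw.T @ X) / mw.sum(axis=0)[:, None]
    # Step 2/6: objective KHM(X, C), Eq. (A.1).
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + eps
    obj = (k / (d ** -p).sum(axis=1)).sum()
    # Step 7: assign each point to the cluster with the largest membership.
    dm = d ** (-p - 2)
    labels = (dm / dm.sum(axis=1, keepdims=True)).argmax(axis=1)
    return C, labels, obj
```

The loop runs for a fixed number of iterations for simplicity; a convergence check on the change in the objective would match step 6 more literally.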
It is demonstrated that KHM is essentially insensitive to the initialization of the centers (Zhang et al. 1999), while it tends to converge to local optima (Güngör and Ünler 2008).
Appendix B: Particle swarm optimization
PSO is a population-based, sociologically inspired evolutionary approach which has been successfully applied in science and many practical fields (Aupetit et al. 2007; Liao et al. 2007). Each particle in PSO represents an individual, and all the particles together form a swarm. The solution space of a problem is formulated as the search space of PSO; each position in the search space represents a candidate solution of the problem. Each particle moves according to its velocity, and the movement of a particle is computed by (B.1) and (B.2)
$$ v_{i} \left( {t + 1} \right) = \omega \,v_{i} \left( t \right) + c_{1} \,rand_{1} \left( {pbest_{i} \left( t \right) - x_{i} \left( t \right)} \right) + c_{2} \,rand_{2} \left( {gbest\left( t \right) - x_{i} \left( t \right)} \right) $$(B.1)
$$ x_{i} \left( {t + 1} \right) = x_{i} \left( t \right) + v_{i} \left( {t + 1} \right) $$(B.2)
where xi(t) is the position of particle i at time t, vi(t) is the velocity of particle i at time t, ω is an inertia weight scaling the previous velocity, rand1 and rand2 are random variables between 0 and 1, pbesti(t) is the best position found by particle i so far, gbest(t) is the best position of whole swarm so far, c1 and c2 are two acceleration coefficients that scale the influence of pbesti(t) and gbest(t), respectively.
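A minimal NumPy sketch of these velocity and position updates, applied to box-constrained minimization, is given below. The function name, swarm size, iteration count and the parameter values ω = 0.72, c1 = c2 = 1.49 are illustrative defaults assumed for the sketch, not values taken from the paper.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.72, c1=1.49, c2=1.49,
        bounds=(-5.0, 5.0), seed=0):
    """Minimize f over a box using the PSO updates (B.1) and (B.2)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions x_i(t)
    v = np.zeros((n_particles, dim))              # particle velocities v_i(t)
    pbest = x.copy()                              # best position found by each particle
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()      # best position of the whole swarm
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))       # rand1, rand2 in [0, 1)
        r2 = rng.random((n_particles, dim))
        # (B.1): inertia term + cognitive (pbest) pull + social (gbest) pull.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        # (B.2): move each particle by its new velocity, clipped to the box.
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()
```

In the hybrid clustering setting of the paper, each particle would encode a set of k centers and f would be the KHM objective; here a plain sphere function suffices to exercise the updates.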
Gupta, Y., Saini, A. A new swarm-based efficient data clustering approach using KHM and fuzzy logic. Soft Comput 23, 145–162 (2019). https://doi.org/10.1007/s00500-018-3514-1