research-article

Enhanced Connectivity Validity Measure Based on Outlier Detection for Multi-Objective Metaheuristic Data Clustering Algorithms

Published: 01 January 2022

Abstract

Data clustering algorithms have difficulty identifying data points that are noise or outliers. This paper therefore proposes an enhanced connectivity measure based on outlier detection for multi-objective data clustering problems. The proposed approach aims to improve solution quality by combining the local outlier factor (LOF) method with the connectivity validity measure. The modification is applied to the mechanism that selects neighbouring data points, so that outliers can be eliminated from the neighbour selection. The performance of the proposed approach is assessed by applying the multi-objective algorithms to eight real-life and seven synthetic two-dimensional datasets. External validity is evaluated using the F-measure, while performance assessment metrics such as coverage and overall non-dominated vector generation are used to assess the quality of the Pareto-optimal sets. The experimental results show that the proposed outlier detection method improves the performance of the multi-objective data clustering algorithms.
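As an illustration only, the idea of pairing LOF with a connectivity measure might be sketched as below. The abstract does not give the formulation, so everything here is an assumption: the connectivity follows the common definition with a penalty of 1/j when a point's j-th nearest neighbour lies in another cluster, the LOF is a simplified NumPy version that ignores distance ties, and the filtering rule (drop points whose LOF exceeds a fixed `threshold` before neighbours are selected) is a hypothetical reading of the modification. The function names and the toy data are illustrative, not the authors' code.

```python
import numpy as np


def knn(X, k):
    """Indices and distances of each point's k nearest neighbours (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)


def lof_scores(X, k):
    """Simplified local outlier factor; scores well above 1 suggest outliers."""
    idx, dist = knn(X, k)
    k_dist = dist[:, -1]                      # distance to the k-th neighbour
    reach = np.maximum(dist, k_dist[idx])     # reachability distance to each neighbour
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)  # local reachability density
    return lrd[idx].mean(axis=1) / lrd


def connectivity(X, labels, L):
    """Sum of 1/j penalties whenever the j-th nearest neighbour is in another cluster."""
    labels = np.asarray(labels)
    idx, _ = knn(X, L)
    differs = labels[idx] != labels[:, None]
    return float((differs / np.arange(1, L + 1)).sum())


def lof_filtered_connectivity(X, labels, L, k, threshold=1.5):
    """Assumed variant: drop points with LOF above `threshold`, then score the rest."""
    keep = lof_scores(X, k) <= threshold
    return connectivity(X[keep], np.asarray(labels)[keep], L)


# Toy data: two tight clusters and one distant point assigned to cluster 0.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [5, 30]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0])

plain = connectivity(X, labels, L=3)
filtered = lof_filtered_connectivity(X, labels, L=3, k=3)
print(plain, filtered)
```

On this toy data the distant point is the only one that incurs a penalty, because its three nearest neighbours all belong to the other cluster; once it is filtered out, the remaining partition scores zero. In practice `k`, `L` and `threshold` would need tuning per dataset.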


Published In

Applied Computational Intelligence and Soft Computing  Volume 2022, Issue
2022
855 pages
ISSN:1687-9724
EISSN:1687-9732
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

Hindawi Limited

London, United Kingdom

