DOI: 10.1007/978-981-97-4677-4_38
Article

A Sparse Binary Data Clustering Method for Transaction Data

Published: 10 July 2024

Abstract

Cluster analysis is an important approach in data mining, yet binary data has received less attention than categorical and numerical data. This study addresses that gap by focusing on the clustering of sparse, high-dimensional binary data. It introduces a novel method that integrates a sparse binary dimensionality reduction method with a fuzzy K-modes algorithm based on particle swarm optimization (PSO). The proposed approach not only reduces dimensionality but also uses PSO to search for the optimal cluster centroids. Both within-cluster variance and maximum entropy are combined in the objective function during clustering. To evaluate the proposed method, a case study on transaction data clustering is carried out. The empirical results show that the proposed method outperforms existing ones on sparse binary data, as evidenced by improved fitness values.
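
As a rough illustration of the fuzzy K-modes component named in the abstract, the following Python sketch runs one round of the standard fuzzy k-modes updates on a toy set of binary transaction vectors. It is not the authors' implementation: the abstract does not specify the dimensionality reduction step, the PSO search, or how the entropy term enters the objective, so all three are omitted here. The function names, the toy data, and the fuzziness exponent `alpha = 2.0` are assumptions made for illustration only.

```python
def hamming(x, z):
    """Simple matching dissimilarity; on 0/1 vectors this is the Hamming distance."""
    return sum(a != b for a, b in zip(x, z))


def update_memberships(data, modes, alpha=2.0):
    """Fuzzy membership update for fixed modes (alpha > 1 is the fuzziness exponent)."""
    exponent = 1.0 / (alpha - 1.0)
    memberships = []
    for x in data:
        dists = [hamming(x, z) for z in modes]
        if 0 in dists:
            # A point identical to a mode belongs fully to the first such mode.
            hit = dists.index(0)
            row = [1.0 if l == hit else 0.0 for l in range(len(modes))]
        else:
            row = [1.0 / sum((d_l / d_h) ** exponent for d_h in dists)
                   for d_l in dists]
        memberships.append(row)
    return memberships


def update_modes(data, memberships, k, alpha=2.0):
    """For each cluster and attribute, keep the bit with the larger weighted support."""
    n_features = len(data[0])
    modes = []
    for l in range(k):
        mode = []
        for j in range(n_features):
            w_one = sum(w[l] ** alpha for x, w in zip(data, memberships) if x[j] == 1)
            w_zero = sum(w[l] ** alpha for x, w in zip(data, memberships) if x[j] == 0)
            mode.append(1 if w_one > w_zero else 0)  # ties fall back to 0
        modes.append(tuple(mode))
    return modes


def fuzzy_cost(data, modes, memberships, alpha=2.0):
    """Weighted within-cluster dissimilarity (no entropy term in this sketch)."""
    return sum(w[l] ** alpha * hamming(x, modes[l])
               for x, w in zip(data, memberships)
               for l in range(len(modes)))


# Hypothetical toy data: rows are transactions, columns are items (1 = purchased).
data = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1)]
modes = [(1, 1, 0, 0), (0, 0, 1, 1)]  # assumed initial cluster modes

memberships = update_memberships(data, modes)
new_modes = update_modes(data, memberships, k=2)
cost = fuzzy_cost(data, new_modes, memberships)
```

In a metaheuristic variant such as the one the abstract describes, a cost function of this kind would typically serve as (part of) the fitness that the optimizer evaluates for each candidate set of centroids; how the paper combines it with maximum entropy is not stated in the abstract.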

Published In

Advances and Trends in Artificial Intelligence. Theory and Applications: 37th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2024, Hradec Kralove, Czech Republic, July 10–12, 2024, Proceedings
Jul 2024
524 pages
ISBN:978-981-97-4676-7
DOI:10.1007/978-981-97-4677-4
Editors: Hamido Fujita, Richard Cimler, Andres Hernandez-Matamoros, Moonis Ali

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Sparse binary data clustering
  2. Binary dimension reduction model
  3. Fuzzy K-modes algorithm

Qualifiers

  • Article
