DOI: 10.1145/1276958.1277126
Article

Multiobjective clustering with automatic k-determination for large-scale data

Published: 07 July 2007

Abstract

Web mining, the application of data mining to web data, is a key component of web technologies. Web behavior mining in particular has attracted a great deal of attention recently. Behavior mining involves analyzing the behavior of users, finding patterns in that behavior, and predicting users' subsequent behavior or interests. It is used in web advertising systems and content recommendation systems. To analyze huge amounts of data, such as web data, data-clustering techniques are usually used. Data clustering separates data into groups according to similarity and is usually the first step of data mining. In the present study, we developed a scalable data-clustering algorithm for web mining based on an existing evolutionary multiobjective clustering algorithm. To derive clusters, we applied multiobjective clustering with automatic k-determination (MOCK). MOCK has been reported to perform better than k-means, agglomerative methods, and other evolutionary clustering algorithms. MOCK can also find the appropriate number of clusters using the information in the trade-off curve. The k-determination scheme of MOCK is powerful and strict. However, its computational cost is too high when it is applied to very large data sets. In this paper, we propose a scalable automatic k-determination scheme. The proposed scheme reduces the size of the Pareto set, and the appropriate number of clusters can usually still be determined.
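The idea of reading the number of clusters off a trade-off curve can be illustrated with a small sketch. This is not MOCK's k-determination scheme and not the scheme proposed in the paper; it is a simplified stand-in that keeps only the non-dominated solutions and then picks the "knee" as the point farthest from the straight line joining the two extremes of the front. The function names `pareto_front` and `knee_index`, the two generic minimized objectives, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (objective_1, objective_2), both minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front = []
    for p in points:
        # q dominates p if it is no worse in both objectives and differs from p.
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def knee_index(front: List[Point]) -> int:
    """Index of the front point farthest below the line joining the extremes.

    Simple knee heuristic (an assumption of this sketch, not MOCK's method).
    """
    if len(front) < 3:
        return 0
    (x0, y0), (x1, y1) = front[0], front[-1]
    span_x = (x1 - x0) or 1.0
    span_y = (y0 - y1) or 1.0
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(front):
        # Rescale so the extremes map to (0, 1) and (1, 0).
        nx = (x - x0) / span_x
        ny = (y - y1) / span_y
        # Proportional to the distance below the line nx + ny = 1.
        d = 1.0 - nx - ny
        if d > best_d:
            best_i, best_d = i, d
    return best_i


# Hypothetical candidate clusterings: number of clusters k -> objective values.
solutions: Dict[int, Point] = {
    1: (10.0, 0.0),
    2: (6.0, 1.0),
    3: (2.0, 2.0),
    4: (1.5, 5.0),
    5: (1.0, 9.0),
    6: (3.0, 6.0),  # dominated by the k = 3 solution
}

front = pareto_front(list(solutions.values()))
k_of = {point: k for k, point in solutions.items()}
best_k = k_of[front[knee_index(front)]]
print("Pareto front:", front)
print("Suggested number of clusters:", best_k)
```

With these made-up values the sketch suggests k = 3, the solution after which further improvement in one objective costs disproportionately more in the other. A smaller Pareto set, as targeted by the proposed scheme, directly reduces the work done by both functions.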

Published In

GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation
July 2007
2313 pages
ISBN:9781595936974
DOI:10.1145/1276958
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. multi-objective optimization
  3. pattern recognition and classification
  4. speedup technique

Qualifiers

  • Article

Conference

GECCO07
Acceptance Rates

GECCO '07 Paper Acceptance Rate 266 of 577 submissions, 46%;
Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0

Reflects downloads up to 14 Oct 2024.

Cited By

  • (2024) An adaptive evolutionary multi-objective clustering based on the data properties of the base partitions. Expert Systems with Applications: An International Journal, 245:C. DOI: 10.1016/j.eswa.2023.123102. Online publication date: 2-Jul-2024
  • (2023) A Review of Quantum-Inspired Metaheuristic Algorithms for Automatic Clustering. Mathematics, 11:9 (2018). DOI: 10.3390/math11092018. Online publication date: 24-Apr-2023
  • (2023) A Diversified Attention Model for Interpretable Multiple Clusterings. IEEE Transactions on Knowledge and Data Engineering, 35:9 (8852-8864). DOI: 10.1109/TKDE.2022.3218693. Online publication date: 1-Sep-2023
  • (2023) A Robust Learning Membership Scaling Fuzzy C-Means Algorithm Based on New Belief Peak. IEEE Transactions on Fuzzy Systems, 31:12 (4486-4500). DOI: 10.1109/TFUZZ.2023.3286910. Online publication date: 1-Dec-2023
  • (2023) Evolutionary Clustering and Community Detection. Handbook of Evolutionary Machine Learning (151-169). DOI: 10.1007/978-981-99-3814-8_6. Online publication date: 2-Nov-2023
  • (2022) Decision Making in Evolutionary Multiobjective Clustering: A Machine Learning Challenge. IEEE Access, 10 (117281-117303). DOI: 10.1109/ACCESS.2022.3219854. Online publication date: 2022
  • (2022) An analysis of the admissibility of the objective functions applied in evolutionary multi-objective clustering. Information Sciences: an International Journal, 610:C (1143-1162). DOI: 10.1016/j.ins.2022.08.045. Online publication date: 1-Sep-2022
  • (2022) Machine Learning-Based Decision Making in Evolutionary Multiobjective Clustering. Advances in Computational Intelligence (123-137). DOI: 10.1007/978-3-031-19493-1_10. Online publication date: 23-Oct-2022
  • (2021) Multi-objective Quantum Moth Flame Optimization for Clustering. Enabling Machine Learning Applications in Data Science (193-205). DOI: 10.1007/978-981-33-6129-4_14. Online publication date: 28-May-2021
  • (2021) A Comprehensive Review of Evaluation and Fitness Measures for Evolutionary Data Clustering. Evolutionary Data Clustering: Algorithms and Applications (23-71). DOI: 10.1007/978-981-33-4191-3_2. Online publication date: 21-Feb-2021
