research-article

Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis

Published: 25 December 2015

Abstract

The clustering ensemble technique aims to combine multiple clusterings into a potentially better and more robust clustering, and it has received increasing attention in recent years. Existing clustering ensemble approaches have two main limitations. First, many of them cannot weight the base clusterings without access to the original data, and they can be significantly affected by low-quality or even ill-behaved clusterings. Second, they generally focus on either the instance level or the cluster level of the ensemble and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multi-granularity link analysis. We present the normalized crowd agreement index (NCAI), which evaluates the quality of base clusterings in an unsupervised manner so that each base clustering can be weighted according to its clustering validity. To explore the relationships between clusters, we introduce the source aware connected triple (SACT) similarity, which takes into account both their common neighbors and the reliability of their sources. Based on NCAI and the multi-granularity information collected among base clusterings, clusters, and data instances, we further propose two novel consensus functions, termed weighted evidence accumulation clustering (WEAC) and graph partitioning with multi-granularity link analysis (GP-MGLA). Experiments on eight real-world datasets demonstrate the effectiveness and robustness of the proposed methods.
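The core idea behind weighting base clusterings by crowd agreement and then accumulating pairwise evidence can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation. It assumes that a clustering's agreement score is its mean normalized mutual information (NMI, normalized here by the geometric mean of the two entropies) with every other base clustering, rescaled so the best-agreeing clustering gets weight 1; the paper's exact NCAI definition may differ. The consensus step uses average-link agglomeration on the weighted co-association matrix, a common choice for evidence accumulation. SACT similarity and GP-MGLA are not covered, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def nmi(a, b):
    """Normalized mutual information between two label vectors
    (MI divided by the geometric mean of the two entropies)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((nij / n) * math.log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        # A single-cluster partition carries no information.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def crowd_agreement_weights(clusterings):
    """ASSUMPTION: a clustering's agreement score is its mean NMI with every
    other base clustering, rescaled so the best-agreeing one gets weight 1."""
    m = len(clusterings)
    if m < 2:
        return [1.0] * m
    cai = [0.0] * m
    for i, j in combinations(range(m), 2):
        s = nmi(clusterings[i], clusterings[j])
        cai[i] += s
        cai[j] += s
    cai = [c / (m - 1) for c in cai]
    top = max(cai)
    return [c / top if top > 0 else 1.0 for c in cai]


def weighted_coassociation(clusterings, weights):
    """Entry [p][q] is the weighted fraction of base clusterings that place
    objects p and q in the same cluster."""
    n = len(clusterings[0])
    total = sum(weights)
    ca = [[0.0] * n for _ in range(n)]
    for labels, w in zip(clusterings, weights):
        for p in range(n):
            for q in range(n):
                if labels[p] == labels[q]:
                    ca[p][q] += w / total
    return ca


def average_link(sim, k):
    """Merge the two groups with the highest average pairwise similarity
    until k groups remain; return one label per object."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        best, pair = -1.0, None
        for a, b in combinations(range(len(groups)), 2):
            s = sum(sim[p][q] for p in groups[a] for q in groups[b])
            s /= len(groups[a]) * len(groups[b])
            if s > best:
                best, pair = s, (a, b)
        a, b = pair
        groups[a].extend(groups.pop(b))  # a < b, so index a stays valid
    labels = [0] * len(sim)
    for g, members in enumerate(groups):
        for p in members:
            labels[p] = g
    return labels


if __name__ == "__main__":
    base = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],  # same partition, permuted label names
        [0, 1, 0, 1, 0, 1],  # a low-quality base clustering
    ]
    w = crowd_agreement_weights(base)
    print("weights:", [round(x, 3) for x in w])
    print("consensus:", average_link(weighted_coassociation(base, w), k=2))
```

In this toy ensemble the fourth clustering disagrees with the other three, so it receives a much smaller weight and barely perturbs the co-association matrix; the consensus recovers the partition shared by the first three. The quadratic co-association build and the naive merge loop are written for readability, not for large datasets.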


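Cited By

  • (2024) An improved weighted ensemble clustering based on two-tier uncertainty measurement. Expert Systems with Applications: An International Journal, 238:PA. DOI: 10.1016/j.eswa.2023.121672. Online publication date: 15-Mar-2024.
  • (2024) PCS-granularity weighted ensemble clustering via Co-association matrix. Applied Intelligence, 54:5, 3884-3901. DOI: 10.1007/s10489-024-05368-3. Online publication date: 1-Mar-2024.
  • (2023) Fuzzy-Rough induced spectral ensemble clustering. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 45:1, 1757-1774. DOI: 10.3233/JIFS-223897. Online publication date: 1-Jan-2023.
  • (2022) The Core Cluster-Based Subspace Weighted Clustering Ensemble. Wireless Communications & Mobile Computing, 2022. DOI: 10.1155/2022/7990969. Online publication date: 1-Jan-2022.
  • (2022) DenMG: Density-Based Member Generation for Ensemble Clustering. Workshop Proceedings of the 51st International Conference on Parallel Processing, 1-7. DOI: 10.1145/3547276.3548520. Online publication date: 29-Aug-2022.
  • (2022) Method of sensitive data mining based on Pan-Bull algebra. Wireless Networks, 28:6, 2733-2741. DOI: 10.1007/s11276-021-02725-9. Online publication date: 1-Aug-2022.
  • (2022) The Calculation Method of the Network Security Probability of the Multi-rail Division Based on Fuzzy Inference. Mobile Networks and Applications, 27:4, 1368-1377. DOI: 10.1007/s11036-022-01921-x. Online publication date: 1-Aug-2022.
  • (2021) Design of Network Distance Education System Based on ASP under the Background of Massive Open Online Course. 2021 2nd International Conference on Computers, Information Processing and Advanced Education, 1063-1069. DOI: 10.1145/3456887.3457462. Online publication date: 25-May-2021.
  • (2021) Feature weighted dual random sampling cluster Ensemble. Proceedings of the 2021 5th International Conference on Machine Learning and Soft Computing, 54-59. DOI: 10.1145/3453800.3453811. Online publication date: 29-Jan-2021.
  • (2021) Direct Alignment Maximization for Clustering Ensemble. Proceedings of the 2021 5th International Conference on Machine Learning and Soft Computing, 38-43. DOI: 10.1145/3453800.3453808. Online publication date: 29-Jan-2021.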

Published In

Neurocomputing, Volume 170, Issue C
December 2015
466 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands


Author Tags

1. Graph partitioning with multi-granularity link analysis
2. Weighted clustering ensemble
3. Weighted consensus clustering
4. Weighted evidence accumulation clustering


