research-article

Bayesian contiguity constrained clustering: spanning trees and dendrograms

Published: 12 January 2024

Abstract

Clustering is a well-known and well-studied problem. One of its variants, contiguity-constrained clustering, accepts as a second input a graph that encodes prior information about cluster structure through contiguity constraints, i.e. clusters must form connected subgraphs of this graph. This paper discusses the value of such a setting and proposes a new way to formalise it in a Bayesian framework, using results on spanning trees to compute the a posteriori probabilities of candidate partitions exactly. An algorithmic solution is then investigated to find a maximum a posteriori partition and to extract a Bayesian dendrogram from it. The value of this last tool, which is reminiscent of the classical output of a simple hierarchical clustering algorithm, is analysed. Finally, the proposed approach is demonstrated with experiments on simulated data and real applications. A reference implementation of this work is available in the R package gtclust that accompanies the paper.
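
To make the contiguity constraint concrete, the sketch below checks whether a candidate partition is admissible, i.e. whether every cluster induces a connected subgraph of the input graph. It also counts the spanning trees of a small graph with the matrix-tree theorem, a standard result on spanning trees; the abstract does not say which spanning-tree results the paper relies on, so this is only an illustration of the kind of quantity involved. The function names and the edge-list input format are hypothetical and are not taken from the gtclust package.

```python
from collections import deque
from fractions import Fraction


def is_contiguous_partition(edges, labels):
    """Return True if every cluster induces a connected subgraph.

    edges  : iterable of (u, v) pairs over nodes 0..n-1 (assumed input format)
    labels : list where labels[i] is the cluster of node i
    """
    n = len(labels)
    # Keep only edges whose two endpoints share a cluster label.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if labels[u] == labels[v]:
            adj[u].append(v)
            adj[v].append(u)
    seen = [False] * n
    explored_clusters = set()
    for start in range(n):
        if seen[start]:
            continue
        # An unvisited node in an already explored cluster means that
        # cluster splits into at least two connected components.
        if labels[start] in explored_clusters:
            return False
        explored_clusters.add(labels[start])
        seen[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return True


def count_spanning_trees(n, edges):
    """Count spanning trees of a simple graph via the matrix-tree theorem."""
    if n == 1:
        return 1
    # Graph Laplacian L = D - A, stored with exact rational entries.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then compute the determinant
    # of the remaining minor by Gaussian elimination.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


# A 4-cycle 0-1-2-3-0 used as a toy contiguity graph.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_contiguous_partition(cycle, [0, 0, 1, 1]))  # adjacent pairs
print(is_contiguous_partition(cycle, [0, 1, 0, 1]))  # opposite corners
print(count_spanning_trees(4, cycle))
```

On the 4-cycle, grouping adjacent nodes satisfies the constraint, while grouping opposite corners does not, because nodes 0 and 2 share no edge. Exact rational arithmetic is used for the determinant so that the count is an integer without rounding; this is practical only for small graphs.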

Information

Published In

Statistics and Computing, Volume 34, Issue 2
Apr 2024
579 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 12 January 2024
Accepted: 14 December 2023
Received: 16 March 2023

Author Tags

  1. Bayesian model-based clustering
  2. Contiguity constraints
  3. Agglomerative clustering
  4. Dendrogram

Qualifiers

  • Research-article
