Independence versus indetermination: basis of two canonical clustering criteria

  • Regular Article
Advances in Data Analysis and Classification

Abstract

This paper aims at comparing two coupling approaches as basic layers for building clustering criteria suited to modularizing and clustering very large networks. We briefly use “optimal transport theory” both as a starting point and as a means to derive two canonical couplings: “statistical independence” and “logical indetermination”. A symmetric list of properties is provided, notably the so-called “Monge properties” applied to contingency matrices, which justify the \(\otimes \) versus \(\oplus \) notation. We then study “logical indetermination” in more detail, because it is by far the less well known of the two. Finally, we estimate the average difference between the two couplings, which is the key explanation of their usually close results in network clustering.
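To make the two couplings concrete, here is a minimal Python/NumPy sketch built on our own assumptions rather than the paper's code. For row margins \(\mu\) (p entries) and column margins \(\nu\) (q entries) with common total N, we take statistical independence as \(\pi_{uv} = \mu_u \nu_v / N\) and logical indetermination as \(\pi_{uv} = \mu_u/q + \nu_v/p - N/(pq)\). Both reproduce the margins. The first maximizes entropy under the margin constraints; the second minimizes the sum of squares (ignoring non-negativity). The product form satisfies the multiplicative Monge equality \(\pi_{uv}\pi_{u'v'} = \pi_{uv'}\pi_{u'v}\), and the sum form satisfies the additive one \(\pi_{uv} + \pi_{u'v'} = \pi_{uv'} + \pi_{u'v}\). This is one way to read the \(\otimes\) versus \(\oplus\) notation.

All function names are hypothetical. `pair_criterion` plugs either coupling of the degree margins into a Newman-Girvan-style sum over same-cluster pairs. With the independence coupling this gives the usual modularity. The indetermination variant is only our illustration and may not match the paper's exact normalization.

```python
import numpy as np


def independence_coupling(mu, nu):
    """Statistical independence (mu ⊗ nu): pi[u, v] = mu[u] * nu[v] / N."""
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    return np.outer(mu, nu) / mu.sum()


def indetermination_coupling(mu, nu):
    """Logical indetermination (mu ⊕ nu): pi[u, v] = mu[u]/q + nu[v]/p - N/(p*q).

    Entries can become negative when the margins are very unbalanced;
    this sketch does not handle that case.
    """
    mu, nu = np.asarray(mu, dtype=float), np.asarray(nu, dtype=float)
    p, q, total = len(mu), len(nu), mu.sum()
    return mu[:, None] / q + nu[None, :] / p - total / (p * q)


def adjacent_minor_gaps(pi, combine):
    """combine(pi[u,v], pi[u+1,v+1]) - combine(pi[u,v+1], pi[u+1,v]) on every adjacent 2x2 minor."""
    return combine(pi[:-1, :-1], pi[1:, 1:]) - combine(pi[:-1, 1:], pi[1:, :-1])


def is_additive_monge(pi, tol=1e-9):
    """Additive Monge equality pi[u,v] + pi[u',v'] == pi[u,v'] + pi[u',v] (adjacent minors suffice)."""
    return bool(np.all(np.abs(adjacent_minor_gaps(pi, np.add)) < tol))


def is_multiplicative_monge(pi, tol=1e-9):
    """Multiplicative Monge equality pi[u,v] * pi[u',v'] == pi[u,v'] * pi[u',v]
    (adjacent minors suffice for positive entries)."""
    return bool(np.all(np.abs(adjacent_minor_gaps(pi, np.multiply)) < tol))


def pair_criterion(adj, labels, coupling):
    """Hypothetical helper: sum over same-cluster pairs (i, j) of A[i, j] - null[i, j], divided by
    the total weight, where null = coupling(row degrees, column degrees)."""
    adj = np.asarray(adj, dtype=float)
    null = coupling(adj.sum(axis=1), adj.sum(axis=0))
    labels = np.asarray(labels)
    same_cluster = np.equal.outer(labels, labels)
    return float(((adj - null) * same_cluster).sum() / adj.sum())


# Usage with illustrative margins (total N = 100).
mu, nu = [30, 50, 20], [40, 60]
ind, det = independence_coupling(mu, nu), indetermination_coupling(mu, nu)
print(ind)
print(det.round(3))
print("mean |independence - indetermination| =", np.abs(ind - det).mean())

# Two disjoint triangles, one cluster per triangle.
tri = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    tri[a, b] = tri[b, a] = 1
labels = [0, 0, 0, 1, 1, 1]
print(pair_criterion(tri, labels, independence_coupling))     # Newman-Girvan modularity
print(pair_criterion(tri, labels, indetermination_coupling))
```

When all margins are equal, as in a regular graph, the two couplings coincide, since \(N/(pq) + N/(pq) - N/(pq) = N/(pq)\). The two-triangle graph therefore scores 0.5 under both criteria. Differences only appear as the margins become unbalanced. This is consistent with the observation that the two criteria usually give close results.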

Figs. 1–4

Notes

  1. In 1961, Alan Hoffman (IBM Fellow and member of the US National Academy of Sciences) rediscovered Monge's observation; see Hoffman (1963). Hoffman showed that the Hitchcock-Kantorovich transportation problem can be solved by a very simple greedy approach (the north-west corner rule) if its underlying cost matrix satisfies the Monge property.

  2. Strictly speaking, this is the method of S. Lloyd (1957), rewritten by E.W. Forgy (1965), which is the oldest version of K-means actually used in practice.
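Note 1 can be illustrated with a small sketch of our own, not taken from the paper. The north-west corner rule fills a transportation plan greedily from the top-left cell. Hoffman's result says this greedy plan is optimal whenever the cost matrix has the Monge property \(c_{ij} + c_{i'j'} \le c_{ij'} + c_{i'j}\) for \(i < i'\), \(j < j'\). The function names are hypothetical, and the sketch assumes a balanced problem with integer quantities.

```python
def north_west_corner(supply, demand):
    """Greedy north-west corner rule for a balanced transportation problem.

    Starts at the top-left cell, ships as much as the current row (supply) and
    column (demand) allow, then moves down if the row is exhausted, else right.
    Assumes integer quantities so that the equality tests below are exact.
    """
    supply, demand = list(supply), list(demand)
    if sum(supply) != sum(demand):
        raise ValueError("total supply must equal total demand")
    plan = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        amount = min(supply[i], demand[j])
        plan[i][j] = amount
        supply[i] -= amount
        demand[j] -= amount
        if supply[i] == 0:
            i += 1  # row exhausted: move down
        else:
            j += 1  # column exhausted: move right
    return plan


def is_monge(cost):
    """Monge property c[i][j] + c[i'][j'] <= c[i][j'] + c[i'][j] for i < i', j < j';
    checking adjacent 2x2 minors is enough, since the general case is a sum of them."""
    return all(
        cost[i][j] + cost[i + 1][j + 1] <= cost[i][j + 1] + cost[i + 1][j]
        for i in range(len(cost) - 1)
        for j in range(len(cost[0]) - 1)
    )


def total_cost(plan, cost):
    """Sum of quantity * unit cost over all cells."""
    return sum(x * c for plan_row, cost_row in zip(plan, cost) for x, c in zip(plan_row, cost_row))


supply, demand = [3, 2], [2, 3]
monge_cost = [[(i - j) ** 2 for j in range(2)] for i in range(2)]  # [[0, 1], [1, 0]]
plan = north_west_corner(supply, demand)  # [[2, 1], [0, 2]]
print(is_monge(monge_cost), total_cost(plan, monge_cost))  # prints: True 1
```

On the non-Monge cost `[[1, 0], [0, 1]]`, the same greedy plan costs 4, while shipping crosswise costs 0. The Monge condition is what makes the simple rule safe.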

References

  • Ah-Pine J (2007) Sur des aspects algébriques et combinatoires de l’analyse relationnelle: applications en classification automatique, en théorie du choix social et en théorie des tresses. Ph.D. thesis, Paris 6

  • Ah-Pine J (2010) On aggregating binary relations using 0-1 integer linear programming. In: ISAIM, pp 1–10

  • Asano T, Bhattacharya B, Keil M, Yao F (1988) Clustering algorithms based on minimum and maximum spanning trees. In: Proceedings of the Fourth Annual Symposium on Computational Geometry, pp 252–257

  • Bertrand P (2021) Transport optimal, matrices de Monge et pont relationnel. Ph.D. thesis, Paris 6

  • Bertrand P, Broniatowski M, Marcotorchino JF (2020) Logical indetermination coupling: a method to minimize drawing matches and its applications. https://hal.archives-ouvertes.fr/hal-03086553. Working paper or preprint

  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008

  • Burkard RE, Klinz B, Rudolf R (1996) Perspectives of Monge properties in optimization. Discrete Appl Math 70:95–161

  • Campigotto R, Conde-Céspedes P, Guillaume JL (2014) A generalized and adaptive method for community detection. arXiv preprint arXiv:1406.2518

  • Chen M, Nguyen T, Szymanski BK (2015) A new metric for quality of network community structure. arXiv preprint arXiv:1507.04308

  • Conde-Céspedes P (2013) Modélisations et extensions du formalisme de l’analyse relationnelle mathématique à la modularisation des grands graphes. Ph.D. thesis, Paris 6

  • Coron JL (2017) Quelques exemples de jeux à champ moyen. Ph.D. thesis, Paris Sciences et Lettres (ComUE)

  • Csiszár I et al (1991) Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann Stat 19(4):2032–2066

  • Doreian P, Batagelj V, Ferligoj A (2020) Advances in network clustering and blockmodeling. John Wiley & Sons, Hoboken

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  • Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci 104(1):36–41

  • Fréchet M (1951) Sur les tableaux de corrélations dont les marges sont données. Annales de l'Université de Lyon, Sect A 14:53–77

  • Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA, pp 7821–7826

  • Hamilton RS et al (1982) Three-manifolds with positive Ricci curvature. J Diff Geom 17(2):255–306

  • Hoffman AJ (1963) On simple linear programming problems. In: Proceedings of the Seventh Symposium in Pure Mathematics of the AMS, pp 317–327

  • Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43

  • Lancichinetti A, Fortunato S (2011) Limits of modularity maximization in community detection. Phys Rev E 84(6):066122

  • Marcotorchino JF (1984) Utilisation des comparaisons par paires en statistique des contingences. Publication du Centre Scientifique IBM de Paris et Cahiers du Séminaire Analyse des Données et Processus Stochastiques Université Libre de Bruxelles pp 1–57

  • Marcotorchino JF (1986) Maximal association theory as a tool of research. In: Gaul W, Schader M (eds) Classification as a tool of research. North-Holland, Amsterdam

  • Marcotorchino JF, Conde-Céspedes P (2013) Optimal transport and minimal trade problem, impacts on relational metrics and applications to large graphs and networks modularity. In: International Conference on Geometric Science of Information, pp 169–179. Springer

  • Marcotorchino JF, Michaud P (1979) Optimisation en Analyse Ordinale des Données. Masson, Paris

  • Messatfa H (1990) Maximal association for the sum of squares of a contingency table. Revue RAIRO, Recherche Opérationnelle 24:29–47

  • Nascimento MC (2014) Community detection in networks via a spectral heuristic based on the clustering coefficient. Discrete Appl Math 176:89–99

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

  • Ni CC, Lin YY, Luo F, Gao J (2019) Community detection on networks with Ricci flow. Sci Rep 9(1):1–12

  • Ollivier Y (2009) Ricci curvature of Markov chains on metric spaces. J Funct Anal 256(3):810–864

  • Opitz O, Paul H (2005) Aggregation of ordinal judgements based on Condorcet's majority rule. In: Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg

  • Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74(1):016110

  • Sklar A (1973) Random variables, joint distribution functions, and copulas. Kybernetika 9(6):449–460

  • Steinhaus H (1957) Sur la division des corps matériels en parties. Bull de l'Académie Polonaise des Sci 4(12):801–804

  • Stemmelen E (1977) Tableaux d'échanges, description et prévision. Cahiers du Bureau Universitaire de Recherche Opérationnelle 28

  • Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1:253–269

  • Wilson AG (1969) The use of entropy maximising models. J Transp Econ Policy 3:108–126


Acknowledgements

We thank the editor and two anonymous referees for their valuable comments.

Author information

Corresponding author

Correspondence to Pierre Bertrand.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bertrand, P., Broniatowski, M. & Marcotorchino, JF. Independence versus indetermination: basis of two canonical clustering criteria. Adv Data Anal Classif 16, 1069–1093 (2022). https://doi.org/10.1007/s11634-021-00484-1

  • DOI: https://doi.org/10.1007/s11634-021-00484-1
