Abstract
This paper compares two coupling approaches that serve as basic layers for building clustering criteria suited to modularizing and clustering very large networks. We briefly use “optimal transport theory” as a starting point, and as a means, to derive two canonical couplings: “statistical independence” and “logical indetermination”. A symmetric list of properties is provided, notably the so-called “Monge’s properties”, which, applied to contingency matrices, justify the \(\otimes \) versus \(\oplus \) notation. The study focuses on “logical indetermination” because it is by far the lesser known of the two. Finally, we estimate the average difference between the two couplings, which is the key explanation of their usually close results in network clustering.
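As an illustration of the \(\otimes \) versus \(\oplus \) notation, the following minimal sketch builds both couplings from two given margins. It assumes the usual closed forms, which the abstract itself does not spell out: \(\pi ^{\times }_{ij} = \mu _i \nu _j\) for independence and \(\pi ^{+}_{ij} = \mu _i/q + \nu _j/p - 1/(pq)\) for indetermination, with \(p\) rows and \(q\) columns. The function names and the example margins are hypothetical. The additive form is nonnegative only when \(p \min _i \mu _i + q \min _j \nu _j \ge 1\), which holds for the margins chosen here.

```python
from fractions import Fraction
from itertools import combinations

def independence_coupling(mu, nu):
    """Product coupling (assumed form): pi[i][j] = mu[i] * nu[j]."""
    return [[m * n for n in nu] for m in mu]

def indetermination_coupling(mu, nu):
    """Additive coupling (assumed form): pi[i][j] = mu[i]/q + nu[j]/p - 1/(p*q)."""
    p, q = len(mu), len(nu)
    return [[m / q + n / p - Fraction(1, p * q) for n in nu] for m in mu]

def margins(pi):
    """Return (row sums, column sums) of a matrix given as a list of rows."""
    return [sum(row) for row in pi], [sum(col) for col in zip(*pi)]

def minors(pi):
    """Yield the entries (a, b, c, d) of every 2x2 minor [[a, b], [c, d]] of pi."""
    for i, k in combinations(range(len(pi)), 2):
        for j, l in combinations(range(len(pi[0])), 2):
            yield pi[i][j], pi[i][l], pi[k][j], pi[k][l]

# Hypothetical margins, chosen so that the additive coupling stays nonnegative.
mu = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
nu = [Fraction(3, 5), Fraction(2, 5)]

pi_times = independence_coupling(mu, nu)
pi_plus = indetermination_coupling(mu, nu)

# Both couplings reproduce the prescribed margins exactly.
assert margins(pi_times) == (mu, nu)
assert margins(pi_plus) == (mu, nu)

# Independence: every 2x2 minor satisfies a multiplicative equality.
assert all(a * d == b * c for a, b, c, d in minors(pi_times))
# Indetermination: every 2x2 minor satisfies the additive counterpart.
assert all(a + d == b + c for a, b, c, d in minors(pi_plus))
```

Exact rational arithmetic is used so that the two minor equalities can be checked with `==` rather than with a floating-point tolerance.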
Notes
In 1961 Alan Hoffman (IBM Fellow and member of the US National Academy of Sciences) rediscovered Monge’s observation; see Hoffman (1963). Hoffman showed that the Hitchcock-Kantorovich transportation problem can be solved by a very simple approach if its underlying cost matrix satisfies the Monge properties.
In fact, this is the method of S. Lloyd (1957), rewritten by E.W. Forgy (1965), which corresponds to the oldest version of K-means actually used in practice.
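The “very simple approach” can be illustrated with the north-west corner rule, the greedy procedure usually associated with Hoffman’s result. The sketch below is an assumption-laden illustration, not the authors’ code: the function names, the squared-distance cost matrix, and the supply and demand vectors are all hypothetical, and the Monge condition is taken in its standard form \(c_{ij} + c_{kl} \le c_{il} + c_{kj}\) for \(i < k\), \(j < l\), checked on adjacent rows and columns only.

```python
def is_monge(cost):
    """Check c[i][j] + c[i+1][j+1] <= c[i][j+1] + c[i+1][j] on adjacent 2x2 minors."""
    return all(
        cost[i][j] + cost[i + 1][j + 1] <= cost[i][j + 1] + cost[i + 1][j]
        for i in range(len(cost) - 1)
        for j in range(len(cost[0]) - 1)
    )

def north_west_corner(supply, demand):
    """Greedy transport plan: fill cells from the top-left corner onwards."""
    supply, demand = list(supply), list(demand)  # work on copies
    plan = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        shipped = min(supply[i], demand[j])
        plan[i][j] = shipped
        supply[i] -= shipped
        demand[j] -= shipped
        if supply[i] == 0:
            i += 1  # row exhausted, move down
        else:
            j += 1  # column satisfied, move right
    return plan

def transport_cost(plan, cost):
    """Total cost of a plan under a cost matrix of the same shape."""
    return sum(x * c for prow, crow in zip(plan, cost) for x, c in zip(prow, crow))

# Hypothetical instance: squared-distance costs, which satisfy the Monge condition.
supply = [3, 2, 4]
demand = [4, 5]
cost = [[(i - j) ** 2 for j in range(len(demand))] for i in range(len(supply))]

plan = north_west_corner(supply, demand)
total = transport_cost(plan, cost)
```

Note that the rule never reads the cost matrix: when the Monge condition holds, the plan it produces is optimal whatever the actual cost values are.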
References
Ah-Pine J (2007) Sur des aspects algébriques et combinatoires de l’analyse relationnelle: applications en classification automatique, en théorie du choix social et en théorie des tresses. Ph.D. thesis, Paris 6
Ah-Pine J (2010) On aggregating binary relations using 0-1 integer linear programming. In: ISAIM, pp 1–10
Asano T, Bhattacharya B, Keil M, Yao F (1988) Clustering algorithms based on minimum and maximum spanning trees. In: Proceedings of the fourth annual symposium on computational geometry, pp 252–257
Bertrand P (2021) Transport optimal, matrices de Monge et pont relationnel. Ph.D. thesis, Paris 6
Bertrand P, Broniatowski M, Marcotorchino JF (2020) Logical indetermination coupling: a method to minimize drawing matches and its applications. https://hal.archives-ouvertes.fr/hal-03086553. Working paper or preprint
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008
Burkard RE, Klinz B, Rudolf R (1996) Perspectives of Monge properties in optimization. Disc Appl Math 70:95–161
Campigotto R, Conde-Céspedes P, Guillaume JL (2014) A generalized and adaptive method for community detection. arXiv preprint arXiv:1406.2518
Chen M, Nguyen T, Szymanski BK (2015) A new metric for quality of network community structure. arXiv preprint arXiv:1507.04308
Conde-Céspedes P (2013) Modélisations et extensions du formalisme de l’analyse relationnelle mathématique à la modularisation des grands graphes. Ph.D. thesis, Paris 6
Coron JL (2017) Quelques exemples de jeux à champ moyen. Ph.D. thesis, Paris Sciences et Lettres (ComUE)
Csiszár I et al (1991) Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann Stat 19(4):2032–2066
Doreian P, Batagelj V, Ferligoj A (2020) Advances in network clustering and blockmodeling. John Wiley & Sons, Hoboken
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci 104(1):36–41
Fréchet M (1951) Sur les tableaux de corrélations dont les marges sont données. Annales de l’Université de Lyon. Sect A 14:53–77
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826
Hamilton RS et al (1982) Three-manifolds with positive Ricci curvature. J Diff Geom 17(2):255–306
Hoffman AJ (1963) On simple linear programming problems. In: Proceedings of the seventh symposium in pure mathematics of the AMS, pp 317–327
Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43
Lancichinetti A, Fortunato S (2011) Limits of modularity maximization in community detection. Phys Rev E 84(6):066122
Marcotorchino JF (1984) Utilisation des comparaisons par paires en statistique des contingences. Publication du Centre Scientifique IBM de Paris et Cahiers du Séminaire Analyse des Données et Processus Stochastiques Université Libre de Bruxelles pp 1–57
Marcotorchino JF (1986) Maximal association theory as a tool of research. In: Gaul W, Schader M (eds) Classification as a tool of research. North Holland, Amsterdam
Marcotorchino JF, Conde-Céspedes P (2013) Optimal transport and minimal trade problem, impacts on relational metrics and applications to large graphs and networks modularity. In: International conference on geometric science of information. Springer, pp 169–179
Marcotorchino JF, Michaud P (1979) Optimisation en Analyse Ordinale des Données. Masson, Paris
Messatfa H (1990) Maximal association for the sum of squares of a contingency table. Revue RAIRO, Recherche Opérationnelle 24:29–47
Nascimento MC (2014) Community detection in networks via a spectral heuristic based on the clustering coefficient. Disc Appl Math 176:89–99
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
Ni CC, Lin YY, Luo F, Gao J (2019) Community detection on networks with Ricci flow. Sci Rep 9(1):1–12
Ollivier Y (2009) Ricci curvature of Markov chains on metric spaces. J Funct Anal 256(3):810–864
Opitz O, Paul H (2005) Aggregation of ordinal judgements based on Condorcet’s majority rule. Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg
Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74(1):016110
Sklar A (1973) Random variables, joint distribution functions, and copulas. Kybernetika 9(6):449–460
Steinhaus H (1957) Sur la division des corps matériels en parties. Bull de l’Académie Polonaise des Sci 4(12):801–804
Stemmelen E (1977) Tableaux d’échanges, description et prévision. Cahiers du Bureau Universitaire de Recherche Opérationnelle 28
Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1:253–269
Wilson AG (1969) The use of entropy maximising models. J Transp Econ Policy 3:108–126
Acknowledgements
We thank the editor and two anonymous referees for their valuable comments.
Cite this article
Bertrand, P., Broniatowski, M. & Marcotorchino, JF. Independence versus indetermination: basis of two canonical clustering criteria. Adv Data Anal Classif 16, 1069–1093 (2022). https://doi.org/10.1007/s11634-021-00484-1
DOI: https://doi.org/10.1007/s11634-021-00484-1
Keywords
- Correlation clustering
- Mathematical relational analysis
- Logical indetermination
- Coupling functions
- Optimal transport
- Graph theoretical approaches