Local equivalences of distances between clusterings—a geometric perspective

Meilă, Marina

doi:10.1007/s10994-011-5267-2

Published: 17 December 2011

Machine Learning, Volume 86, pages 369–389 (2012)

Abstract

In comparing clusterings, several distances and indices are in use. We prove that the Misclassification Error distance, the Hamming distance (equivalent to the unadjusted Rand index), and the χ² distance between partitions are equivalent in the neighborhood of 0. In other words, if two partitions are very similar, then each distance provides upper and lower bounds on the other. The proofs are geometric and rely on the concavity of the distances. The geometric intuitions themselves advance the understanding of the space of all clusterings. To our knowledge, this is the first result of its kind.
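A minimal sketch of the three distances, computed in plain Python from two label vectors over the same n points. The misclassification error follows the usual matching definition (one minus the largest fraction of points that agree under a one-to-one matching of clusters). The exact normalizations of the other two are assumptions on our part: the Hamming distance is taken in the normalized Mirkin form Σ p_k² + Σ p'_k'² − 2 Σ p_kk'², and the χ² distance as min(K, K') − Σ p_kk'²/(p_k p'_k'), where p_kk' is the fraction of points in cluster k of the first clustering and cluster k' of the second. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def contingency(labels_a, labels_b):
    """Return the K x K' table of joint counts n_kk' and the number of points n."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    rows, cols = sorted(set(labels_a)), sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows], len(labels_a)


def misclassification_error(labels_a, labels_b):
    """d_ME = 1 - (1/n) * max over one-to-one cluster matchings of matched counts."""
    table, n = contingency(labels_a, labels_b)
    if len(table) > len(table[0]):
        table = [list(col) for col in zip(*table)]  # transpose so that K <= K'
    k, k_prime = len(table), len(table[0])
    # Brute force over injective maps of the K rows into the K' columns;
    # only practical for a handful of clusters.
    best = max(sum(table[i][j] for i, j in enumerate(match))
               for match in permutations(range(k_prime), k))
    return 1 - best / n


def hamming_distance(labels_a, labels_b):
    """Assumed form: sum_k p_k^2 + sum_k' p'_k'^2 - 2 sum_kk' p_kk'^2 (normalized Mirkin)."""
    table, n = contingency(labels_a, labels_b)
    p = [[x / n for x in row] for row in table]
    p_a = [sum(row) for row in p]
    p_b = [sum(col) for col in zip(*p)]
    return (sum(x * x for x in p_a) + sum(x * x for x in p_b)
            - 2 * sum(x * x for row in p for x in row))


def rand_index(labels_a, labels_b):
    """Unadjusted Rand index over unordered pairs; with d_H as above, R = 1 - n * d_H / (n - 1)."""
    n = len(labels_a)
    return 1 - n * hamming_distance(labels_a, labels_b) / (n - 1)


def chi2_distance(labels_a, labels_b):
    """Assumed form: min(K, K') - sum_kk' p_kk'^2 / (p_k p'_k')."""
    table, n = contingency(labels_a, labels_b)
    p_a = [sum(row) / n for row in table]
    p_b = [sum(col) / n for col in zip(*table)]
    chi2 = sum((table[i][j] / n) ** 2 / (p_a[i] * p_b[j])
               for i in range(len(p_a)) for j in range(len(p_b)))
    return min(len(p_a), len(p_b)) - chi2


if __name__ == "__main__":
    a = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    b = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2]  # one point moved from cluster 0 to 1
    print(round(misclassification_error(a, b), 4))  # 0.0833
    print(round(hamming_distance(a, b), 4))         # 0.0972
    print(round(chi2_distance(a, b), 4))            # 0.4
    print(round(rand_index(a, b), 4))               # 0.8939
```

Moving one point out of twelve makes all three distances small at once, which is the regime the local-equivalence results address. The brute-force matching grows factorially with the number of clusters; for larger K, a linear assignment solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) is the usual substitute.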

In practice, distances are frequently used to compare two clusterings of the same set of observations. The motivation for this work, however, lies in the theoretical study of data clustering, where distances between partitions are involved in constructing new methods for cluster validation, in determining the number of clusters, and in analyzing clustering algorithms. From a probability theory point of view, the present results apply to any pair of finite-valued random variables, and provide simple yet tight upper and lower bounds on the χ² measure of (in)dependence, valid when the two variables are strongly dependent.
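Read as a statement about two finite-valued random variables, the χ² quantity is a functional of their joint distribution. The sketch below assumes the normalization χ² = Σ P(x, y)² / (P(x) P(y)), which equals 1 under independence and min(K, K') when one variable determines the other one-to-one; the paper's exact normalization may differ, and the function names are hypothetical. It also shows the link to Pearson's statistic on a table of counts, X² = n(χ² − 1).

```python
def chi2_dependence(joint):
    """Sum over (x, y) of P(x, y)^2 / (P(x) P(y)) for a joint pmf given as rows."""
    if abs(sum(map(sum, joint)) - 1) > 1e-9:
        raise ValueError("joint table must sum to 1")
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    # Zero cells (and hence empty margins) contribute nothing and are skipped.
    return sum(joint[i][j] ** 2 / (p_x[i] * p_y[j])
               for i in range(len(p_x)) for j in range(len(p_y))
               if joint[i][j] > 0)


def pearson_statistic(counts):
    """Pearson's X^2 for a contingency table of counts, via X^2 = n * (chi2 - 1)."""
    n = sum(map(sum, counts))
    joint = [[c / n for c in row] for row in counts]
    return n * (chi2_dependence(joint) - 1)


if __name__ == "__main__":
    print(chi2_dependence([[0.25, 0.25], [0.25, 0.25]]))  # independent: 1.0
    print(chi2_dependence([[0.5, 0.0], [0.0, 0.5]]))      # one-to-one: 2.0 = min(K, K')
    print(round(chi2_dependence([[0.45, 0.05], [0.05, 0.45]]), 4))  # strongly dependent: 1.64
```

The strongly dependent example sits just below the one-to-one maximum, which is the region where bounds of the kind described above apply.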

Author information

Authors and Affiliations

Department of Statistics, University of Washington, Box 354322, Seattle, WA, 98195-4322, USA
Marina Meilă

Corresponding author

Correspondence to Marina Meilă.

Additional information

Editor: Carla Brodley.

About this article

Cite this article

Meilă, M. Local equivalences of distances between clusterings—a geometric perspective. Mach Learn 86, 369–389 (2012). https://doi.org/10.1007/s10994-011-5267-2

Received: 20 February 2006
Accepted: 10 October 2011
Issue Date: March 2012
