
Accelerating k-Means Clustering with Cover Trees

  • Conference paper
Similarity Search and Applications (SISAP 2023)

Abstract

The k-means clustering algorithm is a popular method that partitions data into k clusters. Many improvements exist to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster distances together with the triangle inequality to reduce the number of distance computations, using only arrays as underlying data structures. These approaches cannot exploit the fact that nearby points are likely assigned to the same cluster. We propose a new k-means algorithm based on the cover tree index that has relatively low overhead and performs well over a wider parameter range than previous approaches based on the k-d tree. By combining this with upper and lower bounds, as in state-of-the-art approaches, we obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.
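The bounds-based filtering the abstract refers to can be illustrated with a minimal sketch. This is not the paper's cover-tree algorithm, and all names are illustrative: it is plain Lloyd iteration where, by the triangle inequality, a point need not compute its distance to another center c' whenever its distance to its current center c is at most d(c, c')/2.

```python
# Minimal sketch (not the paper's method): Lloyd's k-means with
# triangle-inequality pruning in the style of Elkan. If
# d(x, c) <= d(c, c') / 2, then c' cannot be closer to x than c,
# so the distance d(x, c') is never computed.
import math

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def kmeans_pruned(points, centers, iters=10):
    skipped = 0                      # pruned distance computations
    assign = [0] * len(points)
    for _ in range(iters):
        # Pairwise center-center distances used for pruning.
        cc = [[dist(c1, c2) for c2 in centers] for c1 in centers]
        for i, x in enumerate(points):
            best = assign[i]
            d_best = dist(x, centers[best])
            for j in range(len(centers)):
                if j == best:
                    continue
                # Triangle inequality: d(x, c_j) >= cc[best][j] - d_best.
                if d_best <= cc[best][j] / 2:
                    skipped += 1
                    continue
                d_j = dist(x, centers[j])
                if d_j < d_best:
                    best, d_best = j, d_j
            assign[i] = best
        # Recompute each center as the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(
                    sum(c) / len(members) for c in zip(*members))
    return assign, skipped

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
assign, skipped = kmeans_pruned(points, [(0.0, 0.1), (5.0, 5.0)])
print(assign, skipped)
```

On this toy data the two well-separated clusters are found, and after the first iteration every point is pruned without any extra distance computation. The cover-tree variant proposed in the paper additionally aggregates nearby points so that whole subtrees can be assigned at once.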

Part of the work on this paper has been supported by Deutsche Forschungsgemeinschaft (DFG), project number 124020371, within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project A2.



Notes

  1. It is also possible to stop early when the movement of the centers falls below some threshold.
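The stopping criterion from the footnote can be sketched in a few lines; the tolerance name `tol` is ours, not from the paper:

```python
# Minimal sketch of early stopping: terminate the iteration when no
# center moved farther than a tolerance `tol` (illustrative name).
import math

def max_center_shift(old_centers, new_centers):
    # math.dist requires Python 3.8+.
    return max(math.dist(o, n) for o, n in zip(old_centers, new_centers))

old = [(0.0, 0.0), (5.0, 5.0)]
new = [(0.001, 0.0), (5.0, 5.0005)]
converged = max_center_shift(old, new) < 1e-2
print(converged)
```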


Author information

Correspondence to Andreas Lang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lang, A., Schubert, E. (2023). Accelerating k-Means Clustering with Cover Trees. In: Pedreira, O., Estivill-Castro, V. (eds) Similarity Search and Applications. SISAP 2023. Lecture Notes in Computer Science, vol 14289. Springer, Cham. https://doi.org/10.1007/978-3-031-46994-7_13

  • DOI: https://doi.org/10.1007/978-3-031-46994-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46993-0

  • Online ISBN: 978-3-031-46994-7

  • eBook Packages: Computer Science, Computer Science (R0)
