Abstract
We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in O(2(k/ε)O(1) dn) time (1+ε)-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input (nd for n points in d dimensions), independent of dimensions in the exponent, assuming k and ε to be fixed. A key ingredient of the k-median result is a (1+ε)-approximation algorithm for the 1-median problem which has running time O(2(1/ε)O(1) d). The previous best known algorithm for this problem had linear running time.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arora, S.: Polynomial time approximation schemes for Euclidean TSP and other geometric problems. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 2–11 (1996)
Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 106–113 (1998)
Badoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pp. 250–257 (2002)
Bern, M., Eppstein, D.: Approximation algorithms for geometric problems. In: Approximating algorithms for NP-Hard problems, pp. 296–345. PWS Publishing Company (1997)
Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the Web. In: Proc. of 6th International World Wide Web Conference, pp. 391–404 (1997)
de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pp. 50–58 (2003)
Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2001)
Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W., Petkovic, D., Equitz, W.: Efficient and effective querying by image content. Journal of Intelligent Information Systems 3(3), 231–262 (1994)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291–300 (2004)
Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In: Proceedings of the tenth annual symposium on Computational Geometry, pp. 332–339 (1994)
Indyk, P.: High Dimensional Computational Geometry. Ph.D. Thesis. Department of Computer Science, Stanford University (2004)
Kolliopoulos, S., Rao, S.: A nearly linear time approximation scheme for the Euclidean k-medians problem. In: Proceedings of the 7th European Symposium on Algorithms, pp. 362–371 (1999)
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual Symposium on Foundations of Computer Science, pp. 454–462 (2004)
Matousek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24, 61–84 (2000)
Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision, 11–32 (1991)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, A., Sabharwal, Y., Sen, S. (2005). Linear Time Algorithms for Clustering Problems in Any Dimensions. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds) Automata, Languages and Programming. ICALP 2005. Lecture Notes in Computer Science, vol 3580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11523468_111
Download citation
DOI: https://doi.org/10.1007/11523468_111
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27580-0
Online ISBN: 978-3-540-31691-6
eBook Packages: Computer ScienceComputer Science (R0)