Abstract
The classical center-based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property, i.e., that the points in the same cluster are close to each other. A number of clustering problems arising in machine learning do not have this property. For instance, consider the r-gather clustering problem, where there is an additional constraint that each cluster should contain at least r points, or the capacitated clustering problem, where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters O_1, ..., O_k are an arbitrary partition of the dataset and the goal is to output k centers c_1, ..., c_k such that the objective function \({\sum }_{i = 1}^{k} {\sum }_{x \in O_{i}} ||x - c_{i}||^{2}\) is minimized. It is not difficult to argue that no algorithm (without knowing the optimal clusters) that outputs a single set of k centers can do well with respect to this objective. However, this does not rule out the existence of algorithms that output a list of such sets of k centers such that at least one set in the list does well. Given an error parameter ε > 0, let ℓ denote the size of the smallest list of k-center sets such that at least one of them gives a (1 + ε)-approximation with respect to the objective function above. In this paper, we show an upper bound on ℓ by giving a randomized algorithm that outputs a list of \(2^{\tilde {O}(k/\varepsilon )}\) k-center sets. We also give a closely matching lower bound of \(2^{\tilde {\Omega }(k/\sqrt {\varepsilon })}\). Moreover, our algorithm runs in time \(O \left (n d \cdot 2^{\tilde {O}(k/\varepsilon )} \right )\).
This is a significant improvement over the previous result of Ding and Xu (2015), who gave an algorithm with running time \(O(n d \cdot (\log n)^{k} \cdot 2^{poly(k/\varepsilon)})\) that outputs a list of size \(O((\log n)^{k} \cdot 2^{poly(k/\varepsilon)})\). Our techniques generalize to the k-median problem and to many other settings involving non-Euclidean distance measures.
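As an illustration of the objective (not code from the paper; the helper names `constrained_cost` and `best_from_list` are hypothetical), note that the partition is fixed and arbitrary, so the cost of a set of centers is evaluated against the given clusters rather than a Voronoi assignment, and for a fixed partition the centroid of each part is its optimal center:

```python
import numpy as np

def constrained_cost(X, partition, centers):
    """Objective from the abstract: sum over clusters O_i of
    sum_{x in O_i} ||x - c_i||^2. The partition is arbitrary --
    it need not be the Voronoi partition of the centers."""
    return sum(np.sum((X[idx] - c) ** 2)
               for idx, c in zip(partition, centers))

def best_from_list(X, partition, candidates):
    """Given a list of candidate k-center sets (as a list-k-means
    algorithm would output), keep the one minimizing the cost.
    Here clusters are matched to centers by index."""
    return min(candidates, key=lambda C: constrained_cost(X, partition, C))

# Toy data: a partition forced by a side constraint, not by locality.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
partition = [np.array([0, 2]), np.array([1, 3])]  # deliberately non-local
centroids = [X[idx].mean(axis=0) for idx in partition]
# For a fixed partition, the centroid of each part is the optimal center.
assert constrained_cost(X, partition, centroids) <= \
    constrained_cost(X, partition, [X[0], X[2]])
```

The point of the toy partition is that no single set of k centers induces these clusters via nearest-center assignment, which is why a list of candidate center sets is needed.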
Notes
Ding and Xu [5] also discuss such partition algorithms for a number of clustering problems with side constraints.
For any real numbers \(a_{1}, ..., a_{m}\), \(({\sum }_{r} a_{r})^{2}/m \leq {\sum }_{r} a_{r}^{2}\).
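A quick numerical sanity check of this inequality (a special case of the Cauchy-Schwarz / power-mean inequality), added here purely as an illustration:

```python
import random

# Check (sum_r a_r)^2 / m  <=  sum_r a_r^2 on random inputs.
random.seed(0)
for _ in range(1000):
    m = random.randint(1, 20)
    a = [random.uniform(-10.0, 10.0) for _ in range(m)]
    # Small tolerance guards against floating-point rounding.
    assert sum(a) ** 2 / m <= sum(x * x for x in a) + 1e-9
```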
Please see [9] for a discussion of such distance measures. This work shows how to extend such D^2-sampling based analysis to settings involving these distance measures.
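For readers unfamiliar with the technique, the following is a minimal sketch of the D^2-sampling step (the seeding rule of k-means++) in Euclidean space; the function name `d2_sample` is illustrative and not from the paper, and extending to other distance measures amounts to replacing the squared Euclidean distance below:

```python
import numpy as np

def d2_sample(X, k, rng):
    """D^2-sampling: the first center is uniform over the points;
    each subsequent center is a point drawn with probability
    proportional to its squared distance to the nearest center
    chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

Points far from all current centers are proportionally more likely to be picked, which is what makes each sampled center land near a yet-uncovered optimal cluster with good probability.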
References
Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and nonmetric distance measures. ACM Trans. Algorithms 6, 59:1–59:26 (2010)
Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing, STOC ’02, pp. 250–257. ACM, New York (2002)
Chen, K.: On k-median clustering in high dimensions. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’06, pp. 1177–1185. ACM, New York (2006)
de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC ’03, pp. 50–58. ACM, New York (2003)
Ding, H., Xu, J.: A unified framework for clustering constrained data without locality property. In: Proceedings of the Twenty-sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15, pp. 1471–1490 (2015)
Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based on weak coresets. In: Proceedings of the Twenty-third Annual Symposium on Computational Geometry, SCG ’07, pp. 11–18. ACM, New York (2007)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, STOC ’04, pp. 291–300. ACM, New York (2004)
Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of the Tenth Annual Symposium on Computational Geometry, SCG ’94, pp. 332–339. ACM, New York (1994)
Jaiswal, R., Kumar, A., Sen, S.: A simple D^2-sampling based PTAS for k-means and other clustering problems. Algorithmica 70(1), 22–46 (2014)
Jaiswal, R., Kumar, M., Yadav, P.: Improved analysis of D^2-sampling based PTAS for k-means and other clustering problems. Inf. Process. Lett. 115(2), 100–103 (2015)
Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57(2), 5:1–5:32 (2010)
Matoušek, J.: On approximate geometric k-clustering. Discret. Comput. Geom. 24(1), 61–84 (2000)
Acknowledgments
Ragesh Jaiswal acknowledges the support of ISF-UGC India-Israel joint research grant 2014.
Additional information
This article is part of the Topical Collection on Theoretical Aspects of Computer Science
Õ notation hides an \( O(\log \frac{k}{\varepsilon}) \) factor.
Cite this article
Bhattacharya, A., Jaiswal, R. & Kumar, A. Faster Algorithms for the Constrained k-means Problem. Theory Comput Syst 62, 93–115 (2018). https://doi.org/10.1007/s00224-017-9820-7