DOI:10.1145/3188745.3188882
research-article

Constant approximation for k-median and k-means with outliers via iterative rounding

Published: 20 June 2018

Abstract

In this paper, we present a new iterative rounding framework for many clustering problems. Using this framework, we obtain an (α₁ + ε ≤ 7.081 + ε)-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen. For k-means with outliers, we give an (α₂ + ε ≤ 53.002 + ε)-approximation, which is the first O(1)-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give α₁- and (α₁ + ε)-approximation algorithms for the matroid median and knapsack median problems respectively, improving upon the previous best approximation ratios of 8 due to Swamy and 17.46 due to Byrka et al. The natural LP relaxation for the k-median/k-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which at most two facilities are fractionally open. Thus, the LP integrality gap arises from the gap between almost-integral and fully-integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully-integral solution, losing only a constant factor in the approximation ratio. By further using a sparsification technique, the additive loss in the approximation factor incurred by the conversion can be reduced to any ε > 0.
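
To make the objective concrete, the following is a minimal Python sketch of how the cost of a candidate solution to k-median or k-means with outliers can be evaluated. It is not the iterative rounding algorithm of the paper. The function name `outlier_clustering_cost`, the parameter `m` (the number of points that must be served, so that `len(points) - m` points may be discarded as outliers), the `power` switch, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def outlier_clustering_cost(points, centers, m, power=1):
    """Cost of serving the m points closest to `centers`; the rest are outliers.

    power=1 gives the k-median objective, power=2 the k-means objective.
    All names here are hypothetical and chosen for this illustration only.
    """
    if not centers or not 0 <= m <= len(points):
        raise ValueError("need at least one center and 0 <= m <= len(points)")
    # Connection cost of each point to its nearest open center.
    costs = sorted(min(math.dist(p, c) for c in centers) ** power for p in points)
    # Keep the m cheapest connections; the farthest points are discarded.
    return sum(costs[:m])


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

print(outlier_clustering_cost(points, centers, m=4))  # one outlier allowed
print(outlier_clustering_cost(points, centers, m=5))  # no outliers allowed
```

In this toy instance the point at (100, 0) dominates the cost when every point must be served, which is the situation the outlier variants are meant to handle.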

References

[1]
Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), abs/1612.07925, 2017.
[2]
Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Euclidean k-medians and related problems. In Proceedings of STOC, STOC ’98, pages 106–113, New York, NY, USA, 1998. ACM.
[3]
David Arthur, Bodo Manthey, and Heiko Röglin. Smoothed analysis of the k-means method. J. ACM, 58(5):19:1–19:31, 2011.
[4]
David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of ACM-SIAM SODA 2007.
[5]
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristic for k-median and facility location problems. In Proceedings of STOC 2001.
[6]
Pranjal Awasthi, Avrim Blum, and Or Sheffet. Stability yields a PTAS for k-median and k-means clustering. In Proceedings of FOCS 2010, pages 309–318. IEEE Computer Society, 2010.
[7]
Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Clustering under approximation stability. J. ACM, 60(2):8:1–8:34, 2013.
[8]
Jaroslaw Byrka. An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem. In APPROX/RANDOM 2007, Princeton, NJ, USA, Proceedings, pages 29–43, 2007.
[9]
Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Joachim Spoerhase, Aravind Srinivasan, and Khoa Trinh. An improved approximation algorithm for knapsack median using sparsification. In Proceedings of ESA 2015, pages 275–287, 2015.
[10]
Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1–23:31, March 2017.
[11]
M. Charikar, S. Guha, D. Shmoys, and E. Tardos. A constant-factor approximation algorithm for the k-median problem. ACM Symp. on Theory of Computing (STOC), 1999.
[12]
M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. Proceedings, ACM-SIAM Symposium on Discrete Algorithms (SODA), 2001.
[13]
Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for the facility location and k-median problems. In Proceedings of FOCS 1999.
[14]
Moses Charikar and Shi Li. A dependent LP-rounding approach for the k-median problem. In Proceedings of ICALP 2012.
[15]
Sanjay Chawla and Aristides Gionis. k-means--: A unified approach to clustering and outlier detection. In Proceedings of the 13th SIAM International Conference on Data Mining, May 2-4, 2013, Austin, Texas, USA, pages 189–197, 2013.
[16]
Ke Chen. A constant factor approximation algorithm for k-median clustering with outliers. In Proceedings of ACM-SIAM SODA 2008.
[17]
Fabián A. Chudak and David B. Shmoys. Improved approximation algorithms for the uncapacitated facility location problem. SIAM J. Comput., 33(1):1–25, 2003.
[18]
V. Cohen-Addad and C. Schwiegelshohn. On the Local Structure of Stable Clustering Instances. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), October 2017.
[19]
Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. The power of local search for clustering. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), abs/1603.09535, 2016.
[20]
Zachary Friggstad, Kamyar Khodamoradi, Mohsen Rezapour, and Mohammad R. Salavatipour. Approximation schemes for clustering with outliers. Proceedings, ACM-SIAM Symposium on Discrete Algorithms (SODA), abs/1707.04295, 2018.
[21]
Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), abs/1603.08976, 2016.
[22]
Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’98, pages 649–657, Philadelphia, PA, USA, 1998. Society for Industrial and Applied Mathematics.
[23]
Shalmoli Gupta, Ravi Kumar, Kefu Lu, Benjamin Moseley, and Sergei Vassilvitskii. Local search methods for k-means with outliers. Proceedings, International Conference on Very Large Data Bases (VLDB), 10(7):757–768, March 2017.
[24]
M. Hajiaghayi, R. Khandekar, and G. Kortsarz. Local search algorithms for the red-blue median problem. Algorithmica, 63(4):795–814, Aug 2012.
[25]
K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274–296, 2001.
[26]
Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. J. ACM, 50(6):795–824, November 2003.
[27]
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of STOC 2002.
[28]
Madhukar R. Korupolu, C. Greg Plaxton, and Rajmohan Rajaraman. Analysis of a local search heuristic for facility location problems. In Proceedings of ACM-SIAM SODA 1998, pages 1–10.
[29]
Ravishankar Krishnaswamy, Amit Kumar, Viswanath Nagarajan, Yogish Sabharwal, and Barna Saha. The matroid median problem. In Proceedings of ACM-SIAM SODA 2011.
[30]
Amit Kumar. Constant factor approximation algorithm for the knapsack median problem. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’12, pages 824–832, Philadelphia, PA, USA, 2012. Society for Industrial and Applied Mathematics.
[31]
Amit Kumar and Ravindran Kannan. Clustering with spectral norm and the k-means algorithm. In Proceedings of FOCS 2010.
[32]
Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Inf. Process. Lett., 120:40–43, 2017.
[33]
S. Li and O. Svensson. Approximating k-median via pseudo-approximation. ACM Symp. on Theory of Computing (STOC), 2013.
[34]
Shi Li. A 1.488 Approximation Algorithm for the Uncapacitated Facility Location Problem, pages 77–88. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
[35]
Jyh-Han Lin and Jeffrey Scott Vitter. Approximation algorithms for geometric median problems. Inf. Process. Lett., 44(5):245–249, 1992.
[36]
S. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theor., 28(2):129–137, 1982.
[37]
Mohammad Mahdian, Yinyu Ye, and Jiawei Zhang. Approximation algorithms for metric facility location problems. SIAM J. Comput., 36(2):411–432, 2006.
[38]
Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. J. ACM, 59(6):28:1–28:22, 2012.
[39]
Lionel Ott, Linsey Pang, Fabio T Ramos, and Sanjay Chawla. On integrated clustering and outlier detection. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 1359–1367. 2014.
[40]
N. Rujeerapaiboon, K. Schindler, D. Kuhn, and W. Wiesemann. Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization. ArXiv e-prints, May 2017.
[41]
David B. Shmoys, Éva Tardos, and Karen Aardal. Approximation algorithms for facility location problems (extended abstract). In Proceedings of STOC 1997.
[42]
Chaitanya Swamy. Improved approximation algorithms for matroid and knapsack median problems and applications. ACM Trans. Algorithms, 12(4):49:1–49:22, August 2016.
[43]
David P. Williamson and David B. Shmoys. The Design of Approximation Algorithms. Cambridge University Press, New York, NY, USA, 1st edition, 2011.

Published In

STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
June 2018
1332 pages
ISBN:9781450355599
DOI:10.1145/3188745

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximation algorithms
  2. iterative rounding
  3. k-means
  4. k-median
  5. outliers

Conference

STOC '18: Symposium on Theory of Computing
June 25 - 29, 2018
Los Angeles, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
