DOI: 10.1145/3188745.3188882
research-article

Constant approximation for k-median and k-means with outliers via iterative rounding

Published: 20 June 2018

Abstract

In this paper, we present a new iterative rounding framework for many clustering problems. Using this framework, we obtain an (α₁ + ε ≤ 7.081 + ε)-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen. For k-means with outliers, we give an (α₂ + ε ≤ 53.002 + ε)-approximation, which is the first O(1)-approximation for this problem. The iterative algorithm framework is very versatile; we show how it can be used to give α₁- and (α₁ + ε)-approximation algorithms for the matroid median and knapsack median problems respectively, improving upon the previous best approximation ratios of 8 due to Swamy and 17.46 due to Byrka et al. The natural LP relaxation for the k-median/k-means with outliers problem has an unbounded integrality gap. In spite of this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, in which at most two facilities are fractionally open. Thus, the LP integrality gap arises from the gap between almost-integral and fully-integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully-integral solution, losing only a constant factor in the approximation ratio. By further using a sparsification technique, the additive factor loss incurred by the conversion can be reduced to any ε > 0.
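
As a reading aid, the objective that these approximation ratios are measured against can be sketched in a few lines of Python. The sketch assumes the standard formulation of clustering with outliers (the abstract itself does not spell it out): given a set of open centers, a fixed number of points may be discarded, and the cost is the sum of distances (k-median) or squared distances (k-means) from the remaining points to their nearest center. The function name `cost_with_outliers` and its parameters are hypothetical, and Euclidean distance stands in for a general metric. This only evaluates a given solution; it is not the paper's iterative rounding algorithm.

```python
import math
from typing import Sequence

Point = Sequence[float]


def cost_with_outliers(
    points: Sequence[Point],
    centers: Sequence[Point],
    num_outliers: int,
    power: int = 1,
) -> float:
    """Cost of serving all but `num_outliers` points from `centers`.

    power=1 gives the k-median objective, power=2 the k-means objective.
    Euclidean distance is an assumption made for this illustration.
    """
    if not centers:
        raise ValueError("at least one center is required")
    if not 0 <= num_outliers <= len(points):
        raise ValueError("num_outliers must be between 0 and len(points)")

    # Connection cost of each point to its nearest open center.
    per_point = sorted(
        min(math.dist(p, c) for c in centers) ** power for p in points
    )
    # For fixed centers, discarding the most expensive points is optimal.
    kept = per_point[: len(points) - num_outliers]
    return sum(kept)


points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
print(cost_with_outliers(points, centers, num_outliers=0))  # 92.0
print(cost_with_outliers(points, centers, num_outliers=1))  # 2.0
```

In this toy instance a single far-away point accounts for almost the entire cost, which is the situation the outlier variants are meant to handle.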

Supplementary Material

MP4 File (5a-5.mp4)

References

[1]
Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), abs/1612.07925, 2017.
[2]
Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for euclidean k-medians and related problems. In Proceedings of STOC, STOC ’98, pages 106–113, New York, NY, USA, 1998. ACM.
[3]
David Arthur, Bodo Manthey, and Heiko Röglin. Smoothed analysis of the k-means method. J. ACM, 58(5):19:1–19:31, 2011.
[4]
David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of ACM-SIAM SODA 2007.
[5]
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristic for k-median and facility location problems. In Proceedings of STOC 2001.
[6]
Pranjal Awasthi, Avrim Blum, and Or Sheffet. Stability yields a PTAS for k-median and k-means clustering. In Proceedings of FOCS 2010, pages 309–318. IEEE Computer Society, 2010.
[7]
Maria-Florina Balcan, Avrim Blum, and Anupam Gupta. Clustering under approximation stability. J. ACM, 60(2):8:1–8:34, 2013.
[8]
Jaroslaw Byrka. An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem. In APPROX/RANDOM 2007, Princeton, NJ, USA, Proceedings, pages 29–43, 2007.
[9]
Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Joachim Spoerhase, Aravind Srinivasan, and Khoa Trinh. An improved approximation algorithm for knapsack median using sparsification. In Proceedings of ESA 2015, pages 275–287, 2015.
[10]
Jaroslaw Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1–23:31, March 2017.
[11]
M. Charikar, S. Guha, D. Shmoys, and E. Tardos. A constant-factor approximation algorithm for the k-median problem. ACM Symp. on Theory of Computing (STOC), 1999.
[12]
M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. Proceedings, ACM-SIAM Symposium on Discrete Algorithms (SODA), 2001.
[13]
Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for the facility location and k-median problems. In Proceedings of FOCS 1999.
[14]
Moses Charikar and Shi Li. A dependent LP-rounding approach for the k-median problem. In Proceedings of ICALP 2012.
[15]
Sanjay Chawla and Aristides Gionis. k-means--: A unified approach to clustering and outlier detection. In Proceedings of the 13th SIAM International Conference on Data Mining, May 2-4, 2013, Austin, Texas, USA, pages 189–197, 2013.
[16]
Ke Chen. A constant factor approximation algorithm for k-median clustering with outliers. In Proceedings of ACM-SIAM SODA 2008.
[17]
Fabián A. Chudak and David B. Shmoys. Improved approximation algorithms for the uncapacitated facility location problem. SIAM J. Comput., 33(1):1–25, 2003.
[18]
V. Cohen-Addad and C. Schwiegelshohn. On the Local Structure of Stable Clustering Instances. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), October 2017.
[19]
Vincent Cohen-Addad, Philip N. Klein, and Claire Mathieu. The power of local search for clustering. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), abs/1603.09535, 2016.
[20]
Zachary Friggstad, Kamyar Khodamoradi, Mohsen Rezapour, and Mohammad R. Salavatipour. Approximation schemes for clustering with outliers. Proceedings, ACM-SIAM Symposium on Discrete Algorithms (SODA), abs/1707.04295, 2018.
[21]
Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. Local search yields a PTAS for k-means in doubling metrics. Proceedings, IEEE Symposium on Foundations of Computer Science (FOCS), abs/1603.08976, 2016.
[22]
Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’98, pages 649–657, Philadelphia, PA, USA, 1998. Society for Industrial and Applied Mathematics.
[23]
Shalmoli Gupta, Ravi Kumar, Kefu Lu, Benjamin Moseley, and Sergei Vassilvitskii. Local search methods for k-means with outliers. Proceedings, International Conference on Very Large Data Bases (VLDB), 10(7):757–768, March 2017.
[24]
M. Hajiaghayi, R. Khandekar, and G. Kortsarz. Local search algorithms for the red-blue median problem. Algorithmica, 63(4):795–814, Aug 2012.
[25]
K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. J. ACM, 48(2):274–296, 2001.
[26]
Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. J. ACM, 50(6):795–824, November 2003.
[27]
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of STOC 2002.
[28]
Madhukar R. Korupolu, C. Greg Plaxton, and Rajmohan Rajaraman. Analysis of a local search heuristic for facility location problems. In Proceedings of ACM-SIAM SODA 1998, pages 1–10.
[29]
Ravishankar Krishnaswamy, Amit Kumar, Viswanath Nagarajan, Yogish Sabharwal, and Barna Saha. The matroid median problem. In Proceedings of ACM-SIAM SODA 2011.
[30]
Amit Kumar. Constant factor approximation algorithm for the knapsack median problem. In Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’12, pages 824–832, Philadelphia, PA, USA, 2012. Society for Industrial and Applied Mathematics.
[31]
Amit Kumar and Ravindran Kannan. Clustering with spectral norm and the k-means algorithm. In Proceedings of FOCS 2010.
[32]
Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Inf. Process. Lett., 120:40–43, 2017.
[33]
S. Li and O. Svensson. Approximating k-median via pseudo-approximation. ACM Symp. on Theory of Computing (STOC), 2013.
[34]
Shi Li. A 1.488 Approximation Algorithm for the Uncapacitated Facility Location Problem, pages 77–88. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
[35]
Jyh-Han Lin and Jeffrey Scott Vitter. Approximation algorithms for geometric median problems. Inf. Process. Lett., 44(5):245–249, 1992.
[36]
S. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theor., 28(2):129–137, March 1982.
[37]
Mohammad Mahdian, Yinyu Ye, and Jiawei Zhang. Approximation algorithms for metric facility location problems. SIAM J. Comput., 36(2):411–432, 2006.
[38]
Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. J. ACM, 59(6):28:1–28:22, 2012.
[39]
Lionel Ott, Linsey Pang, Fabio T Ramos, and Sanjay Chawla. On integrated clustering and outlier detection. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 1359–1367. 2014.
[40]
N. Rujeerapaiboon, K. Schindler, D. Kuhn, and W. Wiesemann. Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization. ArXiv e-prints, May 2017.
[41]
David B. Shmoys, Éva Tardos, and Karen Aardal. Approximation algorithms for facility location problems (extended abstract). In Proceedings of STOC 1997.
[42]
Chaitanya Swamy. Improved approximation algorithms for matroid and knapsack median problems and applications. ACM Trans. Algorithms, 12(4):49:1–49:22, August 2016.
[43]
David P. Williamson and David B. Shmoys. The Design of Approximation Algorithms. Cambridge University Press, New York, NY, USA, 1st edition, 2011.

    Published In

    STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
    June 2018
    1332 pages
    ISBN:9781450355599
    DOI:10.1145/3188745

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximation algorithms
    2. iterative rounding
    3. k-means
    4. k-median
    5. outliers

    Qualifiers

    • Research-article

    Conference

    STOC '18
    STOC '18: Symposium on Theory of Computing
    June 25–29, 2018
    Los Angeles, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Cited By

    • (2024) Overlapping and Robust Edge-Colored Clustering in Hypergraphs. Proceedings of the 17th ACM International Conference on Web Search and Data Mining. DOI: 10.1145/3616855.3635792 (143-151). Online publication date: 4-Mar-2024
    • (2024) Distributed Data Placement and Content Delivery in Web Caches with Non-Metric Access Costs. Proceedings of the ACM Web Conference 2024. DOI: 10.1145/3589334.3645654 (4340-4351). Online publication date: 13-May-2024
    • (2024) MapReduce Algorithms for Robust Center-Based Clustering in Doubling Metrics. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2024.104966 (104966). Online publication date: Aug-2024
    • (2024) Structural iterative rounding for generalized k-median problems. Mathematical Programming. DOI: 10.1007/s10107-024-02119-7. Online publication date: 10-Jul-2024
    • (2024) Connected k-Center and k-Diameter Clustering. Algorithmica. DOI: 10.1007/s00453-024-01266-9, 86:11 (3425-3464). Online publication date: 2-Sep-2024
    • (2024) Approximation Algorithms for Robust Clustering Problems Using Local Search Techniques. Theory and Applications of Models of Computation. DOI: 10.1007/978-981-97-2340-9_17 (197-208). Online publication date: 3-May-2024
    • (2024) Capacitated Facility Location with Outliers and Uniform Facility Costs. Integer Programming and Combinatorial Optimization. DOI: 10.1007/978-3-031-59835-7_7 (85-98). Online publication date: 22-May-2024
    • (2023) Fast algorithms for distributed k-clustering with outliers. Proceedings of the 40th International Conference on Machine Learning. DOI: 10.5555/3618408.3618970 (13845-13868). Online publication date: 23-Jul-2023
    • (2023) Approximation algorithms for fair range clustering. Proceedings of the 40th International Conference on Machine Learning. DOI: 10.5555/3618408.3618948 (13270-13284). Online publication date: 23-Jul-2023
    • (2023) Clustering what matters. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. DOI: 10.1609/aaai.v37i6.25818 (6666-6674). Online publication date: 7-Feb-2023
