
Approximation schemes for clustering problems

Published: 09 June 2003

Abstract

Let k be a fixed integer. We consider the problem of partitioning an input set of points endowed with a distance function into k clusters. We give polynomial time approximation schemes for the following three clustering problems: Metric k-Clustering, $\ell_2^2$ k-Clustering, and $\ell_2^2$ k-Median. In the k-Clustering problem, the objective is to minimize the sum of all intra-cluster distances. In the k-Median problem, the goal is to minimize the sum of distances from points in a cluster to the (best choice of) cluster center. In metric instances, the input distance function is a metric. In $\ell_2^2$ instances, the points are in $\mathbb{R}^d$ and the distance between two points x, y is measured by $\|x-y\|_2^2$ (notice that $(\mathbb{R}^d, \|\cdot\|_2^2)$ is not a metric space). For the first two problems, our results are the first polynomial time approximation schemes. For the third problem, the running time of our algorithms is a vast improvement over previous work.
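The two objectives in the abstract can be made concrete with a small sketch. This is only an illustration of the cost functions for a given partition, not the paper's approximation scheme. The function names (`sq_dist`, `k_clustering_cost`, `sq_k_median_cost`) and the sample points are hypothetical, and the sketch assumes that each intra-cluster pair is counted once (unordered pairs). For squared Euclidean distance it uses the standard fact that the centroid is the best choice of center. The last lines check, on three collinear points, that squared Euclidean distance violates the triangle inequality.

```python
from itertools import combinations

def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||_2^2 between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def k_clustering_cost(clusters, dist=sq_dist):
    """k-Clustering objective: sum of distances over all intra-cluster pairs.

    Assumption: each unordered pair inside a cluster is counted once.
    """
    return sum(dist(p, q) for c in clusters for p, q in combinations(c, 2))

def centroid(cluster):
    """Coordinate-wise mean of a non-empty cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def sq_k_median_cost(clusters):
    """k-Median objective under squared Euclidean distance.

    The centroid minimizes the sum of squared distances to a single
    center, so it serves as the best choice of center for each cluster.
    """
    return sum(sq_dist(p, centroid(c)) for c in clusters for p in c)

# Hypothetical sample partition of five points in the plane into k = 2 clusters.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 0.0), (10.0, 2.0), (10.0, 4.0)],
]
clustering_cost = k_clustering_cost(clusters)
median_cost = sq_k_median_cost(clusters)

# Triangle inequality check on three collinear points:
# d(x, z) = 4 while d(x, y) + d(y, z) = 1 + 1 = 2.
x, y, z = (0.0,), (1.0,), (2.0,)
violates_triangle = sq_dist(x, z) > sq_dist(x, y) + sq_dist(y, z)
```

Passing a different `dist` function to `k_clustering_cost` gives the metric variant of the objective; finding the partition that minimizes either cost is the hard part that the paper addresses.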

Published In

STOC '03: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
June 2003
740 pages
ISBN: 1581136749
DOI: 10.1145/780542

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

STOC '03 Paper Acceptance Rate: 80 of 270 submissions, 30%
Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%
