DOI: 10.5555/1283383.1283495

Spectral clustering with limited independence

Published: 07 January 2007

Abstract

This paper considers the well-studied problem of clustering a set of objects under a probabilistic model of data in which each object is represented as a vector over the set of features, and there are only k different types of objects. Earlier results (for mixture models and "planted" problems on graphs) often assume that all coordinates of all objects are independent random variables. They then appeal to the theory of random matrices to infer spectral properties of the feature × object matrix. However, in most practical applications, assuming full independence is not realistic.
Instead, we assume only that the objects are independent; the coordinates of each object need not be. We first generalize the required results for random matrices to this case of limited independence using some new techniques developed in functional analysis. Surprisingly, we are able to prove results that are quite similar to the fully independent case, modulo an extra logarithmic factor. Using these bounds, we develop clustering algorithms for the more general mixture models. Our clustering algorithms have a substantially different and perhaps simpler "clean-up" phase than known algorithms. We show that our model subsumes not only the planted partition random graph models, but also another set of models for which there is a body of clustering algorithms, namely the Gaussian and log-concave mixture models.
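
The setting in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the clean-up phase is not reproduced, and the noise model (a moving average of Gaussians, so that adjacent coordinates of one object are correlated), the centers, the sizes, and the helper names `sample_objects` and `spectral_cluster` are all assumptions chosen for illustration. The sketch builds a feature × object matrix whose columns are independent, projects the columns onto the top-k left singular subspace, and groups the projected points with a plain Lloyd iteration.

```python
import numpy as np


def sample_objects(centers, labels, rng):
    """Return a feature x object matrix (one column per object).

    Columns are independent of each other. Within a column the noise is a
    moving average of i.i.d. Gaussians, so adjacent coordinates are
    correlated (an assumed, illustrative form of limited independence).
    """
    d = centers.shape[0]
    g = rng.standard_normal((d + 1, labels.size))
    noise = (g[:-1] + g[1:]) / np.sqrt(2.0)  # unit variance per coordinate
    return centers[:, labels] + noise


def spectral_cluster(A, k, iters=20):
    """Project columns of A onto the top-k left singular subspace,
    then run Lloyd's iteration with farthest-point initialization."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    P = U[:, :k].T @ A  # k x n matrix of projected objects

    # Farthest-point initialization: start from object 0, then repeatedly
    # add the object farthest from the centers chosen so far.
    idx = [0]
    for _ in range(1, k):
        chosen = P[:, idx]  # k x m
        d2 = ((P[:, :, None] - chosen[:, None, :]) ** 2).sum(axis=0)
        idx.append(int(np.argmax(d2.min(axis=1))))
    C = P[:, idx].copy()

    for _ in range(iters):
        d2 = ((P[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)  # n x k
        assign = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep the old center if a cluster empties
                C[:, j] = P[:, assign == j].mean(axis=1)
    return assign


rng = np.random.default_rng(0)
d, n, k = 50, 200, 2  # assumed sizes: features, objects, object types

# Two assumed, well-separated centers with disjoint supports.
centers = np.zeros((d, k))
centers[:25, 0] = 2.0
centers[25:, 1] = 2.0

labels = rng.integers(0, k, size=n)
A = sample_objects(centers, labels, rng)
pred = spectral_cluster(A, k)

# Cluster names are arbitrary, so score up to swapping the two labels.
agree = np.mean(pred == labels)
accuracy = max(agree, 1.0 - agree)
print(f"fraction of objects clustered correctly: {accuracy:.3f}")
```

The centers here are far apart relative to the noise, so the projection step alone separates the two types; the regime the paper addresses, where separation is closer to the spectral norm of the noise, is where the bounds for limited independence matter.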


Published In

SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
January 2007
1322 pages
ISBN: 9780898716245
Conference Chair: Harold Gabow

Publisher

Society for Industrial and Applied Mathematics, United States

Acceptance Rates

SODA '07 Paper Acceptance Rate: 139 of 382 submissions, 36%
Overall Acceptance Rate: 411 of 1,322 submissions, 31%
