Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1772690.1772862acmotherconferencesArticle/Chapter ViewAbstractPublication PageswebconfConference Proceedingsconference-collections
poster

Web-scale k-means clustering

Published: 26 April 2010 Publication History
  • Get Citation Alerts
  • Abstract

    We present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, we propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml

    References

    [1]
    L. Bottou and Y. Bengio. Convergence properties of the kmeans algorithm. In Advances in Neural Information Processing Systems. 1995.
    [2]
    J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In ICML '08: Proceedings of the 25th international conference on Machine learning, 2008.
    [3]
    C. Elkan. Using the triangle inequality to accelerate k-means. In ICML '03: Proceedings of the 20th international conference on Machine learning, 2003.
    [4]
    D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. Rcv1: A new benchmark collection for text categorization research. J. Mach. Learn. Res., 5, 2004.
    [5]
    D. Witten and R. Tibshirani. A framework for feature selection in clustering. To Appear: Journal of the American Statistical Association, 2010.
    [6]
    X. Wu and V. Kumar. The Top Ten Algorithms in Data Mining. Chapman & Hall/CRC, 2009.

    Cited By

    View all
    • (2024)Continuous Online Semantic Implicit Representation for Autonomous Ground Robot Navigation in Unstructured EnvironmentsRobotics10.3390/robotics1307010813:7(108)Online publication date: 18-Jul-2024
    • (2024)High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall DataMathematics10.3390/math1213193012:13(1930)Online publication date: 21-Jun-2024
    • (2024)Learning in Deep Radial Basis Function NetworksEntropy10.3390/e2605036826:5(368)Online publication date: 26-Apr-2024
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    WWW '10: Proceedings of the 19th international conference on World wide web
    April 2010
    1407 pages
    ISBN:9781605587998
    DOI:10.1145/1772690

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 April 2010

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. scalability
    2. sparse solutions
    3. unsupervised clustering

    Qualifiers

    • Poster

    Conference

    WWW '10
    WWW '10: The 19th International World Wide Web Conference
    April 26 - 30, 2010
    North Carolina, Raleigh, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)435
    • Downloads (Last 6 weeks)51
    Reflects downloads up to 14 Aug 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Continuous Online Semantic Implicit Representation for Autonomous Ground Robot Navigation in Unstructured EnvironmentsRobotics10.3390/robotics1307010813:7(108)Online publication date: 18-Jul-2024
    • (2024)High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall DataMathematics10.3390/math1213193012:13(1930)Online publication date: 21-Jun-2024
    • (2024)Learning in Deep Radial Basis Function NetworksEntropy10.3390/e2605036826:5(368)Online publication date: 26-Apr-2024
    • (2024)SCALE-BOSS-MR: Scalable Time Series Classification Using Multiple Symbolic RepresentationsApplied Sciences10.3390/app1402068914:2(689)Online publication date: 13-Jan-2024
    • (2024)PhyberSIM: a tool for the generation of ground truth to evaluate brain fiber clustering algorithmsFrontiers in Neuroscience10.3389/fnins.2024.139651818Online publication date: 30-May-2024
    • (2024)A comprehensive investigation of clustering algorithms for User and Entity Behavior AnalyticsFrontiers in Big Data10.3389/fdata.2024.13758187Online publication date: 9-May-2024
    • (2024)City composition and accessibility statistics in and around ParisFrontiers in Big Data10.3389/fdata.2024.13540077Online publication date: 1-Mar-2024
    • (2024)A 60-cm water body map obtained using aerial photography: Application to the Tama and Tsurumi riversHydrological Research Letters10.3178/hrl.18.118:1(1-6)Online publication date: 2024
    • (2024)A privacy-preserving vehicle trajectory clustering framework隐私保护下的车辆轨迹聚类方法研究Frontiers of Information Technology & Electronic Engineering10.1631/FITEE.230036925:7(988-1002)Online publication date: 27-Jul-2024
    • (2024)Low frequency signal classification using clustering methodsPROBLEMS IN PROGRAMMING10.15407/pp2024.01.048(48-56)Online publication date: Jan-2024
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    ePub

    View this article in ePub.

    ePub

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media