DOI: 10.5555/3298483.3298603

Compressed K-means for large-scale clustering

Published: 04 February 2017

Abstract

Large-scale clustering is widely used in many applications and has received much attention. Most existing clustering methods incur high computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
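
The two benefits named in the abstract (compact binary storage and cheap Hamming distances) can be illustrated with a minimal sketch. This is not the CKM algorithm: the abstract says CKM learns codes and clusters jointly, whereas the sketch below assumes a fixed random sign projection as a stand-in encoder and then runs a plain k-means-style loop in Hamming space. All function names (`encode`, `hamming_packed`, `majority_center`, `binary_kmeans`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (0..255).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def encode(X, W):
    """Assumed stand-in encoder: sign of a random projection, one bit per column of W."""
    return (X @ W > 0).astype(np.uint8)

def hamming_packed(A, B):
    """Pairwise Hamming distances between bit-packed codes A (n, bytes) and B (k, bytes)."""
    xor = np.bitwise_xor(A[:, None, :], B[None, :, :])
    return POPCOUNT[xor].sum(axis=2, dtype=np.int64)

def majority_center(members):
    """Per-bit majority vote; this minimizes the summed Hamming distance to the members."""
    return (members.mean(axis=0) >= 0.5).astype(np.uint8)

def binary_kmeans(codes, k, iters=10, seed=0):
    """k-means-style clustering of 0/1 codes (n, bits) using Hamming distance."""
    rng = np.random.default_rng(seed)
    centers = codes[rng.choice(len(codes), size=k, replace=False)].copy()
    packed = np.packbits(codes, axis=1)  # 8 bits per stored byte
    for _ in range(iters):
        dist = hamming_packed(packed, np.packbits(centers, axis=1))
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = codes[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = majority_center(members)
    labels = hamming_packed(packed, np.packbits(centers, axis=1)).argmin(axis=1)
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 128))   # synthetic data, for illustration only
    W = rng.normal(size=(128, 32))     # 32-bit codes
    codes = encode(X, W)
    labels, centers = binary_kmeans(codes, k=5)
    print(X.nbytes, np.packbits(codes, axis=1).nbytes)
```

With 32-bit codes, each point occupies 4 bytes after packing, compared with 1024 bytes for a 128-dimensional float64 vector, and each distance reduces to an XOR followed by a bit count.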

Published In

AAAI'17: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence
February 2017
5106 pages

Sponsors

  • Association for the Advancement of Artificial Intelligence
  • Amazon
  • Infosys
  • Facebook
  • IBM

Publisher

AAAI Press

Qualifiers

  • Article

Cited By

  • (2023) Binary multi-view clustering with spectral embedding. Neurocomputing 557:C. DOI: 10.1016/j.neucom.2023.126733. Online publication date: 7-Nov-2023
  • (2022) Coreset for line-sets clustering. Proceedings of the 36th International Conference on Neural Information Processing Systems, 37363-37375. DOI: 10.5555/3600270.3602978. Online publication date: 28-Nov-2022
  • (2021) Fast k-means Clustering Based on the Neighbor Information. 2021 International Symposium on Electrical, Electronics and Information Engineering, 551-555. DOI: 10.1145/3459104.3459194. Online publication date: 19-Feb-2021
  • (2021) Incremental Community Detection on Large Complex Attributed Network. ACM Transactions on Knowledge Discovery from Data 15:6, 1-20. DOI: 10.1145/3451216. Online publication date: 19-May-2021
  • (2019) Pseudo supervised matrix factorization in discriminative subspace. Proceedings of the 28th International Joint Conference on Artificial Intelligence, 4554-4560. DOI: 10.5555/3367471.3367676. Online publication date: 10-Aug-2019
  • (2019) Community Detection on Large Complex Attribute Network. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2041-2049. DOI: 10.1145/3292500.3330721. Online publication date: 25-Jul-2019
  • (2019) Community Detection in Multi-Layer Networks Using Joint Nonnegative Matrix Factorization. IEEE Transactions on Knowledge and Data Engineering 31:2, 273-286. DOI: 10.1109/TKDE.2018.2832205. Online publication date: 1-Feb-2019
