
Integrate and Conquer: Double-Sided Two-Dimensional k-Means Via Integrating of Projection and Manifold Construction

Published: 01 June 2018

Abstract

In this article, we introduce a novel, general methodology, called integrate and conquer, for simultaneously accomplishing the tasks of feature extraction, manifold construction, and clustering; performing these tasks jointly is taken to be superior to building a clustering method as a single task. When applied to two-dimensional (2D) data, the methodology naturally induces a new clustering method that is highly effective on such data. Existing clustering algorithms usually need to convert 2D data to vectors in a preprocessing step, which, unfortunately, severely damages 2D spatial information and omits inherent structures and correlations in the original data. The induced clustering method overcomes these matrix-vectorization-related issues and thereby enhances clustering performance on 2D matrices. More specifically, the proposed methodology mutually enhances the three tasks of finding subspaces, learning manifolds, and constructing a data representation in a seamlessly integrated fashion. For 2D data, we seek two projection matrices with optimal numbers of directions to project the data into low-rank, noise-mitigated, and most expressive subspaces, in which manifolds are adaptively updated according to the projections, and a new data representation is built with respect to the projected data by accounting for nonlinearity via adaptive manifolds. Consequently, the learned subspaces and manifolds are clean and intrinsic, and the new data representation is discriminative and robust. Extensive experiments have been conducted, and the results confirm the effectiveness of the proposed methodology and algorithm.
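As an illustration of the double-sided projection idea only, the following minimal sketch projects each 2D sample X_i to P^T X_i Q without vectorizing it first, and then clusters the small projected matrices. It is not the algorithm proposed in the article: the projections here are fixed leading eigenvectors of the row and column scatter matrices (a PCA-style stand-in), they are not optimized jointly with the clustering, and no manifold term is included. The function names `two_sided_projections`, `project`, and `kmeans`, the synthetic data, and the choice of two directions per side are all assumptions made for this example.

```python
import numpy as np


def two_sided_projections(X, r, c):
    """Return P (a x r) and Q (b x c) for a stack X of n matrices of size a x b."""
    Xc = X - X.mean(axis=0)                          # centre the samples
    row_scatter = np.einsum("nij,nkj->ik", Xc, Xc)   # sum_n Xc_n Xc_n^T, shape (a, a)
    col_scatter = np.einsum("nji,njk->ik", Xc, Xc)   # sum_n Xc_n^T Xc_n, shape (b, b)
    # eigh returns eigenvalues in ascending order, so the leading
    # eigenvectors are the last columns.
    P = np.linalg.eigh(row_scatter)[1][:, -r:]
    Q = np.linalg.eigh(col_scatter)[1][:, -c:]
    return P, Q


def project(X, P, Q):
    """Compute Y_n = P^T X_n Q for every sample, giving shape (n, r, c)."""
    return np.einsum("ar,nab,bc->nrc", P, X, Q)


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations on the rows of Z with farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


# Synthetic 8 x 6 "images": two groups whose signal sits in different blocks.
rng = np.random.default_rng(0)
M0 = np.zeros((8, 6))
M0[:4, :3] = 3.0
M1 = np.zeros((8, 6))
M1[4:, 3:] = 3.0
X = np.concatenate([M0 + 0.1 * rng.standard_normal((20, 8, 6)),
                    M1 + 0.1 * rng.standard_normal((20, 8, 6))])

P, Q = two_sided_projections(X, r=2, c=2)
Y = project(X, P, Q)                        # 40 samples, each reduced to 2 x 2
labels = kmeans(Y.reshape(len(Y), -1), k=2)
print(labels)
```

In this toy setting each 48-entry matrix is reduced to four numbers while the row and column structure that separates the two groups is retained. The article's method goes further by updating the projections, the manifolds, and the data representation together.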


Published In

ACM Transactions on Intelligent Systems and Technology, Volume 9, Issue 5
Research Survey and Regular Papers
September 2018
274 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3210369

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2018
Accepted: 01 February 2018
Revised: 01 December 2017
Received: 01 May 2017
Published in TIST Volume 9, Issue 5

Author Tags

  1. Clustering
  2. feature extraction
  3. two-dimensional data
  4. unsupervised learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Fundamental Research Fund for the Central Universities of China
  • Science and Technology Planning Project of Guangdong Province, China
  • Foundation Program of Yuncheng University
  • Research Project Supported by Shanxi Scholarship Council of China
  • National Science Foundation
