Integrate and Conquer: Double-Sided Two-Dimensional k-Means Via Integrating of Projection and Manifold Construction

Authors:

Qiang Cheng

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 9, Issue 5

Article No.: 57, Pages 1 - 25

https://doi.org/10.1145/3200488

Published: 01 June 2018

Abstract

In this article, we introduce a general methodology, called integrate and conquer, that accomplishes feature extraction, manifold construction, and clustering simultaneously, an approach we take to be superior to building a clustering method as a single task. Applied to two-dimensional (2D) data, the methodology naturally induces a new clustering method that is highly effective on such data. Existing clustering algorithms usually convert 2D data to vectors in a preprocessing step, which severely damages 2D spatial information and discards inherent structures and correlations in the original data. The induced method avoids these vectorization-related issues and thereby improves clustering performance on 2D matrices. More specifically, the methodology integrates three tasks so that they enhance one another: finding subspaces, learning manifolds, and constructing a data representation. On 2D data, it seeks two projection matrices, each with an optimal number of directions, that project the data into low-rank, noise-mitigated, and maximally expressive subspaces. Within these subspaces, the manifolds are adaptively updated according to the projections, and a new data representation is built from the projected data, with the adaptive manifolds accounting for nonlinearity. Consequently, the learned subspaces and manifolds are clean and intrinsic, and the new data representation is discriminative and robust. Extensive experiments confirm the effectiveness of the proposed methodology and algorithm.
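
To make the contrast with vectorization concrete, the sketch below projects each data matrix from both sides, Y_i = P^T X_i Q, and only then clusters the small projected matrices. It is not the article's algorithm: it omits the manifold learning and the joint optimization described in the abstract, it picks P and Q with a (2D)^2PCA-style eigendecomposition (an assumption made here for illustration), it fixes the numbers of directions r and c by hand, and it runs plain Lloyd-style k-means afterward. All function names are hypothetical.

```python
import numpy as np

def two_sided_projection(X, r, c):
    """X: array of shape (n, a, b) holding n matrices.
    Returns P (a, r) and Q (b, c) with orthonormal columns."""
    Xc = X - X.mean(axis=0)
    # Row-direction scatter sum_i Xc_i Xc_i^T (a x a) and
    # column-direction scatter sum_i Xc_i^T Xc_i (b x b).
    G_row = np.einsum('nij,nkj->ik', Xc, Xc) / len(X)
    G_col = np.einsum('nji,njk->ik', Xc, Xc) / len(X)
    # eigh returns eigenvalues in ascending order, so reverse the
    # eigenvector columns and keep the leading ones.
    _, U = np.linalg.eigh(G_row)
    _, V = np.linalg.eigh(G_col)
    return U[:, ::-1][:, :r], V[:, ::-1][:, :c]

def project(X, P, Q):
    """Compute Y_i = P^T X_i Q for every matrix; result is (n, r, c)."""
    return np.einsum('ar,nab,bc->nrc', P, X, Q)

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd iterations on the rows of Z (n, d); returns labels."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

A typical call would be `P, Q = two_sided_projection(X, r, c)`, then `Y = project(X, P, Q)`, then `kmeans(Y.reshape(len(Y), -1), k)`. The flattening happens only after projection, when each matrix has already been reduced to r x c entries using its row and column structure.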



Published In

ACM Transactions on Intelligent Systems and Technology Volume 9, Issue 5

Research Survey and Regular Papers

September 2018

274 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3210369

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2018

Accepted: 01 February 2018

Revised: 01 December 2017

Received: 01 May 2017

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China
Fundamental Research Fund for the Central Universities of China
Science and Technology Planning Project of Guangdong Province, China
Foundation Program of Yuncheng University
Research Project Supported by Shanxi Scholarship Council of China
National Science Foundation
