Fast Fusion Clustering via Double Random Projection
Abstract
1. Introduction
2. Methodology
2.1. Random Projection Based ADMM
- Bernoulli-distribution-based random projection ADMM.
- For the Lasso penalty [25], $p_\lambda(t) = \lambda |t|$.
- For the SCAD penalty [26] with $\gamma > 2$, $p_\lambda(t) = \lambda |t|\,\mathbb{1}(|t| \le \lambda) + \frac{2\gamma\lambda |t| - t^2 - \lambda^2}{2(\gamma - 1)}\,\mathbb{1}(\lambda < |t| \le \gamma\lambda) + \frac{(\gamma + 1)\lambda^2}{2}\,\mathbb{1}(|t| > \gamma\lambda)$.
- For the MCP [27] with $\gamma > 1$, $p_\lambda(t) = \bigl(\lambda |t| - \frac{t^2}{2\gamma}\bigr)\,\mathbb{1}(|t| \le \gamma\lambda) + \frac{\gamma\lambda^2}{2}\,\mathbb{1}(|t| > \gamma\lambda)$.
- For the TLP [8] with $\tau > 0$, $p_\lambda(t) = \lambda \min(|t|/\tau,\, 1)$ (a short code sketch of all four penalties follows this list).
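The four penalties coincide for small $|t|$ and differ only in how they taper off for large differences, which controls the bias of the fused estimates. Below is a minimal NumPy sketch of the standard closed forms written out above; the function names and the default concavity parameters are illustrative choices, not the paper's code.

```python
import numpy as np

def lasso_penalty(t, lam):
    """Lasso penalty: p(t) = lam * |t|."""
    return lam * np.abs(t)

def scad_penalty(t, lam, gamma=3.7):
    """SCAD penalty (piecewise quadratic, requires gamma > 2)."""
    a = np.abs(t)
    return np.where(
        a <= lam,
        lam * a,
        np.where(
            a <= gamma * lam,
            (2 * gamma * lam * a - a**2 - lam**2) / (2 * (gamma - 1)),
            lam**2 * (gamma + 1) / 2,
        ),
    )

def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (requires gamma > 1)."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def tlp_penalty(t, lam, tau=1.0):
    """Truncated L1 penalty: p(t) = lam * min(|t| / tau, 1)."""
    return lam * np.minimum(np.abs(t) / tau, 1.0)
```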
Algorithm 1 RP-ADMM for fusion clustering

- Input: data; initial values; tuning parameter.
- Output: an estimate of the model parameters.
- For each iteration:
  - compute the update in (5);
  - compute the update in (6);
  - compute the update in (7);
  - if the convergence criterion is met, stop and return the last iterate; otherwise continue to the next iteration.
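Equations (5)–(7) are not reproduced in this excerpt, so the sketch below only illustrates the general shape of Algorithm 1 for the convex (Lasso) fusion penalty: compress the data with a Bernoulli ($\pm 1$) random projection, then run the usual convex-clustering ADMM cycle (a centroid step, a block soft-thresholding step, and a dual step) in the projected space. The function names, the all-pairs edge set, and the specific updates are assumptions made for illustration, not the authors' exact updates.

```python
import numpy as np

def rademacher_projection(p, d, rng):
    """Bernoulli(1/2)-based +-1 projection matrix, scaled so the
    projected inner products are unbiased (Achlioptas-style)."""
    return rng.choice([-1.0, 1.0], size=(p, d)) / np.sqrt(d)

def rp_admm_fusion(X, lam, d, rho=1.0, n_iter=200, tol=1e-4, seed=0):
    """Illustrative RP-ADMM for L1/group-lasso fusion clustering.

    X   : (n, p) data matrix
    lam : fusion penalty level
    d   : projected dimension (d << p)
    Returns centroids U in the projected space, the edge differences V,
    and the edge list; V[l] == 0 marks a fused pair.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = X @ rademacher_projection(p, d, rng)        # projected data, (n, d)

    # all pairwise edges (i, j), i < j, and their incidence matrix
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    m = len(edges)
    D = np.zeros((m, n))
    for l, (i, j) in enumerate(edges):
        D[l, i], D[l, j] = 1.0, -1.0

    U = Z.copy()                                    # centroids
    V = D @ U                                       # edge differences
    W = np.zeros_like(V)                            # scaled dual variables
    A = np.linalg.inv(np.eye(n) + rho * D.T @ D)    # fixed factor for the centroid step

    for _ in range(n_iter):
        U_old = U
        U = A @ (Z + rho * D.T @ (V - W))           # centroid update (eq. (5)-style)
        DU = D @ U
        T = DU + W                                  # block soft-thresholding (eq. (6)-style)
        norms = np.maximum(np.linalg.norm(T, axis=1, keepdims=True), 1e-12)
        V = np.maximum(1.0 - (lam / rho) / norms, 0.0) * T
        W = W + DU - V                              # dual update (eq. (7)-style)
        if np.linalg.norm(U - U_old) < tol * max(1.0, np.linalg.norm(U_old)):
            break
    return U, V, edges
```

Pairs whose fused difference `V[l]` is driven to zero are read off as belonging to the same cluster, which is the usual way a fusion-penalized fit is turned into a partition.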
2.2. Selection of Optimal Tuning Parameter
2.3. Recursive RP-ADMM and Cluster Matrix
Algorithm 2 RRP-ADMM for fusion clustering

- Input: data; number of random projections M; initial values; tuning parameter.
- Output: an estimate of the model parameters.
- For m = 1, ..., M: compute the RP-ADMM estimate using Algorithm 1.
- While the stopping criterion is not met: compute the quantities defined in (13).
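Equation (13) and the recursion itself are not shown in this excerpt, so the following is only a hedged sketch of the repeated-projection idea behind Algorithm 2: rerun the RP-ADMM sketch above (`rp_admm_fusion`) under M independent Bernoulli projections, average the resulting co-membership (cluster) matrices, and keep the fusions that survive a majority vote. The aggregation rule, the thresholds, and the helper names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
# rp_admm_fusion is the illustrative function from the RP-ADMM sketch above.

def connected_components(adj):
    """Cluster labels from an n x n boolean adjacency matrix (DFS)."""
    n = adj.shape[0]
    labels, k = -np.ones(n, dtype=int), 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], k
        while stack:
            u = stack.pop()
            for v in np.where(adj[u] & (labels < 0))[0]:
                labels[v] = k
                stack.append(v)
        k += 1
    return labels

def rrp_admm(X, lam, d, M=10, vote=0.5, fuse_tol=1e-6, **admm_kwargs):
    """Repeated RP-ADMM with cluster-matrix aggregation (illustrative)."""
    n = X.shape[0]
    C = np.zeros((n, n))                               # averaged cluster matrix
    for m in range(M):
        _, V, edges = rp_admm_fusion(X, lam, d, seed=m, **admm_kwargs)
        adj = np.zeros((n, n), dtype=bool)
        for l, (i, j) in enumerate(edges):             # fused pair => same cluster
            if np.linalg.norm(V[l]) < fuse_tol:
                adj[i, j] = adj[j, i] = True
        labels = connected_components(adj)
        C += (labels[:, None] == labels[None, :]) / M  # co-membership average
    final_labels = connected_components(C >= vote)     # majority vote
    return final_labels, C
```

Averaging co-membership matrices over independent projections plays a stabilizing role similar to bagging [31]: a pair of observations is fused in the final partition only if it co-clusters in most of the projected fits.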
3. Simulation
3.1. Low-Dimensional Setting
3.2. High-Dimensional Setting
4. Real Data Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Lemma 1
Appendix B. Proof of Theorem 1
References
1. Haq, M.A. CDLSTM: A novel model for climate change forecasting. Comput. Mater. Contin. 2022, 71, 2.
2. Haq, M.A. SMOTEDNN: A novel model for air pollution forecasting and AQI classification. Comput. Mater. Contin. 2022, 71, 1.
3. Van Der Kloot, W.A.; Spaans, A.M.J.; Heiser, W.J. Instability of hierarchical cluster analysis due to input order of the data: The PermuCLUSTER solution. Psychol. Methods 2005, 10, 468.
4. Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678.
5. Yang, X.; Yan, X.; Huang, J. High-dimensional integrative analysis with homogeneity and sparsity recovery. J. Multivar. Anal. 2019, 174, 104529.
6. Chi, E.C.; Lange, K. Splitting methods for convex clustering. J. Comput. Graph. Stat. 2015, 24, 994–1013.
7. Lindsten, F.; Ohlsson, H.; Ljung, L. Clustering using sum-of-norms regularization: With application to particle filter output computation. In Proceedings of the 2011 IEEE Statistical Signal Processing Workshop (SSP), Nice, France, 28–30 June 2011; pp. 201–204.
8. Pan, W.; Shen, X.; Liu, B. Cluster analysis: Unsupervised learning via supervised learning with a non-convex penalty. J. Mach. Learn. Res. 2013, 14, 1865.
9. Yang, X.; Yan, X. Mechanism and a new algorithm for nonconvex clustering. J. Stat. Comput. Simul. 2020, 90, 719–746.
10. Paul, D.; Chakraborty, S.; Das, S.; Xu, J. Implicit annealing in kernel spaces: A strongly consistent clustering approach. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5862–5871.
11. Shah, S.A.; Koltun, V. Robust continuous clustering. Proc. Natl. Acad. Sci. USA 2017, 114, 9814–9819.
12. Hocking, T.D.; Joulin, A.; Bach, F.; Vert, J.P. Clusterpath: An algorithm for clustering using convex fusion penalties. In Proceedings of the 28th International Conference on Machine Learning, Washington, DC, USA, 28 June–2 July 2011; p. 1.
13. Radchenko, P.; Mukherjee, G. Convex clustering via l1 fusion penalization. J. R. Stat. Soc. B 2017, 79, 1527–1546.
14. Wang, B.; Zhang, Y.; Sun, W.W.; Fang, Y. Sparse convex clustering. J. Comput. Graph. Stat. 2018, 27, 393–403.
15. Yan, X.; Yin, G.; Zhao, X. Subgroup analysis in censored linear regression. Stat. Sinica 2021, 31, 1027–1054.
16. Yan, X.; Wang, H.; Zhou, Y.; Yan, J.; Wang, Y.; Wang, W.; Xie, J.; Yang, S.; Zeng, Z.; Chen, X. Heterogeneous logistic regression for estimation of subgroup effects on hypertension. J. Biopharm. Stat. 2022, 32, 969–985.
17. Zhu, C.; Xu, H.; Leng, C.; Yan, S. Convex optimization procedure for clustering: Theoretical revisit. Adv. Neural Inf. Process. Syst. 2014, 1619–1627.
18. Ma, S.; Huang, J. A concave pairwise fusion approach to subgroup analysis. J. Am. Stat. Assoc. 2017, 112, 410–423.
19. Ma, S.; Huang, J. Estimating subgroup-specific treatment effects via concave fusion. arXiv 2016, arXiv:1607.03717.
20. Marchetti, Y.; Zhou, Q. Iterative subsampling in solution path clustering of noisy big data. arXiv 2014, arXiv:1412.1559.
21. Achlioptas, D. Database-friendly random projections. In Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, USA, 21–23 May 2001; Association for Computing Machinery: New York, NY, USA, 2001; pp. 274–281.
22. Ailon, N.; Chazelle, B. The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 2009, 39, 302–322.
23. Bingham, E.; Mannila, H. Random projection in dimensionality reduction: Applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; Association for Computing Machinery: New York, NY, USA, 2001; pp. 245–250.
24. Kane, D.M.; Nelson, J. Sparser Johnson–Lindenstrauss transforms. J. ACM 2014, 61, 1–23.
25. Tibshirani, R.; Walther, G. Cluster validation by prediction strength. J. Comput. Graph. Stat. 2005, 14, 511–528.
26. Fan, J.; Lv, J. Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inf. Theory 2011, 57, 5467–5484.
27. Zhang, C.-H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
28. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
29. Ghadimi, E.; Teixeira, A.; Shames, I.; Johansson, M. Optimal parameter selection for the alternating direction method of multipliers (ADMM): Quadratic problems. IEEE Trans. Autom. Control 2014, 60, 644–658.
30. Liu, B.; Shen, X.; Pan, W. Integrative and regularized principal component analysis of multiple sources of data. Stat. Med. 2016, 35, 2235–2250.
31. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
32. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850.
33. Zheng, P.; Aravkin, A. Relax-and-split method for nonconvex inverse problems. Inverse Probl. 2020, 36, 095013.
34. Chakraborty, S.; Xu, J. Biconvex clustering. J. Comput. Graph. Stat. 2023, 32, 1524–1536.
| Sample Size | ADMM SI | ADMM RI | ADMM Time | RRP-ADMM SI | RRP-ADMM RI | RRP-ADMM Time | RS-ADMM SI | RS-ADMM RI | RS-ADMM Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 0.081 | 0.921 | 7 | 0.059 | 0.933 | 2 | 0.080 | 0.925 | 10 |
|  | 0.058 | 0.945 | 88 | 0.046 | 0.957 | 7 | 0.056 | 0.947 | 121 |
|  | 0.049 | 0.962 | 352 | 0.045 | 0.974 | 17 | 0.047 | 0.966 | 551 |
|  | 0.042 | 0.973 | 1582 | 0.040 | 0.986 | 41 | 0.042 | 0.978 | 1864 |