Infinite ensemble clustering

Liu, Hongfu; Shao, Ming; Li, Sheng; Fu, Yun

doi:10.1007/s10618-017-0539-5

Published: 20 August 2017

Volume 32, pages 385–416, (2018)

Data Mining and Knowledge Discovery

Hongfu Liu (ORCID: orcid.org/0000-0002-0821-8640)¹, Ming Shao², Sheng Li³ & Yun Fu¹,⁴

Abstract

Ensemble clustering aims to fuse several diverse basic partitions into a consensus one and has been widely recognized as a promising tool for discovering novel clusters and delivering robust partitions, while representation learning with deep structures shows appealing performance in unsupervised feature pre-treatment. It has been empirically found in the literature that, as the number of basic partitions increases, ensemble clustering achieves better performance and lower variance, yet the best number of basic partitions for a given data set remains an open problem. In light of this, we propose Infinite Ensemble Clustering (IEC), which incorporates a marginalized denoising auto-encoder with dropout noise to generate the expectation representation for infinite basic partitions. Generally speaking, a set of basic partitions is first generated from the data. Then, by converting the basic partitions to 1-of-K codings, we link the marginalized denoising auto-encoder to the infinite basic partition representation. Finally, we follow the layer-wise training procedure and feed the concatenated deep features to K-means for the final clustering. According to the type of marginalized auto-encoder used, linear and non-linear versions of IEC are proposed. Extensive experiments on diverse vision data sets with different levels of visual descriptors demonstrate the superior performance of IEC compared with state-of-the-art ensemble clustering and deep clustering methods. Moreover, we evaluate the performance of IEC in pan-omics gene expression analysis via survival analysis.
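
The pipeline outlined in the abstract (basic partitions, 1-of-K codings, layer-wise marginalized denoising, K-means on the concatenated features) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the linear variant, uses the closed-form solution commonly used for linear marginalized denoising auto-encoders under dropout noise, and generates basic partitions by K-means with a randomly chosen number of clusters, which is an assumption made only for illustration. All function names and parameters (`kmeans`, `one_hot_partitions`, `mda_layer`, `infinite_ensemble_sketch`, `p`, `reg`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns integer labels in [0, k)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def one_hot_partitions(partitions):
    """Stack the 1-of-K coding of every basic partition into an (n, sum K_i) matrix."""
    blocks = [np.eye(labels.max() + 1)[labels] for labels in partitions]
    return np.hstack(blocks)

def mda_layer(H, p, reg=1e-5):
    """One linear marginalized denoising layer under dropout noise (assumed form).

    H is (d, n); p is the dropout probability. The expectation over infinitely
    many corrupted copies is taken analytically, so no sampling is needed.
    """
    d, n = H.shape
    Hb = np.vstack([H, np.ones((1, n))])       # bias row, never corrupted
    S = Hb @ Hb.T                              # scatter matrix, (d+1, d+1)
    q = np.append(np.full(d, 1.0 - p), 1.0)    # survival probability per feature
    Q = S * np.outer(q, q)                     # expected corrupted scatter, off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))        # diagonal survives with q_i, not q_i^2
    P = S[:d, :] * q                           # expected clean-by-corrupted scatter
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T   # W = P (Q + reg I)^-1
    return np.tanh(W @ Hb)

def infinite_ensemble_sketch(X, n_clusters, n_partitions=20, n_layers=3, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: basic partitions with a random number of clusters (illustrative choice).
    ks = rng.integers(n_clusters, 2 * n_clusters + 1, size=n_partitions)
    partitions = [kmeans(X, int(k), rng) for k in ks]
    # Step 2: 1-of-K codings, concatenated column-wise.
    B = one_hot_partitions(partitions)
    # Step 3: layer-wise marginalized denoising; keep the output of every layer.
    H, feats = B.T, []
    for _ in range(n_layers):
        H = mda_layer(H, p)
        feats.append(H)
    Z = np.vstack(feats).T                     # concatenated deep features, one row per sample
    # Step 4: final clustering on the concatenated features.
    return kmeans(Z, n_clusters, rng)
```

A call such as `infinite_ensemble_sketch(X, n_clusters=3)` on an `(n, d)` float array `X` returns one consensus label per row. The layer count, dropout probability, and regularization constant are placeholders rather than values taken from the article.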

Acknowledgements

This work is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484, and U.S. Army Research Office Young Investigator Award W911NF-14-1-0218.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
Hongfu Liu & Yun Fu
Department of Computer and Information Science, University of Massachusetts Dartmouth, Dartmouth, MA, USA
Ming Shao
Adobe Research, San Jose, CA, USA
Sheng Li
College of Computer and Information Science, Northeastern University, Boston, MA, USA
Yun Fu

Corresponding author

Correspondence to Hongfu Liu.

Additional information

Responsible editor: Pierre Baldi.

Appendix

See Tables 7, 8, 9, 10 and 11.

Table 7 Survival analysis of different clustering algorithms on protein expression data

Table 8 Survival analysis of different clustering algorithms on miRNA expression data

Table 9 Survival analysis of different clustering algorithms on mRNA expression data

Table 10 Survival analysis of different clustering algorithms on SCNA data

Table 11 Survival analysis of IEC on pan-omics gene expression

About this article

Cite this article

Liu, H., Shao, M., Li, S. et al. Infinite ensemble clustering. Data Min Knowl Disc 32, 385–416 (2018). https://doi.org/10.1007/s10618-017-0539-5

Received: 01 February 2017
Accepted: 07 August 2017
Published: 20 August 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10618-017-0539-5
