Infinite ensemble clustering

Data Mining and Knowledge Discovery

Abstract

Ensemble clustering aims to fuse several diverse basic partitions into a consensus one, and it has been widely recognized as a promising tool for discovering novel clusters and delivering robust partitions, while representation learning with deep structures shows appealing performance in unsupervised feature pre-treatment. The literature reports empirically that as the number of basic partitions increases, ensemble clustering achieves better performance and lower variance, yet the best number of basic partitions for a given data set remains an open problem. In light of this, we propose Infinite Ensemble Clustering (IEC), which incorporates a marginalized denoising auto-encoder with dropout noise to generate the expected representation for infinitely many basic partitions. Specifically, a set of basic partitions is first generated from the data. Then, by converting the basic partitions to 1-of-K codings, we link the marginalized denoising auto-encoder to the infinite basic partition representation. Finally, we follow the layer-wise training procedure and feed the concatenated deep features to K-means for the final clustering. Depending on the type of marginalized auto-encoder, linear and non-linear versions of IEC are proposed. Extensive experiments on diverse vision data sets with different levels of visual descriptors demonstrate the superior performance of IEC compared with state-of-the-art ensemble clustering and deep clustering methods. Moreover, we evaluate IEC on pan-omics gene expression analysis via survival analysis.
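The pipeline outlined in the abstract (basic partitions, 1-of-K coding, marginalized denoising auto-encoder layers, K-means on the concatenated features) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`kmeans`, `one_hot_partitions`, `mda_layer`, `iec_sketch`) are hypothetical. Several choices are assumptions made only for illustration: basic partitions come from K-means with a randomly drawn number of clusters, the layer uses the closed-form linear marginalized denoising auto-encoder with a tanh between layers, and the default layer count, dropout level `p` and regularizer `reg` are arbitrary.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means on the rows of X; returns integer labels."""
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def one_hot_partitions(partitions):
    """Stack r label vectors into an n x (K_1 + ... + K_r) binary 1-of-K matrix."""
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        values = np.unique(labels)
        blocks.append((labels[:, None] == values[None, :]).astype(float))
    return np.hstack(blocks)

def mda_layer(X, p, reg=1e-5):
    """One marginalized denoising auto-encoder layer in closed form.

    X is d x n (features x samples) and p is the dropout probability.
    The expectation over dropout noise is computed analytically, so no
    corrupted copies of X are ever sampled.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])        # append a bias row
    q = np.append(np.full(d, 1.0 - p), 1.0)     # survival probabilities (bias is never dropped)
    S = Xb @ Xb.T                               # scatter matrix
    Q = S * np.outer(q, q)                      # E[x_tilde x_tilde^T], off-diagonal part
    np.fill_diagonal(Q, q * np.diag(S))         # diagonal uses q_i instead of q_i^2
    P = S[:d, :] * q                            # E[x x_tilde^T]
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T  # W = P (Q + reg I)^{-1}
    return np.tanh(W @ Xb)

def iec_sketch(data, k, n_partitions=20, n_layers=3, p=0.5, seed=0):
    """Illustrative pipeline only; all defaults are assumptions."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Step 1: generate basic partitions (assumed: k-means with random cluster counts).
    partitions = [kmeans(data, int(rng.integers(k, 2 * k + 1)), rng)
                  for _ in range(n_partitions)]
    # Step 2: convert the partitions to 1-of-K codings (features x samples).
    h = one_hot_partitions(partitions).T
    # Step 3: layer-wise training, keeping the output of every layer.
    features = []
    for _ in range(n_layers):
        h = mda_layer(h, p)
        features.append(h)
    # Step 4: K-means on the concatenated deep features.
    return kmeans(np.vstack(features).T, k, rng)
```

Calling `iec_sketch(data, k=3)` on an `n x m` array returns `n` integer cluster labels. Because the noise is marginalized analytically, each layer costs one linear solve on a matrix whose size depends on the total number of clusters across the basic partitions, not on the number of samples.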

Notes

  1. http://deeplearning.net/software/theano/.

  2. http://archive.ics.uci.edu/ml.

  3. https://www.eecs.berkeley.edu/~jhoffman/domainadapt.

  4. http://www.cad.zju.edu.cn/home/dengcai.

  5. http://www.cs.dartmouth.edu/~chenfang.

  6. https://cancergenome.nih.gov/.

  7. https://cran.r-project.org/web/packages/survival/index.html.

Acknowledgements

This work is supported in part by the NSF IIS award 1651902, ONR Young Investigator Award N00014-14-1-0484, and U.S. Army Research Office Young Investigator Award W911NF-14-1-0218.

Author information

Corresponding author

Correspondence to Hongfu Liu.

Additional information

Responsible editor: Pierre Baldi.

Appendix

See Tables 7, 8, 9, 10 and 11.

Table 7 Survival analysis of different clustering algorithms on protein expression data
Table 8 Survival analysis of different clustering algorithms on miRNA expression data
Table 9 Survival analysis of different clustering algorithms on mRNA expression data
Table 10 Survival analysis of different clustering algorithms on SCNA data
Table 11 Survival analysis of IEC on pan-omics gene expression

About this article

Cite this article

Liu, H., Shao, M., Li, S. et al. Infinite ensemble clustering. Data Min Knowl Disc 32, 385–416 (2018). https://doi.org/10.1007/s10618-017-0539-5

  • DOI: https://doi.org/10.1007/s10618-017-0539-5
