Abstract
Practical clustering tasks often face two major problems. The first arises from improper feature extraction: the features may be weak, and the feature vectors are usually high-dimensional and drawn from multiple sources. The second is that outliers interfere with the clustering results. In this paper, we use co-clustering to cluster the data and the feature sources simultaneously, and we adopt a multitask approach in which information shared between tasks improves the accuracy of each clustering task. We also exploit the typicality degree to construct a new parameter selection index that identifies outliers, and we correct each parameter by weakening the influence of the identified outliers on the clustering results. To demonstrate the applicability and robustness of the algorithm, we extend it to non-precise datasets and evaluate it from multiple aspects through experiments. The experiments show that the proposed algorithms not only improve clustering accuracy but also greatly reduce the interference of outliers with the clustering results.
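As an illustration only of how a typicality value can expose outliers, the sketch below uses the standard possibilistic c-means typicality, t = 1 / (1 + (d^2 / eta)^(1 / (m - 1))), and flags points whose typicality is low for every cluster. It is not the parameter selection index proposed in this paper: the function names, the threshold of 0.1, the scale parameters `etas`, the fixed cluster centers and the toy data are all assumptions made for the example.

```python
def typicality(point, center, eta, m=2.0):
    """Possibilistic c-means typicality of `point` for one cluster.

    t = 1 / (1 + (d^2 / eta)^(1 / (m - 1))), with d the Euclidean distance
    to the cluster prototype `center`, `eta` > 0 the cluster scale
    parameter, and m > 1 the fuzzifier.
    """
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def flag_outliers(points, centers, etas, threshold=0.1, m=2.0):
    """Return indices of points whose highest typicality is below `threshold`.

    The threshold is a hypothetical choice for this sketch, not a value
    taken from the paper.
    """
    flagged = []
    for i, x in enumerate(points):
        best = max(typicality(x, c, e, m) for c, e in zip(centers, etas))
        if best < threshold:
            flagged.append(i)
    return flagged


# Toy data (assumed): two tight groups plus one distant point.
points = [(0.0, 0.1), (0.2, -0.1), (5.0, 5.1), (4.9, 5.0), (20.0, -15.0)]
centers = [(0.1, 0.0), (4.95, 5.05)]  # assumed fixed prototypes
etas = [1.0, 1.0]                     # assumed scale parameters

outliers = flag_outliers(points, centers, etas)
print(outliers)
```

Unlike fuzzy memberships, which must sum to one across clusters, typicalities are not normalized, so a point far from every prototype receives a small value for all clusters. That is the property that makes a typicality-based index usable for detecting outliers.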
Acknowledgements
We are grateful to the anonymous reviewers for their critical and valuable comments. This work was supported by the National Natural Science Foundation of China (Grant 61573266).
Ethics declarations
Conflict of interest
We wish to confirm that there are no known conflicts of interest associated with this publication. We also confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed.
About this article
Cite this article
Ren, J., Yang, Y. Multitask possibilistic and fuzzy co-clustering algorithm for clustering data with multisource features. Neural Comput & Applic 32, 4785–4804 (2020). https://doi.org/10.1007/s00521-018-3851-0
DOI: https://doi.org/10.1007/s00521-018-3851-0