research-article

T-Distributed Stochastic Neighbor Embedding for Co-Representation Learning

Published: 22 February 2024

Abstract

Co-clustering is the simultaneous clustering of the samples and attributes of a data matrix, and it provides deeper insight into data than traditional clustering. However, few representation learning algorithms serve this co-clustering mechanism: current representation learning algorithms are limited to the sample perspective and do not use the information available in the attribute perspective. To solve this problem, this article proposes ctSNE, a co-representation learning model based on t-distributed stochastic neighbor embedding, for unsupervised co-clustering. ctSNE makes the output representation of the dataset more discriminative of both row and column clusters (i.e., co-discrimination). Building on the ability of t-distributed stochastic neighbor embedding to retain the sample data distribution and local data structure, the philosophy of collaboration (i.e., the hidden relationship information between rows and columns) is introduced so that the ctSNE model is equipped with co-representation learning capability, which can effectively improve the performance of co-clustering. To demonstrate the effectiveness of the ctSNE model, several classic co-clustering algorithms are used to check its co-representation performance, and a novel internal index based on an internal clustering index, known as total inertia, is proposed to demonstrate the effect of co-clustering. Extensive experimental results show that ctSNE has strong co-representation capability and can significantly improve the performance of co-clustering algorithms.
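For readers unfamiliar with the base method, the following is a minimal sketch of the quantities used by standard t-SNE (van der Maaten and Hinton, 2008): Gaussian affinities in the input space, Student-t affinities in the embedding, and the Kullback-Leibler divergence between them. It is not the ctSNE objective, which the abstract does not specify. The function names, the fixed bandwidth `sigma` (standard t-SNE instead tunes a per-point bandwidth from a perplexity), and the random data are assumptions made for illustration. The usage lines only show that the same affinities can be computed from the sample perspective (rows of `X`) or the attribute perspective (rows of `X.T`).

```python
import numpy as np


def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A."""
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def gaussian_affinities(X, sigma=1.0):
    """Symmetrised input-space affinities p_ij.

    Simplification (assumption): one fixed bandwidth for every point,
    rather than the perplexity-based search of standard t-SNE.
    """
    n = X.shape[0]
    logits = -pairwise_sq_dists(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)       # conditional p_{j|i}
    return (cond + cond.T) / (2.0 * n)            # joint p_ij, sums to 1


def student_t_affinities(Y):
    """Embedding-space affinities q_ij from a Student-t kernel (1 d.o.f.)."""
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), the cost that standard t-SNE minimises over Y."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Usage on a hypothetical 6 x 4 data matrix (samples x attributes).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
P_samples = gaussian_affinities(X)       # sample (row) perspective
P_attributes = gaussian_affinities(X.T)  # attribute (column) perspective
Y = rng.normal(size=(6, 2))              # an untrained 2-D embedding of the samples
cost = kl_divergence(P_samples, student_t_affinities(Y))
```

Standard t-SNE would minimise `cost` over `Y` by gradient descent using `P_samples` alone. How ctSNE brings the attribute-side information into the learned representation is described in the full article, not in this abstract, so the sketch stops short of it.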

Cited By

  • (2024) Co-clustering: A Survey of the Main Methods, Recent Trends, and Open Problems. ACM Computing Surveys 57:2 (1-33). DOI: 10.1145/3698875. Online publication date: 4-Oct-2024
  • (2024) On the Attractive and Repulsive Forces of Generalized Stochastic Neighbor Embedding With Alpha-Divergence. IEEE Access 12 (90380-90394). DOI: 10.1109/ACCESS.2024.3420425. Online publication date: 2024
  • (2024) Detecting of Robotic Imitation of Human on-the-Website Activity With Advanced Vector Analysis and Fractional Derivatives. IEEE Access 12 (56707-56718). DOI: 10.1109/ACCESS.2024.3391377. Online publication date: 2024

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 2
April 2024
481 pages
EISSN: 2157-6912
DOI: 10.1145/3613561
Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 February 2024
Online AM: 13 October 2023
Accepted: 29 September 2023
Revised: 26 September 2023
Received: 12 November 2022
Published in TIST Volume 15, Issue 2

Author Tags

  1. Co-clustering
  2. co-representation
  3. t-distribution stochastic neighbor embedding
  4. unsupervised learning

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Open Fund of the Key Lab of Internet Natural Language Processing of the Sichuan Provincial Education Department, Leshan Normal University

Article Metrics

  • Downloads (Last 12 months): 206
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 08 Feb 2025
