
Clustering Heterogeneous Information Network by Joint Graph Embedding and Nonnegative Matrix Factorization

Published: 10 June 2021

Abstract

Many complex systems in nature and society consist of multiple types of entities and heterogeneous interactions, and can be effectively modeled as heterogeneous information networks (HINs). Structural analysis of heterogeneous networks is of great significance because it leverages the rich semantic information carried by the objects and links in these networks. Clustering heterogeneous networks aims to group vertices into classes, which sheds light on the structure–function relations of the underlying systems. Current algorithms perform feature extraction and clustering independently, and are criticized for not fully characterizing the structure of clusters. In this study, we propose a learning model based on joint Graph Embedding and Nonnegative Matrix Factorization (GEjNMF), in which feature extraction and clustering are learned simultaneously by exploiting the graph embedding and latent structure of networks. We formulate the objective function of GEjNMF and transform the heterogeneous network clustering problem into a constrained optimization problem, which is effectively solved by l0-norm optimization. The advantage of GEjNMF is that features are selected under the guidance of clustering, which both improves performance and reduces running time. Experimental results on three benchmark heterogeneous networks demonstrate that GEjNMF achieves the best performance with the least running time compared with state-of-the-art methods. Furthermore, the proposed algorithm is robust across heterogeneous networks from various fields. The proposed model and method provide an effective alternative for heterogeneous network clustering.
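
For readers unfamiliar with the factorization step named in the abstract, the following is a minimal sketch of plain nonnegative matrix factorization used for clustering. It is not the GEjNMF algorithm: it omits the graph embedding, the joint objective, and the l0-norm optimization, and uses the standard Lee and Seung multiplicative updates instead. The function name `nmf_cluster`, its parameters, and the toy adjacency matrix are assumptions made for illustration only.

```python
import numpy as np

def nmf_cluster(A, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix A (n x m) as W @ H with W, H >= 0,
    then label each row by the column of W carrying the largest weight.
    Hypothetical helper for illustration; not the paper's method."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    # Strictly positive random start so multiplicative updates can move every entry.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius objective ||A - W H||_F^2.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    labels = W.argmax(axis=1)
    return W, H, labels

# Toy network (an assumption): two disjoint groups of three vertices.
# Self-loops are included so each diagonal block is exactly rank one.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

W, H, labels = nmf_cluster(A, k=2)
print(labels)
```

Rows of `W` act as soft cluster memberships, so taking the `argmax` per row is one simple way to read off a hard assignment. Which of the two label values each group receives depends on the random initialization.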


Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 4
August 2021
486 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3458847

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 June 2021
Accepted: 01 December 2020
Revised: 01 September 2020
Received: 01 January 2020
Published in TKDD Volume 15, Issue 4

Author Tags

  1. Heterogeneous information network
  2. Non-negative matrix factorization
  3. Clustering

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NFSC
