DOI: 10.1145/3580305.3599376

Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks

Published: 04 August 2023

Abstract

Representation learning on networks aims to derive a meaningful vector representation for each node, thereby facilitating downstream tasks such as link prediction, node classification, and node clustering. In heterogeneous text-rich networks, this task is more challenging due to (1) the presence or absence of text: some nodes are associated with rich textual information, while others are not; and (2) the diversity of types: nodes and edges of multiple types form a heterogeneous network structure. As pretrained language models (PLMs) have demonstrated their effectiveness in obtaining widely generalizable text representations, substantial effort has been made to incorporate PLMs into representation learning on text-rich networks. However, few of these approaches can effectively and jointly consider heterogeneous structure (network) information and the rich textual semantic information of each node. In this paper, we propose Heterformer, a Heterogeneous Network-Empowered Transformer that performs contextualized text encoding and heterogeneous structure encoding in a unified model. Specifically, we inject heterogeneous structure information into each Transformer layer when encoding node texts. Meanwhile, Heterformer can characterize node/edge type heterogeneity and encode nodes with or without texts. We conduct comprehensive experiments on three tasks (i.e., link prediction, node classification, and node clustering) on three large-scale datasets from different domains, where Heterformer outperforms competitive baselines significantly and consistently. The code can be found at https://github.com/PeterGriffinJin/Heterformer.
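To make the layer-wise structure injection concrete, below is a minimal, illustrative PyTorch sketch of the core idea: when a node's text is encoded, each Transformer layer lets the text tokens attend over both the other tokens and type-aware embeddings of the node's neighbors. The class name HeterformerLayerSketch, the learned type embeddings, and all shapes are simplifying assumptions for illustration, not the authors' implementation; the actual model is in the linked repository.

```python
# A minimal sketch (NOT the official implementation) of a "network-empowered"
# Transformer layer in the spirit of Heterformer: heterogeneous structure
# information is injected into the text encoder at every layer by letting the
# node's text tokens attend over neighbor-node embeddings as extra key/value
# entries. HeterformerLayerSketch and the type-embedding scheme below are
# hypothetical simplifications.
import torch
import torch.nn as nn

class HeterformerLayerSketch(nn.Module):
    def __init__(self, hidden: int = 256, heads: int = 4, num_node_types: int = 3):
        super().__init__()
        # A learned embedding per node type: a crude stand-in for modeling
        # node-type heterogeneity (the paper's treatment is more elaborate).
        self.type_emb = nn.Embedding(num_node_types, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ffn = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, tokens, neighbors, neighbor_types):
        # tokens:         (batch, seq_len, hidden) token states of the node's text
        # neighbors:      (batch, n_nbr, hidden)   embeddings of neighbor nodes
        #                                          (textless nodes would contribute
        #                                          learned embeddings here)
        # neighbor_types: (batch, n_nbr)           integer type id per neighbor
        typed_neighbors = neighbors + self.type_emb(neighbor_types)
        # Structure injection: queries are the text tokens, but keys/values are
        # the concatenation of text tokens and typed neighbor embeddings.
        kv = torch.cat([tokens, typed_neighbors], dim=1)
        attn_out, _ = self.attn(tokens, kv, kv)
        h = self.norm1(tokens + attn_out)
        return self.norm2(h + self.ffn(h))

# Toy forward pass: 2 nodes, 32 text tokens each, 5 neighbors of 3 possible types.
layer = HeterformerLayerSketch()
tokens = torch.randn(2, 32, 256)
neighbors = torch.randn(2, 5, 256)
neighbor_types = torch.randint(0, 3, (2, 5))
print(layer(tokens, neighbors, neighbor_types).shape)  # torch.Size([2, 32, 256])
```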

Supplementary Material

MOV File (990-2min-promo.mov)
Heterogeneous text-rich networks are everywhere in the real world, e.g., academic networks and social media networks. We propose Heterformer, a network-empowered Transformer architecture that simultaneously captures text semantics and heterogeneous structure information. Experiments are conducted on three real-world, large-scale datasets, where we demonstrate the effectiveness of Heterformer.
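For concreteness, node encoders like this are commonly trained and evaluated on link prediction with an in-batch negative sampling objective over connected node pairs. The sketch below assumes that standard setup (dot-product scoring, cross-entropy over in-batch negatives); the exact loss and evaluation protocol used in the paper may differ and are documented in the repository.

```python
# Hedged sketch of a standard link-prediction objective for node embeddings:
# each positive (source, destination) pair in the batch treats every other
# in-batch destination as a negative. Assumed for illustration only.
import torch
import torch.nn.functional as F

def in_batch_link_loss(src_emb: torch.Tensor, dst_emb: torch.Tensor) -> torch.Tensor:
    # src_emb, dst_emb: (batch, hidden) embeddings of the two endpoints of
    # each observed edge, e.g. produced by a Heterformer-style encoder.
    scores = src_emb @ dst_emb.t()                                # (batch, batch)
    labels = torch.arange(scores.size(0), device=scores.device)  # diagonal = positives
    return F.cross_entropy(scores, labels)

src, dst = torch.randn(8, 256), torch.randn(8, 256)
print(in_batch_link_loss(src, dst))  # scalar loss
```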




      Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023, 5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. pretrained language model
      2. text-rich network
      3. transformer

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation IIS-17-41317
      • INCAS Program
      • National Science Foundation IIS-19-56151
      • the Institute for Geospatial Understanding through an Integrative Discovery Environment
      • US DARPA KAIROS Program
      • National Science Foundation IIS 17-04532
      • Molecule Maker Lab Institute

      Conference

      KDD '23

      Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)

      Article Metrics

• Downloads (last 12 months): 545
• Downloads (last 6 weeks): 40

Reflects downloads up to 04 Oct 2024


      Cited By

• (2024) MIMA: Multi-Feature Interaction Meta-Path Aggregation Heterogeneous Graph Neural Network for Recommendations. Future Internet, 16(8), 270. https://doi.org/10.3390/fi16080270. Online publication date: 29-Jul-2024.
• (2024) Deep Pre-Training Transformers for Scientific Paper Representation. Electronics, 13(11), 2123. https://doi.org/10.3390/electronics13112123. Online publication date: 29-May-2024.
• (2024) Text-Attributed Graph Representation Learning: Methods, Applications, and Challenges. Companion Proceedings of the ACM on Web Conference 2024, 1298-1301. https://doi.org/10.1145/3589335.3641255. Online publication date: 13-May-2024.
• (2024) Text-Rich Graph Neural Networks With Subjective-Objective Semantic Modeling. IEEE Transactions on Knowledge and Data Engineering, 36(9), 4956-4967. https://doi.org/10.1109/TKDE.2024.3378914. Online publication date: Sep-2024.
• (2024) Predicting collaborative relationship among scholars by integrating scholars' content-based and structure-based features. Scientometrics, 129(6), 3225-3244. https://doi.org/10.1007/s11192-024-05012-4. Online publication date: 1-Jun-2024.
• (2024) Type-adaptive graph Transformer for heterogeneous information networks. Applied Intelligence, 54(22), 11496-11509. https://doi.org/10.1007/s10489-024-05793-4. Online publication date: 24-Aug-2024.
• (2023) E2EG: End-to-End Node Classification Using Graph Topology and Text-based Node Attributes. 2023 IEEE International Conference on Data Mining Workshops (ICDMW), 1084-1091. https://doi.org/10.1109/ICDMW60847.2023.00142. Online publication date: 4-Dec-2023.
