Research article (public access)
DOI: 10.1145/3357384.3357866

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Published: 03 November 2019

Abstract

Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as the is-a or subclass-of relation, lies at the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, so their effectiveness is hindered when the corpus does not provide sufficient signals for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human-labeled data. HyperMine extends the definition of "context" to the scenario of text-rich HINs. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities, and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired from high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further present a case study showing that a high-quality taxonomy can be generated solely from the hypernymy discovered by HyperMine.
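
The abstract does not say which textual patterns provide the weak supervision. As a rough illustration only, the sketch below assumes a single Hearst-style "X such as Y, Z and W" pattern and a known vocabulary of terms; the function name `extract_pairs` and the constant `EXAMPLE_VOCAB` are hypothetical and not taken from the paper. It shows how such a pattern could yield (hyponym, hypernym) pairs that might serve as weak labels.

```python
import re
from typing import List, Set, Tuple

# Hypothetical vocabulary of terms of interest (an assumption for this sketch).
EXAMPLE_VOCAB: Set[str] = {
    "classification models",
    "support vector machines",
    "decision trees",
    "neural networks",
}


def extract_pairs(sentence: str, vocab: Set[str]) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by an 'X such as Y, Z' pattern.

    Illustrative only: one Hearst-style pattern, exact vocabulary matching.
    """
    text = sentence.lower().rstrip(".")
    if " such as " not in text:
        return []
    left, right = text.split(" such as ", 1)

    # Hypernym: the longest vocabulary term that ends the left-hand side.
    hypernym = None
    for term in sorted(vocab, key=len, reverse=True):
        if left == term or left.endswith(" " + term):
            hypernym = term
            break
    if hypernym is None:
        return []

    # Hyponym candidates: split the enumeration on commas and "and"/"or".
    candidates = re.split(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+", right)
    return [(c.strip(), hypernym) for c in candidates if c.strip() in vocab]
```

Calling `extract_pairs("We study classification models such as support vector machines, decision trees and neural networks.", EXAMPLE_VOCAB)` pairs each of the three listed terms with "classification models". Restricting matches to a fixed vocabulary favors precision over recall, which is in the spirit of the "high-precision" patterns the abstract mentions, although the actual pattern set may differ.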

      Published In

      CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
      November 2019
      3373 pages
      ISBN:9781450369763
      DOI:10.1145/3357384
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. distributional inclusion hypothesis
      2. heterogeneous information network
      3. hypernymy discovery
      4. text-rich network

      Qualifiers

      • Research-article

      Conference

      CIKM '19
      Acceptance Rates

      CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 2,471
      • Downloads (last 6 weeks): 768
      Reflects downloads up to 09 Nov 2024

      Cited By

      • (2024) TeKo: Text-Rich Graph Neural Networks With External Knowledge. IEEE Transactions on Neural Networks and Learning Systems 35:10 (14699-14711). DOI: 10.1109/TNNLS.2023.3281354. Online publication date: Oct-2024
      • (2022) Hypernymy Detection for Low-resource Languages: A Study for Hindi, Bengali, and Amharic. ACM Transactions on Asian and Low-Resource Language Information Processing 21:4 (1-21). DOI: 10.1145/3490389. Online publication date: 4-Mar-2022
      • (2022) Heterogeneous Network Representation Learning: A Unified Framework With Survey and Benchmark. IEEE Transactions on Knowledge and Data Engineering 34:10 (4854-4873). DOI: 10.1109/TKDE.2020.3045924. Online publication date: 1-Oct-2022
      • (2022) Embedding text-rich graph neural networks with sequence and topical semantic structures. Knowledge and Information Systems 65:2 (613-640). DOI: 10.1007/s10115-022-01768-4. Online publication date: 17-Oct-2022
      • (2022) Conclusions. Automated Taxonomy Discovery and Exploration (101-103). DOI: 10.1007/978-3-031-11405-2_6. Online publication date: 22-Sep-2022
      • (2022) Taxonomy Construction. Automated Taxonomy Discovery and Exploration (31-48). DOI: 10.1007/978-3-031-11405-2_3. Online publication date: 22-Sep-2022
      • (2022) Introduction. Automated Taxonomy Discovery and Exploration (1-8). DOI: 10.1007/978-3-031-11405-2_1. Online publication date: 22-Sep-2022
      • (2021) AS-GCN: Adaptive Semantic Architecture of Graph Convolutional Networks for Text-Rich Networks. 2021 IEEE International Conference on Data Mining (ICDM) (837-846). DOI: 10.1109/ICDM51629.2021.00095. Online publication date: Dec-2021
      • (2020) Fully-Unsupervised Embeddings-Based Hypernym Discovery. Information 11:5 (268). DOI: 10.3390/info11050268. Online publication date: 18-May-2020
