Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Authors: Matthew Walker, Jiawei Han

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

Pages 599 - 608

https://doi.org/10.1145/3357384.3357866

Published: 03 November 2019

Abstract

Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as the is-a relation or subclass-of relation, lies at the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, so their effectiveness is hindered when the corpus does not provide sufficient signals for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human-labeled data. HyperMine extends the definition of "context" to the scenario of text-rich HINs. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities, and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired from high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further present a case study showing that a high-quality taxonomy can be generated solely from the hypernymy discovered by HyperMine.
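
The abstract mentions weak supervision acquired from high-precision textual patterns. As a rough illustration of what such a pattern can look like, the sketch below extracts (hyponym, hypernym) pairs with a single "X such as Y" pattern in the style of Hearst patterns. It is not the HyperMine implementation: the function name `extract_pairs`, the restriction to single-word terms, and the choice of this one pattern are assumptions made only for the example.

```python
import re

# Hypothetical pattern: "<hypernym> such as <hyponym>, <hyponym>, and <hyponym>".
# Assumption for this sketch: every term is a single word.
SUCH_AS = re.compile(
    r"(\w+) such as "                  # group 1: candidate hypernym
    r"(\w+(?:, (?!and )\w+)*"          # group 2: first hyponym, then comma-separated ones
    r"(?:,? and \w+)?)"                # optional final "and <hyponym>"
)


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        # Split the hyponym list on ", and ", " and ", or ", ".
        hyponyms = re.split(r",? and |, ", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs


if __name__ == "__main__":
    text = "Classifiers such as SVM, perceptron, and boosting are popular."
    for hyponym, hypernym in extract_pairs(text):
        print(f"{hyponym} is-a {hypernym}")
```

Pairs harvested this way tend to be precise but sparse, which is why they are better suited to serve as weak labels for a learned model than as the final output.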

Index Terms

1. Applied computing
  1. Enterprise computing
    1. Enterprise ontologies, taxonomies and vocabularies
2. Information systems
  1. Information systems applications
    1. Data mining

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management

November 2019

3373 pages

ISBN:9781450369763

DOI:10.1145/3357384

General Chairs:
Wenwu Zhu, Tsinghua University, China
Dacheng Tao, University of Massachusetts, USA
Xueqi Cheng, Institute of Computing Technology, CAS, China

Program Chairs:
Peng Cui, Tsinghua University, China
Elke Rundensteiner, Worcester Polytechnic Institute, USA
David Carmel, Amazon Research, USA
Qi He, LinkedIn, USA
Jeffrey Xu Yu, Chinese University of Hong Kong, China

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '19

CIKM '19: The 28th ACM International Conference on Information and Knowledge Management

November 3 - 7, 2019

Beijing, China

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
