Integrating Local Context and Global Cohesiveness for Open Information Extraction

Authors: Ahmed El-Kishky, Jiawei Han

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

Pages 42-50

https://doi.org/10.1145/3289600.3291030

Published: 30 January 2019

Abstract

Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string that describes their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interest. However, current Open IE systems focus on modeling local context information within a sentence to extract relation tuples, ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extraction using corpus-level statistics. Our system solves a joint optimization problem that unifies (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask, so that they mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art Open IE systems.
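
As a rough illustration of what a translating-based objective can look like, the sketch below scores a tuple by how closely the head embedding plus the relation embedding lands on the tail embedding, and pairs that score with a margin-based ranking loss. This is a generic translation-style formulation and not ReMine's actual objective, which the abstract does not spell out; the function names, the 2-d vectors, and the margin value are all assumptions made for the example.

```python
import math


def translation_score(head, relation, tail):
    """Return the negative L2 distance between (head + relation) and tail.

    A score closer to zero means the tuple fits the translation assumption better.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_ranking_loss(positive_score, negative_score, margin=1.0):
    """Hinge loss that is zero once the positive tuple outscores the negative by `margin`."""
    return max(0.0, margin - positive_score + negative_score)


# Hypothetical 2-d embeddings, chosen by hand for illustration only.
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]
germany = [3.0, 1.0]

good = translation_score(paris, capital_of, france)   # head + relation lands on tail
bad = translation_score(paris, capital_of, germany)   # head + relation misses tail
loss = margin_ranking_loss(good, bad)
```

In a corpus-level setting, the embeddings would be learned so that tuples supported by many sentences or by a knowledge base score higher than corrupted ones; here the vectors are fixed only to make the arithmetic easy to follow.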

Cited By

Qu M, Gao T, Xhonneux L, Tang J, Daumé H, Singh A (2020). Few-shot relation extraction via Bayesian meta-learning on relation graphs. Proceedings of the 37th International Conference on Machine Learning, pp. 7867-7876. DOI: 10.5555/3524938.3525667. Online publication date: 13-Jul-2020.
https://dl.acm.org/doi/10.5555/3524938.3525667

Dai Y, Qi J, Zhang R, Caverlee J, Hu X, Lalmas M, Wang W (2020). Joint Recognition of Names and Publications in Academic Homepages. Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 133-141. DOI: 10.1145/3336191.3371771. Online publication date: 20-Jan-2020.
https://dl.acm.org/doi/10.1145/3336191.3371771

Meng Y, Huang J, Shang J, Han J (2019). TextCube. Proceedings of the VLDB Endowment 12:12, pp. 1974-1977. DOI: 10.14778/3352063.3352113. Online publication date: 1-Aug-2019.
https://dl.acm.org/doi/10.14778/3352063.3352113

Index Terms
1. Computing methodologies
  1. Artificial intelligence
2. Information systems
  1. Information systems applications

Published In

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining

January 2019

874 pages

ISBN:9781450359405

DOI:10.1145/3289600

General Chairs: J. Shane Culpepper (RMIT University), Alistair Moffat (The University of Melbourne)

Program Chairs: Paul N. Bennett (Microsoft), Kristina Lerman (University of Southern California)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WSDM '19

WSDM '19: The Twelfth ACM International Conference on Web Search and Data Mining

February 11 - 15, 2019

Melbourne VIC, Australia

Acceptance Rates

WSDM '19 Paper Acceptance Rate 84 of 511 submissions, 16%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics

Total Citations: 3
Total Downloads: 434
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Feb 2025
