DOI: 10.1145/3219819.3220017

TruePIE: Discovering Reliable Patterns in Pattern-Based Information Extraction

Published: 19 July 2018

Abstract

Pattern-based methods have been successful in information extraction and NLP research. Previous approaches learn the quality of a textual pattern as its relatedness to a certain task, based on statistics of its individual content (e.g., length, frequency) and hundreds of carefully annotated labels. However, patterns of good content quality may generate heavily conflicting information because of the large gap between relatedness and correctness. Evaluating the correctness of information is critical in (entity, attribute, value)-tuple extraction. In this work, we propose a novel method, called TruePIE, that finds reliable patterns, which extract information that is not only related but also correct. TruePIE adopts the self-training framework and repeats the training-predicting-extracting process to gradually discover more reliable patterns. To better represent the textual patterns, pattern embeddings are formulated so that patterns with similar semantic meanings are embedded close to each other. The embeddings jointly consider the local pattern information and the distributional information of the extractions. To address the lack of supervision on patterns' reliability, TruePIE automatically generates high-quality training patterns from a couple of seed patterns by applying arity constraints to distinguish highly reliable patterns (i.e., positive patterns) from highly unreliable patterns (i.e., negative patterns). Experiments on a large news dataset (over 25 GB) demonstrate that TruePIE significantly outperforms baseline methods on each of three tasks: reliable tuple extraction, reliable pattern extraction, and negative pattern extraction.
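
The idea of using an arity constraint to separate positive from negative patterns can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the actual procedure, so the conflict-rate rule, the thresholds `low` and `high`, the function names, and the toy extractions below are all assumptions made for illustration. The sketch assumes an attribute where each entity has at most one value (for example, a country's president) and labels a candidate pattern by how often its extractions contradict the tuples produced by the seed patterns.

```python
def conflict_rate(pattern_tuples, trusted):
    """Fraction of a pattern's (entity, value) pairs that contradict the
    trusted value for the same entity. Returns None when no entity overlaps.
    Assumes a hypothetical one-value-per-entity arity constraint."""
    checked = 0
    conflicts = 0
    for entity, value in pattern_tuples:
        if entity in trusted:
            checked += 1
            if trusted[entity] != value:
                conflicts += 1
    return conflicts / checked if checked else None


def label_patterns(extractions, seed_patterns, low=0.2, high=0.8):
    """Label each pattern 'positive', 'negative', or 'unlabeled'.
    The thresholds `low` and `high` are illustrative assumptions."""
    # Tuples extracted by seed patterns are treated as trusted.
    trusted = {}
    for pattern in seed_patterns:
        for entity, value in extractions[pattern]:
            trusted.setdefault(entity, value)

    labels = {}
    for pattern, tuples in extractions.items():
        if pattern in seed_patterns:
            labels[pattern] = "positive"
            continue
        rate = conflict_rate(tuples, trusted)
        if rate is None:
            labels[pattern] = "unlabeled"   # no evidence either way
        elif rate <= low:
            labels[pattern] = "positive"    # rarely violates the constraint
        elif rate >= high:
            labels[pattern] = "negative"    # almost always violates it
        else:
            labels[pattern] = "unlabeled"
    return labels


# Toy extractions: pattern -> list of (entity, value) pairs.
extractions = {
    "$COUNTRY president $PERSON": [
        ("United States", "Barack Obama"),
        ("France", "Francois Hollande"),
    ],
    "$PERSON , leader of $COUNTRY": [
        ("United States", "Barack Obama"),
        ("France", "Francois Hollande"),
        ("Germany", "Angela Merkel"),
    ],
    "$PERSON visited $COUNTRY": [
        ("United States", "Francois Hollande"),
        ("France", "Barack Obama"),
    ],
    "$PERSON was born in $COUNTRY": [
        ("Kenya", "Some Person"),
    ],
}

labels = label_patterns(extractions, {"$COUNTRY president $PERSON"})
for pattern, label in labels.items():
    print(f"{label:10s} {pattern}")
```

In this toy setting, a pattern such as "$PERSON visited $COUNTRY" is related to the entities involved but assigns several different people to the same country, so it conflicts with the seed tuples. That is the gap between relatedness and correctness the abstract describes. In a self-training loop, the patterns labeled here would serve as training data for a classifier over pattern embeddings, a step this sketch leaves out.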

Supplementary Material

MP4 File (li_truepie_patterns.mp4)

Published In

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information extraction
  2. pattern embedding
  3. pattern reliability
  4. textual patterns

Qualifiers

  • Research-article

Conference

KDD '18

Acceptance Rates

KDD '18 Paper Acceptance Rate 107 of 983 submissions, 11%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) KGRED: Knowledge-graph-based rule discovery for weakly supervised data labeling. Information Processing & Management 61:5 (103816). DOI: 10.1016/j.ipm.2024.103816. Online publication date: Sep-2024
  • (2023) RoRED. Information Sciences: an International Journal 629:C (62-76). DOI: 10.1016/j.ins.2023.01.132. Online publication date: 1-Jun-2023
  • (2022) DPRL: Labeling Relation Based on Distant Supervision and POS Rule. 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE) (63-67). DOI: 10.1109/ICCECE54139.2022.9712720. Online publication date: 14-Jan-2022
  • (2022) Fact Discovery for Text Data. Knowledge Discovery from Multi-Sourced Data (69-83). DOI: 10.1007/978-981-19-1879-7_5. Online publication date: 14-Jun-2022
  • (2022) Introduction. Knowledge Discovery from Multi-Sourced Data (1-11). DOI: 10.1007/978-981-19-1879-7_1. Online publication date: 14-Jun-2022
  • (2021) Biomedical Knowledge Graphs Construction From Conditional Statements. IEEE/ACM Transactions on Computational Biology and Bioinformatics 18:3 (823-835). DOI: 10.1109/TCBB.2020.2979959. Online publication date: 1-May-2021
  • (2020) Textual Evidence Mining via Spherical Heterogeneous Information Network Embedding. 2020 IEEE International Conference on Big Data (Big Data) (828-837). DOI: 10.1109/BigData50022.2020.9377958. Online publication date: 10-Dec-2020
  • (2020) Precise temporal slot filling via truth finding with data-driven commonsense. Knowledge and Information Systems. DOI: 10.1007/s10115-020-01493-w. Online publication date: 16-Jul-2020
  • (2020) Entity Synonym Discovery via Multiple Attentions. Semantic Technology (271-286). DOI: 10.1007/978-3-030-41407-8_18. Online publication date: 14-Feb-2020
  • (2019) A Novel Unsupervised Approach for Precise Temporal Slot Filling from Incomplete and Noisy Temporal Contexts. The World Wide Web Conference (3328-3334). DOI: 10.1145/3308558.3313435. Online publication date: 13-May-2019
