Article

Open information extraction: the second generation

Authors: Janara Christensen, Stephen Soderland, Mausam

IJCAI'11: Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One

Pages 3 - 10

Published: 16 July 2011

Abstract

How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web.

In 2007, we introduced the Open Information Extraction (Open IE) paradigm, which eschews hand-labeled training examples and avoids domain-specific verbs and nouns in order to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both common-sense knowledge and novel question-answering systems.

This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.

Published In

IJCAI'11: Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One

July 2011, 704 pages

ISBN: 9781577355137

Editor: Toby Walsh (NICTA and University of NSW)

Sponsors: The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)

Publisher: AAAI Press
