DOI: 10.5555/2283396.2283398

Open information extraction: the second generation

Published: 16 July 2011

    Abstract

    How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web.
    In 2007, we introduced the Open Information Extraction (Open IE) paradigm, which eschews hand-labeled training examples and avoids domain-specific verbs and nouns in order to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both common-sense knowledge and novel question-answering systems.
    This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.



          Published In

          IJCAI'11: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One
          July 2011
          704 pages
          ISBN:9781577355137

          Sponsors

          • The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)

          Publisher

          AAAI Press

          Cited By

          • (2020) Creating Hardware Component Knowledge Bases with Training Data Generation and Multi-task Learning. ACM Transactions on Embedded Computing Systems 19:6 (1-26). DOI: 10.1145/3391906. Online publication date: 29-Sep-2020
          • (2019) Approximate Definitional Constructs as Lightweight Evidence for Detecting Classes Among Wikipedia Articles. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (2373-2376). DOI: 10.1145/3357384.3358167. Online publication date: 3-Nov-2019
          • (2019) Automating the generation of hardware component knowledge bases. Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (163-176). DOI: 10.1145/3316482.3326344. Online publication date: 23-Jun-2019
          • (2019) A Novel Unsupervised Approach for Precise Temporal Slot Filling from Incomplete and Noisy Temporal Contexts. The World Wide Web Conference (3328-3334). DOI: 10.1145/3308558.3313435. Online publication date: 13-May-2019
          • (2019) Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (78-86). DOI: 10.1145/3289600.3291020. Online publication date: 30-Jan-2019
          • (2019) Conceptual Representations for Computational Concept Creation. ACM Computing Surveys 52:1 (1-33). DOI: 10.1145/3186729. Online publication date: 25-Feb-2019
          • (2019) Utilizing structured knowledge bases in open IE based event template extraction. Applied Intelligence 49:1 (206-219). DOI: 10.1007/s10489-018-1269-0. Online publication date: 1-Jan-2019
          • (2019) Predicting hypernym-hyponym relations for Chinese taxonomy learning. Knowledge and Information Systems 58:3 (585-610). DOI: 10.1007/s10115-018-1166-1. Online publication date: 1-Mar-2019
          • (2018) Enriching a thesaurus as a better question-answering tool and information retrieval aid. Journal of Information Science 44:4 (512-525). DOI: 10.1177/0165551517706219. Online publication date: 1-Aug-2018
          • (2018) Relation Extraction Using Distant Supervision. ACM Computing Surveys 51:5 (1-35). DOI: 10.1145/3241741. Online publication date: 19-Nov-2018
