DOI: 10.1145/2488388.2488420

ClausIE: clause-based open information extraction

Published: 13 May 2013

    Abstract

    We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text and on noisy text as found on the web.
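
The separation the abstract describes (first detect a clause and the grammatical function of its constituents, then decide how to represent it as extractions) can be illustrated with a small sketch. This is not ClausIE's implementation: the `Clause` container, `coarse_pattern`, and `generate_extractions` are hypothetical names introduced here, the clause is filled in by hand rather than derived from a dependency parse, and the pattern label ignores the distinction between obligatory and optional adverbials that a real system would need lexica to make.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """Hypothetical container for one detected clause (illustration only)."""
    subject: str
    verb: str
    indirect_object: Optional[str] = None
    direct_object: Optional[str] = None
    complement: Optional[str] = None
    adverbials: List[str] = field(default_factory=list)


def coarse_pattern(clause: Clause) -> str:
    """Label a clause by the constituents it contains (S, V, O, C, A).

    Simplification: any adverbial adds an 'A', whether or not the verb
    actually requires it.
    """
    pattern = "SV"
    if clause.indirect_object:
        pattern += "O"
    if clause.direct_object:
        pattern += "O"
    if clause.complement:
        pattern += "C"
    if clause.adverbials:
        pattern += "A"
    return pattern


def generate_extractions(clause: Clause) -> List[Tuple[str, ...]]:
    """Turn one clause into tuples: a core tuple plus one per adverbial.

    This is only one possible representation; the point is that it is
    chosen after, and independently of, clause detection.
    """
    core = [clause.subject, clause.verb]
    core += [c for c in (clause.indirect_object,
                         clause.direct_object,
                         clause.complement) if c]
    results = [tuple(core)]
    results += [tuple(core + [adv]) for adv in clause.adverbials]
    return results


# Hand-built clause for the sentence
# "The parser labels each clause by constituent function."
clause = Clause(
    subject="The parser",
    verb="labels",
    direct_object="each clause",
    adverbials=["by constituent function"],
)

print(coarse_pattern(clause))
for extraction in generate_extractions(clause):
    print(extraction)
```

Swapping `generate_extractions` for a function that emits only binary triples, or n-ary tuples with all adverbials attached, would change the output representation without touching the clause detection step, which is the kind of customization the abstract refers to.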

      Published In

      WWW '13: Proceedings of the 22nd international conference on World Wide Web
      May 2013
      1628 pages
      ISBN:9781450320351
      DOI:10.1145/2488388

      Sponsors

      • NICBR: Núcleo de Informação e Coordenação do Ponto BR
      • CGIBR: Comitê Gestor da Internet no Brasil

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. open information extraction
      2. relation extraction

      Qualifiers

      • Research-article

      Conference

      WWW '13
      Sponsor:
      • NICBR
      • CGIBR
      WWW '13: 22nd International World Wide Web Conference
      May 13 - 17, 2013
      Rio de Janeiro, Brazil

      Acceptance Rates

      WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2024) QGAE: an end-to-end answer-agnostic question generation model for generating question-answer pairs. JUSTC 54:1 (0102). DOI: 10.52396/JUSTC-2023-0002. Online publication date: 2024
      • (2024) Applying a Context-based Method to Build a Knowledge Graph for the Blue Amazon. Data Intelligence (1-63). DOI: 10.1162/dint_a_00223. Online publication date: 11-Mar-2024
      • (2024) Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey. ACM Computing Surveys 56:6 (1-42). DOI: 10.1145/3639563. Online publication date: 5-Jan-2024
      • (2024) Doc‐KG: Unstructured documents to knowledge graph construction, identification and validation with Wikidata. Expert Systems. DOI: 10.1111/exsy.13617. Online publication date: 8-May-2024
      • (2024) Improving biomedical Named Entity Recognition with additional external contexts. Journal of Biomedical Informatics 156 (104674). DOI: 10.1016/j.jbi.2024.104674. Online publication date: Aug-2024
      • (2024) Cnosso, a novel method for business document automation based on open information extraction. Expert Systems with Applications: An International Journal 245:C. DOI: 10.1016/j.eswa.2023.123038. Online publication date: 1-Jul-2024
      • (2024) Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP. Expert Systems with Applications 239 (122269). DOI: 10.1016/j.eswa.2023.122269. Online publication date: Apr-2024
      • (2024) OIE4PA: open information extraction for the public administration. Journal of Intelligent Information Systems 62:1 (273-294). DOI: 10.1007/s10844-023-00814-z. Online publication date: 1-Feb-2024
      • (2024) Extraction of Numerical Facts from German Texts to Enrich Internal Audit Data. Artificial Intelligence Tools and Applications in Embedded and Mobile Systems (183-193). DOI: 10.1007/978-3-031-56576-2_16. Online publication date: 30-Jun-2024
      • (2023) Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences. Applied Sciences 13:16 (9412). DOI: 10.3390/app13169412. Online publication date: 19-Aug-2023