research-article

ClausIE: clause-based open information extraction

Authors:

Luciano Del Corro and Rainer Gemulla

WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

Pages 355 - 366

https://doi.org/10.1145/2488388.2488420

Published: 13 May 2013

Abstract

We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and then identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text and on noisy text as found on the web.
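To make the clause-typing and extraction steps described in the abstract more concrete, the following is a minimal Python sketch, assuming clauses and their constituents have already been detected (in ClausIE this happens via dependency parsing). The clause types used here are the seven described by Quirk et al. in A Comprehensive Grammar of the English Language (1985): SV, SVA, SVC, SVO, SVOO, SVOA and SVOC. The Clause record, the obligatory_adverbial flag and the triple format are hypothetical illustrations, not ClausIE's actual data structures or decision procedure; in particular, telling obligatory from optional adverbials is hard in practice and is simply given as input here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical clause record. In a real system these constituents would be
# read off a dependency parse; here they are supplied by hand.
@dataclass
class Clause:
    subject: str
    verb: str
    dobj: Optional[str] = None        # direct object
    iobj: Optional[str] = None        # indirect object
    complement: Optional[str] = None  # subject or object complement
    adverbials: list[str] = field(default_factory=list)
    # Simplifying assumption: an oracle tells us whether the first
    # adverbial is obligatory (as in "He lives in Paris").
    obligatory_adverbial: bool = False


def clause_type(c: Clause) -> str:
    """Map a clause to one of Quirk et al.'s seven types (simplified)."""
    if c.dobj is None and c.iobj is None:
        # No object: the clause is SV, SVA or SVC.
        if c.complement is not None:
            return "SVC"
        return "SVA" if c.obligatory_adverbial else "SV"
    if c.dobj is not None and c.iobj is not None:
        return "SVOO"
    if c.complement is not None:
        return "SVOC"
    return "SVOA" if c.obligatory_adverbial else "SVO"


def extractions(c: Clause) -> list[tuple[str, str, str]]:
    """Emit (subject, relation, arguments) triples for one clause.

    Hypothetical representation: one triple with only the essential
    constituents, plus one extra triple per optional adverbial.
    """
    ctype = clause_type(c)
    essential = [x for x in (c.iobj, c.dobj, c.complement) if x]
    optional = list(c.adverbials)  # copy so the clause is not mutated
    if ctype in ("SVA", "SVOA") and optional:
        # The obligatory adverbial belongs in every extraction.
        essential.append(optional.pop(0))
    triples = [(c.subject, c.verb, " ".join(essential))]
    for adv in optional:
        triples.append((c.subject, c.verb, " ".join(essential + [adv])))
    return triples


if __name__ == "__main__":
    clause = Clause("The committee", "gave", dobj="an award",
                    iobj="the author", adverbials=["in 2013"])
    print(clause_type(clause))
    for triple in extractions(clause):
        print(triple)
```

Running the example prints SVOO followed by two triples: one with only the essential constituents ("the author an award") and one that also carries the optional adverbial "in 2013". Keeping typing separate from triple generation mirrors the abstract's point that detecting information and representing it are distinct steps; a different application could swap extractions() for another output format without touching clause_type().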



Index Terms

Computing methodologies → Artificial intelligence → Natural language processing




Information

Published In


WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

1628 pages

ISBN:9781450320351

DOI:10.1145/2488388

General Chairs:
Daniel Schwabe (PUC-Rio, Brazil), Virgílio Almeida (UFMG, Brazil), Hartmut Glaser (CGI.br, Brazil)

Program Chairs:
Ricardo Baeza-Yates (Yahoo! Labs, Spain & Chile), Sue Moon (KAIST, South Korea)

Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

NICBR: Núcleo de Informação e Coordenação do Ponto BR (Brazilian Network Information Center)
CGIBR: Comitê Gestor da Internet no Brasil (Brazilian Internet Steering Committee)

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States




Conference

WWW '13

Sponsor:

NICBR
CGIBR

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 paper acceptance rate: 125 of 831 submissions (15%)

Overall acceptance rate: 1,899 of 8,196 submissions (23%)


Bibliometrics & Citations

Article Metrics

Total citations: 225
Total downloads: 1,601
Downloads (last 12 months): 62
Downloads (last 6 weeks): 6


Cited By

Li L, Zhang L, Zhu C, Mao Z (2024). QGAE: an end-to-end answer-agnostic question generation model for generating question-answer pairs. JUSTC 54:1 (0102). Online publication date: 2024.
https://doi.org/10.52396/JUSTC-2023-0002
Ligabue P, Brandão A, Peres S, Cozman F, Pirozelli P (2024). Applying a Context-based Method to Build a Knowledge Graph for the Blue Amazon. Data Intelligence, 1-63. Online publication date: 11 Mar 2024.
https://doi.org/10.1162/dint_a_00223
Razniewski S, Arnaout H, Ghosh S, Suchanek F (2024). Completeness, Recall, and Negation in Open-world Knowledge Bases: A Survey. ACM Computing Surveys 56:6 (1-42). Online publication date: 5 Jan 2024.
https://dl.acm.org/doi/10.1145/3639563
Salman M, Haller A, Méndez S, Naseem U (2024). Doc‐KG: Unstructured documents to knowledge graph construction, identification and validation with Wikidata. Expert Systems. Online publication date: 8 May 2024.
https://doi.org/10.1111/exsy.13617
Tho B, Nguyen M, Le D, Ying L, Inoue S, Nguyen T (2024). Improving biomedical Named Entity Recognition with additional external contexts. Journal of Biomedical Informatics 156 (104674). Online publication date: Aug 2024.
https://doi.org/10.1016/j.jbi.2024.104674
Scannapieco S, Tomazzoli C (2024). Cnosso, a novel method for business document automation based on open information extraction. Expert Systems with Applications: An International Journal 245:C. Online publication date: 1 Jul 2024.
https://dl.acm.org/doi/10.1016/j.eswa.2023.123038
Hur A, Janjua N, Ahmed M (2024). Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP. Expert Systems with Applications 239 (122269). Online publication date: Apr 2024.
https://doi.org/10.1016/j.eswa.2023.122269
Siciliani L, Ghizzota E, Basile P, Lops P (2024). OIE4PA: open information extraction for the public administration. Journal of Intelligent Information Systems 62:1 (273-294). Online publication date: 1 Feb 2024.
https://dl.acm.org/doi/10.1007/s10844-023-00814-z
Schumann G, Marx Gómez J (2024). Extraction of Numerical Facts from German Texts to Enrich Internal Audit Data. Artificial Intelligence Tools and Applications in Embedded and Mobile Systems, 183-193. Online publication date: 30 Jun 2024.
https://doi.org/10.1007/978-3-031-56576-2_16
Fan Y, Li B, Sataer Y, Gao M, Shi C, Cao S, Gao Z (2023). Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences. Applied Sciences 13:16 (9412). Online publication date: 19 Aug 2023.
https://doi.org/10.3390/app13169412
