DOI: 10.5555/1858681.1858694

Open information extraction using Wikipedia

Published: 11 July 2010

Abstract

Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform?
This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
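
The abstract describes building training data from heuristic matches between infobox attribute values and article sentences, but does not spell out the matching rules. The following is a minimal sketch of the general idea only, assuming a case-insensitive substring match and a naive sentence splitter. The names `TrainingExample`, `split_sentences`, and `match_infobox`, as well as the toy article text, are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, NamedTuple


class TrainingExample(NamedTuple):
    sentence: str   # sentence that mentions both arguments
    arg1: str       # article subject (assumed to be the first argument)
    arg2: str       # matched infobox attribute value
    attribute: str  # infobox attribute name, kept here only for bookkeeping


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # A real pipeline would use a proper sentence segmenter.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_infobox(subject: str, infobox: Dict[str, str], article_text: str) -> List[TrainingExample]:
    """Pair each infobox value with sentences mentioning both it and the subject.

    Assumption: exact, case-insensitive substring matching stands in for
    whatever heuristics the actual system uses.
    """
    examples = []
    for sentence in split_sentences(article_text):
        lowered = sentence.lower()
        if subject.lower() not in lowered:
            continue  # skip sentences that never mention the subject
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                examples.append(TrainingExample(sentence, subject, value, attribute))
    return examples


# Toy input for illustration only.
infobox = {"birth_place": "London", "known_for": "Analytical Engine"}
text = (
    "Ada Lovelace was born in London. "
    "She wrote notes on the Analytical Engine. "
    "Ada Lovelace is known for work on the Analytical Engine."
)
examples = match_infobox("Ada Lovelace", infobox, text)
for ex in examples:
    print(ex.arg1, "|", ex.arg2, "|", ex.sentence)
```

In this sketch the second sentence is skipped because it refers to the subject only by pronoun, which illustrates why a purely string-based matcher trades recall for simplicity.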


Published In

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
July 2010
1618 pages
Program Chair: Jan Hajič

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
