
Relation Extraction Using Distant Supervision: A Survey

Published: 19 November 2018

Abstract

Relation extraction is a subtask of information extraction where semantic relationships are extracted from natural language text and then classified. In essence, it allows us to acquire structured knowledge from unstructured text. In this article, we present a survey of relation extraction methods that leverage pre-existing structured or semi-structured data to guide the extraction process. We introduce a taxonomy of existing methods and describe distant supervision approaches in detail. We describe, in addition, the evaluation methodologies and the datasets commonly used for quality assessment. Finally, we give a high-level outlook on the field, highlighting open problems as well as the most promising research directions.

Published In

ACM Computing Surveys, Volume 51, Issue 5
September 2019
791 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3271482
  • Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 November 2018
Accepted: 01 July 2018
Revised: 01 July 2018
Received: 01 March 2018
Published in CSUR Volume 51, Issue 5

Author Tags

  1. Relation extraction
  2. distant supervision
  3. knowledge graph

Qualifiers

  • Survey
  • Research
  • Refereed

Funding Sources

  • ERC 683253/GraphInt

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 181
  • Downloads (last 6 weeks): 14

Reflects downloads up to 18 Aug 2024.

Cited By

  • (2024) KBPT: knowledge-based prompt tuning for zero-shot relation triplet extraction. PeerJ Computer Science 10 (e2014). DOI: 10.7717/peerj-cs.2014. Online publication date: 24-May-2024
  • (2024) MetaTron: advancing biomedical annotation empowering relation annotation and collaboration. BMC Bioinformatics 25:1. DOI: 10.1186/s12859-024-05730-9. Online publication date: 14-Mar-2024
  • (2024) Gait-based human recognition based on millimetre wave multiple input multiple output radar point cloud constructed using velocity-depth-time. IET Radar, Sonar & Navigation 18:8 (1381-1389). DOI: 10.1049/rsn2.12577. Online publication date: 27-May-2024
  • (2024) Relational concept enhanced prototypical network for incremental few-shot relation classification. Knowledge-Based Systems 284:C. DOI: 10.1016/j.knosys.2023.111282. Online publication date: 17-Apr-2024
  • (2024) A platform-based Natural Language processing-driven strategy for digitalising regulatory compliance processes for the built environment. Advanced Engineering Informatics 62 (102653). DOI: 10.1016/j.aei.2024.102653. Online publication date: Oct-2024
  • (2024) A Review of Relationship Extraction Based on Deep Learning. Artificial Intelligence and Machine Learning (73-84). DOI: 10.1007/978-981-97-1277-9_6. Online publication date: 3-Apr-2024
  • (2024) Weakly Supervised Relation Extraction. Innovative Methods in Computer Science and Computational Applications in the Era of Industry 5.0 (100-112). DOI: 10.1007/978-3-031-56322-5_9. Online publication date: 6-Apr-2024
  • (2024) Toward a Human-in-the-Loop Approach to Create Training Datasets for RDF Lexicalisation. Intelligent Systems and Applications (84-101). DOI: 10.1007/978-3-031-47721-8_6. Online publication date: 10-Jan-2024
  • (2024) Knowledge graph-driven data processing for business intelligence. WIREs Data Mining and Knowledge Discovery 14:3. DOI: 10.1002/widm.1529. Online publication date: 11-Feb-2024
  • (2023) Event-Centric Temporal Knowledge Graph Construction: A Survey. Mathematics 11:23 (4852). DOI: 10.3390/math11234852. Online publication date: 2-Dec-2023