DOI: 10.1145/3404835.3463103
Short paper

ReadsRE: Retrieval-Augmented Distantly Supervised Relation Extraction

Published: 11 July 2021

Abstract

Distant supervision (DS) has been widely used to automatically construct (noisy) labeled data for relation extraction (RE). To address the noisy-label problem, most models adopt the multi-instance learning paradigm, representing each entity pair as a bag of sentences. However, this strategy rests on several assumptions (e.g., that all sentences in a bag express the same relation), which may not hold in real-world applications. Moreover, it works poorly on long-tail entity pairs, which have few supporting sentences in the dataset. In this work, we propose a new paradigm named retrieval-augmented distantly supervised relation extraction (ReadsRE), which incorporates large-scale open-domain knowledge (e.g., Wikipedia) into the retrieval step. ReadsRE seamlessly integrates a neural retriever and a relation predictor in an end-to-end framework. We demonstrate the effectiveness of ReadsRE on the well-known NYT10 dataset. The experimental results verify that ReadsRE can effectively retrieve meaningful sentences (i.e., denoise) and relieve the long-tail entity-pair problem in the original dataset by incorporating an external open-domain corpus. Through comparisons, we show that ReadsRE outperforms other baselines for this task.
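
To make the architecture described in the abstract concrete, the sketch below shows, in plain PyTorch, one way a retrieval-augmented DS-RE pipeline can be wired together: a bi-encoder retriever scores candidate sentences from an open-domain corpus against an entity-pair query, the top-k sentences are aggregated with attention weights derived from the retrieval scores, and a relation predictor classifies the pair. This is a minimal illustration, not the authors' implementation: the bag-of-embeddings encoders, the class names (ToyEncoder, ReadsRESketch), and all hyperparameters are placeholder assumptions standing in for the BERT-style retriever and predictor described in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, NUM_RELATIONS, TOP_K = 1000, 64, 5, 3   # toy sizes, chosen arbitrarily

class ToyEncoder(nn.Module):
    """Bag-of-embeddings sentence encoder; a stand-in for a BERT-style encoder."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        return self.emb(token_ids).mean(dim=1)     # (batch, DIM)

class ReadsRESketch(nn.Module):
    """Hypothetical retrieval-augmented relation classifier (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.query_enc = ToyEncoder()              # encodes the entity-pair query
        self.sent_enc = ToyEncoder()               # encodes candidate corpus sentences
        self.classifier = nn.Linear(DIM, NUM_RELATIONS)

    def forward(self, query_ids, corpus_ids):
        q = self.query_enc(query_ids)              # (1, DIM)
        s = self.sent_enc(corpus_ids)              # (N, DIM)
        scores = (s @ q.t()).squeeze(1)            # inner-product retrieval scores, (N,)
        topk = torch.topk(scores, k=TOP_K)         # retrieve the k best-matching sentences
        retrieved = s[topk.indices]                # (TOP_K, DIM)
        # Attention over the retrieved sentences, weighted by their retrieval
        # scores, so gradients reach both the retriever and the predictor.
        weights = F.softmax(topk.values, dim=0)    # (TOP_K,)
        bag = (weights.unsqueeze(1) * retrieved).sum(dim=0, keepdim=True)
        return self.classifier(bag)                # relation logits, (1, NUM_RELATIONS)

if __name__ == "__main__":
    model = ReadsRESketch()
    query = torch.randint(0, VOCAB, (1, 8))        # tokenized entity-pair query
    corpus = torch.randint(0, VOCAB, (20, 12))     # 20 candidate sentences (e.g., from Wikipedia)
    print(model(query, corpus).shape)              # torch.Size([1, 5])

A real system would search a pre-built index (e.g., via maximum inner product search) rather than scoring the whole corpus in memory; the point of the sketch is only to show how retrieval scores can be folded into the prediction so that the retriever and the relation predictor receive gradients jointly.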

Supplementary Material

MP4 File (1721.mp4)
Presentation video


Cited By

  • (2024) Reading Broadly to Open Your Mind: Improving Open Relation Extraction With Search Documents Under Self-Supervisions. IEEE Transactions on Knowledge and Data Engineering 36(5), 2026-2040. https://doi.org/10.1109/TKDE.2023.3317139
  • (2024) Versatile Deep Learning Pipeline for Transferable Chemical Data Extraction. Journal of Chemical Information and Modeling 64(15), 5888-5899. https://doi.org/10.1021/acs.jcim.4c00816
  • (2024) Distant Supervised Relation Extraction on Pre-train Model with Improved Multi-label Attention Mechanism. Knowledge Science, Engineering and Management, 310-321. https://doi.org/10.1007/978-981-97-5492-2_24
  • (2023) An Open Relation Extraction Method for Domain Text Based on Hybrid Supervised Learning. Applied Sciences 13(5), 2962. https://doi.org/10.3390/app13052962
  • (2023) Document-level Relation Extraction via Separate Relation Representation and Logical Reasoning. ACM Transactions on Information Systems 42(1), 1-24. https://doi.org/10.1145/3597610
  • (2023) The manifold regularized SVDD for noisy label detection. Information Sciences 619, 235-248. https://doi.org/10.1016/j.ins.2022.10.109
  • (2022) SpaceE: Knowledge Graph Embedding by Relational Linear Transformation in the Entity Space. Proceedings of the 33rd ACM Conference on Hypertext and Social Media, 64-72. https://doi.org/10.1145/3511095.3531284
  • (2022) End-to-end Distantly Supervised Information Extraction with Retrieval Augmentation. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2449-2455. https://doi.org/10.1145/3477495.3531876
  • (2022) Relation Extraction as Open-book Examination. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2443-2448. https://doi.org/10.1145/3477495.3531746
  • (2021) Assorted Attention Network for Cross-Lingual Language-to-Vision Retrieval. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2444-2454. https://doi.org/10.1145/3459637.3482233


    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    ISBN: 9781450380379
    DOI: 10.1145/3404835

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 July 2021


    Author Tags

    1. data augmentation
    2. distant supervision
    3. relation extraction

    Qualifiers

    • Short-paper

    Conference

    SIGIR '21

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 76
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 17 Oct 2024
