DOI: 10.1145/3404835.3463103
Short paper

ReadsRE: Retrieval-Augmented Distantly Supervised Relation Extraction

Published: 11 July 2021

Abstract

Distant supervision (DS) has been widely used to automatically construct (noisy) labeled data for relation extraction (RE). To address the noisy-label problem, most models adopt the multi-instance learning paradigm, representing each entity pair as a bag of sentences. However, this strategy rests on several assumptions (e.g., that all sentences in a bag express the same relation), which may not hold in real-world applications. Moreover, it works poorly on long-tail entity pairs, which have few supporting sentences in the dataset. In this work, we propose a new paradigm named retrieval-augmented distantly supervised relation extraction (ReadsRE), which incorporates large-scale open-domain knowledge (e.g., Wikipedia) into the retrieval step. ReadsRE seamlessly integrates a neural retriever and a relation predictor in an end-to-end framework. We demonstrate the effectiveness of ReadsRE on the well-known NYT10 dataset. The experimental results verify that ReadsRE can effectively retrieve meaningful sentences (i.e., denoise) and relieve the long-tail entity-pair problem in the original dataset by incorporating an external open-domain corpus. Through comparisons, we show that ReadsRE outperforms other baselines for this task.
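
To make the architecture described in the abstract concrete, the sketch below shows, in plain PyTorch, one way a retrieval-augmented DS-RE pipeline can be wired together: a bi-encoder retriever scores candidate sentences from an open-domain corpus against an entity-pair query, the top-k sentences are aggregated with attention weights derived from the retrieval scores, and a relation predictor classifies the pair. This is a minimal illustration, not the authors' implementation: the bag-of-embeddings encoders, the class names (ToyEncoder, ReadsRESketch), and all hyperparameters are placeholder assumptions standing in for the BERT-style retriever and predictor described in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, NUM_RELATIONS, TOP_K = 1000, 64, 5, 3   # toy sizes, chosen arbitrarily

class ToyEncoder(nn.Module):
    """Bag-of-embeddings sentence encoder; a stand-in for a BERT-style encoder."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        return self.emb(token_ids).mean(dim=1)     # (batch, DIM)

class ReadsRESketch(nn.Module):
    """Hypothetical retrieval-augmented relation classifier (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.query_enc = ToyEncoder()              # encodes the entity-pair query
        self.sent_enc = ToyEncoder()               # encodes candidate corpus sentences
        self.classifier = nn.Linear(DIM, NUM_RELATIONS)

    def forward(self, query_ids, corpus_ids):
        q = self.query_enc(query_ids)              # (1, DIM)
        s = self.sent_enc(corpus_ids)              # (N, DIM)
        scores = (s @ q.t()).squeeze(1)            # inner-product retrieval scores, (N,)
        topk = torch.topk(scores, k=TOP_K)         # retrieve the k best-matching sentences
        retrieved = s[topk.indices]                # (TOP_K, DIM)
        # Attention over the retrieved sentences, weighted by their retrieval
        # scores, so gradients reach both the retriever and the predictor.
        weights = F.softmax(topk.values, dim=0)    # (TOP_K,)
        bag = (weights.unsqueeze(1) * retrieved).sum(dim=0, keepdim=True)
        return self.classifier(bag)                # relation logits, (1, NUM_RELATIONS)

if __name__ == "__main__":
    model = ReadsRESketch()
    query = torch.randint(0, VOCAB, (1, 8))        # tokenized entity-pair query
    corpus = torch.randint(0, VOCAB, (20, 12))     # 20 candidate sentences (e.g., from Wikipedia)
    print(model(query, corpus).shape)              # torch.Size([1, 5])

A real system would search a pre-built index (e.g., via maximum inner product search) rather than scoring the whole corpus in memory; the point of the sketch is only to show how retrieval scores can be folded into the prediction so that the retriever and the relation predictor receive gradients jointly.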

Supplementary Material

MP4 File (1721.mp4)
Presentation video


Cited By

  • (2024) Reading Broadly to Open Your Mind: Improving Open Relation Extraction With Search Documents Under Self-Supervisions. IEEE Transactions on Knowledge and Data Engineering 36(5), 2026-2040. https://doi.org/10.1109/TKDE.2023.3317139
  • (2024) Versatile Deep Learning Pipeline for Transferable Chemical Data Extraction. Journal of Chemical Information and Modeling 64(15), 5888-5899. https://doi.org/10.1021/acs.jcim.4c00816
  • (2024) Distant Supervised Relation Extraction on Pre-train Model with Improved Multi-label Attention Mechanism. Knowledge Science, Engineering and Management, 310-321. https://doi.org/10.1007/978-981-97-5492-2_24
  • (2023) An Open Relation Extraction Method for Domain Text Based on Hybrid Supervised Learning. Applied Sciences 13(5), 2962. https://doi.org/10.3390/app13052962
  • (2023) Document-level Relation Extraction via Separate Relation Representation and Logical Reasoning. ACM Transactions on Information Systems 42(1), 1-24. https://doi.org/10.1145/3597610
  • (2023) The manifold regularized SVDD for noisy label detection. Information Sciences 619, 235-248. https://doi.org/10.1016/j.ins.2022.10.109
  • (2022) SpaceE: Knowledge Graph Embedding by Relational Linear Transformation in the Entity Space. Proceedings of the 33rd ACM Conference on Hypertext and Social Media, 64-72. https://doi.org/10.1145/3511095.3531284
  • (2022) End-to-end Distantly Supervised Information Extraction with Retrieval Augmentation. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2449-2455. https://doi.org/10.1145/3477495.3531876
  • (2022) Relation Extraction as Open-book Examination. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2443-2448. https://doi.org/10.1145/3477495.3531746
  • (2021) Assorted Attention Network for Cross-Lingual Language-to-Vision Retrieval. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2444-2454. https://doi.org/10.1145/3459637.3482233


    Published In

    SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2021
    2998 pages
    ISBN: 9781450380379
    DOI: 10.1145/3404835

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 July 2021


    Author Tags

    1. data augmentation
    2. distant supervision
    3. relation extraction

    Qualifiers

    • Short-paper

    Conference

    SIGIR '21

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 76
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 17 Oct 2024
