
Document-Level Relation Extraction Based on Machine Reading Comprehension and Hybrid Pointer-sequence Labeling

Published: 26 June 2024

Abstract

Document-level relation extraction requires reading, memorization, and reasoning to discover relevant facts spread across multiple sentences. Current hierarchical-network and graph-network methods struggle to fully capture the structural information underlying a document and to reason naturally from context. Unlike previous methods, this article recasts relation extraction as a machine reading comprehension task: each entity pair and candidate relation is expressed as a question template, and extracting entities and relations becomes identifying answers in the context. To strengthen the extraction model's context comprehension and achieve more precise extraction, we introduce large language models (LLMs) during question construction to generate exemplary answers. In addition, to handle the multi-label and multi-entity problems in documents, we propose a new answer extraction model based on hybrid pointer-sequence labeling, which improves the model's reasoning ability and allows zero or multiple answers to be extracted from a document. Extensive experiments on three public datasets show that the proposed method is effective.
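To make the answer-extraction idea concrete, below is a minimal sketch (not the authors' released code) of a hybrid pointer/sequence-labeling head in PyTorch. It assumes token-level hidden states from a BERT-style encoder have already been computed for the concatenated question-plus-document input; the class and function names (HybridAnswerHead, decode_spans) are illustrative assumptions.

```python
# A minimal sketch, assuming PyTorch and a BERT-style encoder that has already
# produced token-level hidden states for the concatenated question + document.
# Class and function names are illustrative, not the authors' implementation.
import torch
import torch.nn as nn


class HybridAnswerHead(nn.Module):
    """Combines a pointer component (per-token start/end logits) with a
    BIO sequence-labeling component so that zero, one, or many answer
    spans can be decoded from a single document."""

    def __init__(self, hidden_size: int, num_bio_tags: int = 3):
        super().__init__()
        self.start_classifier = nn.Linear(hidden_size, 1)            # pointer: span starts
        self.end_classifier = nn.Linear(hidden_size, 1)              # pointer: span ends
        self.bio_classifier = nn.Linear(hidden_size, num_bio_tags)   # sequence labels: O/B/I

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size) from the encoder
        start_logits = self.start_classifier(hidden_states).squeeze(-1)  # (batch, seq_len)
        end_logits = self.end_classifier(hidden_states).squeeze(-1)      # (batch, seq_len)
        bio_logits = self.bio_classifier(hidden_states)                  # (batch, seq_len, tags)
        return start_logits, end_logits, bio_logits


def decode_spans(start_logits, end_logits, bio_tags, threshold=0.5):
    """Greedy decoding for one sequence (1-D logits, list of BIO tag ids, 0 = 'O').
    A span is kept only when both pointer heads fire above the threshold AND the
    BIO tags agree, so the model can legitimately return an empty answer set."""
    starts = (torch.sigmoid(start_logits) > threshold).nonzero(as_tuple=True)[0].tolist()
    ends = (torch.sigmoid(end_logits) > threshold).nonzero(as_tuple=True)[0].tolist()
    spans = []
    for s in starts:
        candidates = [e for e in ends if e >= s]        # nearest end not before the start
        if not candidates:
            continue
        e = min(candidates)
        if all(tag != 0 for tag in bio_tags[s:e + 1]):  # every token tagged B or I
            spans.append((s, e))
    return spans


# Example usage with random tensors (batch of 1, 128 tokens, hidden size 768):
head = HybridAnswerHead(hidden_size=768)
hidden = torch.randn(1, 128, 768)
start_logits, end_logits, bio_logits = head(hidden)
bio_tags = bio_logits[0].argmax(dim=-1).tolist()
answers = decode_spans(start_logits[0], end_logits[0], bio_tags)  # may be an empty list
```

The intersection rule in decode_spans is only the simplest way to show how the pointer and sequence-labeling views can complement each other; the fusion actually used in the paper may differ.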



    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 7, July 2024, 254 pages
    EISSN: 2375-4702
    DOI: 10.1145/3613605
    Editor: Imed Zitouni

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Received: 13 March 2023
    Revised: 20 March 2024
    Accepted: 21 May 2024
    Online AM: 01 June 2024
    Published: 26 June 2024
    Published in TALLIP Volume 23, Issue 7


    Author Tags

    1. Document-level relation extraction
    2. machine reading comprehension
    3. large language model

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China
    • 14th Five-Year Scientific Research Plan of the National Language Commission

