PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

Zhou, Yang; Shan, Shimin; Wei, Hongkui; Zhao, Zhehuan; Feng, Wenshuo

arXiv:2405.20787 (cs)

[Submitted on 30 May 2024]

Abstract: Relation Extraction (RE) aims to recognize the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of RE models in the scientific domain. The framework introduces two kinds of data augmentation. The first uses an LLM to paraphrase the original training set samples, yielding pseudo-samples with the same sentence meaning but different representations and forms. The second instructs an LLM to generate sentences that implicitly contain information about the corresponding labels, based on the relation and entities of the original training set samples. These two kinds of pseudo-samples are each used, together with the original dataset, to train the RE model. In our experiments, the PGA framework improves the F1 scores of three mainstream RE models in the scientific domain. Using an LLM to obtain samples can also effectively reduce the cost of manually labeling data.
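
The two augmentation routes described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `RESample` fields, the prompt wording, the `llm` callable (any function mapping a prompt string to generated text), and the filter that discards outputs which lose an entity mention.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class RESample:
    """Hypothetical container for one labeled RE training example."""
    sentence: str
    head: str
    tail: str
    relation: str


def paraphrase_prompt(s: RESample) -> str:
    # Route 1: same meaning, different surface form (prompt wording is assumed).
    return (
        "Paraphrase the sentence below. Keep its meaning and keep the entity "
        f'mentions "{s.head}" and "{s.tail}" unchanged.\n'
        f"Sentence: {s.sentence}"
    )


def generation_prompt(s: RESample) -> str:
    # Route 2: new sentence built from the entities and the relation label
    # (prompt wording is assumed).
    return (
        f'Write one scientific sentence in which "{s.head}" and "{s.tail}" '
        f'stand in the relation "{s.relation}".'
    )


def augment(
    train: List[RESample],
    llm: Callable[[str], str],
    build_prompt: Callable[[RESample], str],
) -> List[RESample]:
    """Return pseudo-samples that reuse the original entities and label."""
    pseudo = []
    for s in train:
        text = llm(build_prompt(s)).strip()
        # Assumed sanity filter: drop outputs that no longer mention both entities.
        if s.head in text and s.tail in text:
            pseudo.append(replace(s, sentence=text))
    return pseudo


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model so the sketch runs offline.
    return "BERT has been applied to parsing."


train = [RESample("BERT is used for parsing.", "BERT", "parsing", "USED-FOR")]
# Each kind of pseudo-sample is merged with the original data separately.
train_paraphrased = train + augment(train, fake_llm, paraphrase_prompt)
train_generated = train + augment(train, fake_llm, generation_prompt)
print(len(train_paraphrased), len(train_generated))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client; a real client call would replace `fake_llm`.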

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2405.20787 [cs.CL]
	(or arXiv:2405.20787v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.20787

Submission history

From: Yang Zhou
[v1] Thu, 30 May 2024 13:07:54 UTC (1,102 KB)
