Simple Questions Generate Named Entity Recognition Datasets

Kim, Hyunjae; Yoo, Jaehyo; Yoon, Seunghyun; Lee, Jinhyuk; Kang, Jaewoo

arXiv:2112.08808 (cs)

[Submitted on 16 Dec 2021 (v1), last revised 5 Nov 2022 (this version, v4)]

Abstract: Recent named entity recognition (NER) models often rely on human-annotated datasets, which require substantial expert knowledge of the target domain and entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking an open-domain question answering system simple natural-language questions (e.g., "Which disease?"). Despite using fewer in-domain resources, our models, trained solely on the generated datasets, substantially outperform strong low-resource models by an average of 19.4 F1 points across six popular NER benchmarks. Furthermore, our models perform competitively with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks and achieve new state-of-the-art performance.
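
The ask-to-generate idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the `toy_qa` function is a hypothetical stand-in for an open-domain question answering system and returns canned (answer phrase, evidence sentence) pairs, and `bio_tag` and `generate_dataset` are illustrative helpers that assume whitespace tokenization and BIO labels.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical QA interface: question -> list of (answer phrase, evidence sentence).
QASystem = Callable[[str], List[Tuple[str, str]]]


def toy_qa(question: str) -> List[Tuple[str, str]]:
    """Stand-in for an open-domain QA system; returns canned answers only."""
    canned = {
        "Which disease?": [
            ("cystic fibrosis", "Mutations in CFTR cause cystic fibrosis in humans"),
        ],
    }
    return canned.get(question, [])


def bio_tag(sentence: str, phrase: str, label: str) -> List[Tuple[str, str]]:
    """Tag the first whitespace-token match of `phrase` in `sentence` with BIO labels."""
    tokens = sentence.split()
    target = phrase.split()
    tags = ["O"] * len(tokens)
    if target:  # an empty phrase leaves every token tagged "O"
        for start in range(len(tokens) - len(target) + 1):
            if tokens[start:start + len(target)] == target:
                tags[start] = f"B-{label}"
                for i in range(start + 1, start + len(target)):
                    tags[i] = f"I-{label}"
                break
    return list(zip(tokens, tags))


def generate_dataset(
    questions: Dict[str, str], qa: QASystem
) -> List[List[Tuple[str, str]]]:
    """Ask one question per entity type and turn each answer into a tagged sentence."""
    dataset = []
    for label, question in questions.items():
        for phrase, sentence in qa(question):
            dataset.append(bio_tag(sentence, phrase, label))
    return dataset


examples = generate_dataset({"DISEASE": "Which disease?"}, toy_qa)
```

Swapping `toy_qa` for a real retrieval-based QA system would be the substantive step; the sketch only shows how question answers could become weakly labeled training sentences.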

Comments:	EMNLP 2022. Code and datasets available at this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.08808 [cs.CL]
	(or arXiv:2112.08808v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.08808

Submission history

From: Hyunjae Kim
[v1] Thu, 16 Dec 2021 11:44:38 UTC (857 KB)
[v2] Tue, 24 May 2022 14:09:37 UTC (930 KB)
[v3] Sun, 23 Oct 2022 03:21:37 UTC (8,415 KB)
[v4] Sat, 5 Nov 2022 06:33:02 UTC (8,415 KB)
