Simple Questions Generate Named Entity Recognition Datasets

Kim, Hyunjae; Yoo, Jaehyo; Yoon, Seunghyun; Lee, Jinhyuk; Kang, Jaewoo

Computer Science > Computation and Language

arXiv:2112.08808v2 (cs)

[Submitted on 16 Dec 2021 (v1), revised 24 May 2022 (this version, v2), latest version 5 Nov 2022 (v4)]

Abstract: Recent named entity recognition (NER) models often rely on human-annotated datasets, which require extensive professional knowledge of the target domain and entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions (e.g., "Which disease?") to an open-domain question answering system. Despite using fewer training resources, our models trained solely on the generated datasets substantially outperform strong low-resource models, by 20.8 F1 score on average across six popular NER benchmarks. Our models also show competitive performance with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 score on three benchmarks and achieve new state-of-the-art performance.
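As a rough illustration of the ask-to-generate idea, the sketch below turns answers to a simple question into BIO-tagged training sentences. It is not the authors' implementation: `toy_qa`, `bio_tags`, and `generate_dataset` are hypothetical names, the canned sentences are made up, and exact token matching stands in for whatever answer-to-span alignment the real pipeline uses.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[List[str], List[str]]  # (tokens, BIO tags)


def toy_qa(question: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for an open-domain QA system.

    Returns (evidence sentence, answer phrase) pairs. A real system would
    retrieve these from a large corpus; the entries here are invented.
    """
    canned = {
        "Which disease?": [
            ("Patients with cystic fibrosis often develop lung infections .",
             "cystic fibrosis"),
            ("Insulin is used to treat diabetes .", "diabetes"),
        ],
    }
    return canned.get(question, [])


def bio_tags(tokens: List[str], answer: List[str], entity_type: str) -> List[str]:
    """Tag every exact occurrence of the answer tokens with B-/I- labels."""
    tags = ["O"] * len(tokens)
    n = len(answer)
    if n == 0:
        return tags
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer:
            tags[i] = f"B-{entity_type}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{entity_type}"
    return tags


def generate_dataset(
    questions: Dict[str, str],
    qa: Callable[[str], List[Tuple[str, str]]],
) -> List[Example]:
    """Ask one question per entity type and label the returned sentences."""
    dataset: List[Example] = []
    for entity_type, question in questions.items():
        for sentence, answer in qa(question):
            tokens = sentence.split()
            dataset.append((tokens, bio_tags(tokens, answer.split(), entity_type)))
    return dataset


dataset = generate_dataset({"DISEASE": "Which disease?"}, toy_qa)
for tokens, tags in dataset:
    print(list(zip(tokens, tags)))
```

The resulting token and tag pairs have the shape an NER model is usually trained on, so a question per entity type is the only input the user has to supply in this sketch.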

Comments:	Code available at this https URL
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.08808 [cs.CL]
	(or arXiv:2112.08808v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.08808

Submission history

From: Hyunjae Kim
[v1] Thu, 16 Dec 2021 11:44:38 UTC (857 KB)
[v2] Tue, 24 May 2022 14:09:37 UTC (930 KB)
[v3] Sun, 23 Oct 2022 03:21:37 UTC (8,415 KB)
[v4] Sat, 5 Nov 2022 06:33:02 UTC (8,415 KB)
