Simple Questions Generate Named Entity Recognition Datasets

Kim, Hyunjae; Yoo, Jaehyo; Yoon, Seunghyun; Lee, Jinhyuk; Kang, Jaewoo


arXiv:2112.08808v1 (cs)

[Submitted on 16 Dec 2021 (this version), latest version 5 Nov 2022 (v4)]



Abstract: Named entity recognition (NER) is the task of extracting named entities of specific types from text. Current NER models often rely on human-annotated datasets, which require extensive expert knowledge of the target domain and its entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by posing simple natural language questions that express the desired entity types (e.g., Which disease?) to an open-domain question answering system. Without using any in-domain resources (i.e., training sentences, labels, or in-domain dictionaries), our models, trained solely on our generated datasets, substantially outperform previous weakly supervised models on six NER benchmarks across four different domains. Surprisingly, on NCBI-disease, our model achieves an F1 score of 75.5 and outperforms the previous best weakly supervised model, which uses a rich in-domain dictionary provided by domain experts, by 4.1 F1 points. Formulating the needs of NER in natural language also allows us to build NER models for fine-grained entity types such as Award, where our model even outperforms fully supervised models. On three few-shot NER benchmarks, our model achieves new state-of-the-art performance.
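The ask-to-generate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `mock_qa_system` is a hypothetical stand-in for an open-domain question answering system, its sentences are invented toy data, and `bio_tag` and `generate_dataset` are illustrative helper names. The sketch only shows how answers returned for a question such as "Which disease?" could, under these assumptions, be turned into BIO-tagged NER training examples.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an open-domain QA system (an assumption, not the
# system used in the paper). It returns (evidence sentence, answer phrase)
# pairs; the sentences below are invented toy data.
_TOY_INDEX: Dict[str, List[Tuple[str, str]]] = {
    "Which disease?": [
        ("The patient was diagnosed with cystic fibrosis .", "cystic fibrosis"),
        ("Early treatment of malaria saves lives .", "malaria"),
    ],
}


def mock_qa_system(question: str) -> List[Tuple[str, str]]:
    """Return toy (sentence, answer) pairs for a question, or an empty list."""
    return _TOY_INDEX.get(question, [])


def bio_tag(sentence: str, answer: str, entity_type: str) -> List[Tuple[str, str]]:
    """Label the first occurrence of `answer` in `sentence` with BIO tags."""
    tokens = sentence.split()
    answer_tokens = answer.split()
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            tags[i] = f"B-{entity_type}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{entity_type}"
            break  # tag only the first match
    return list(zip(tokens, tags))


def generate_dataset(question: str, entity_type: str) -> List[List[Tuple[str, str]]]:
    """Ask one question and convert every returned answer into a tagged sentence."""
    return [bio_tag(sent, ans, entity_type) for sent, ans in mock_qa_system(question)]


if __name__ == "__main__":
    for example in generate_dataset("Which disease?", "Disease"):
        print(example)
```

In this sketch the only input that names the target entity type is the question itself; no in-domain sentences, labels, or dictionaries are supplied by the user, which mirrors the setting the abstract describes.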

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.08808 [cs.CL]
	(or arXiv:2112.08808v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.08808

Submission history

From: Hyunjae Kim
[v1] Thu, 16 Dec 2021 11:44:38 UTC (857 KB)
[v2] Tue, 24 May 2022 14:09:37 UTC (930 KB)
[v3] Sun, 23 Oct 2022 03:21:37 UTC (8,415 KB)
[v4] Sat, 5 Nov 2022 06:33:02 UTC (8,415 KB)
