Text Classification Using Label Names Only: A Language Model Self-Training Approach

Meng, Yu; Zhang, Yunyi; Huang, Jiaxin; Xiong, Chenyan; Ji, Heng; Zhang, Chao; Han, Jiawei

arXiv:2010.07245 (cs)

[Submitted on 14 Oct 2020]

Abstract: Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.
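
The abstract does not spell out how the self-training in step (3) is carried out. As a minimal sketch, assuming a commonly used scheme in which the model's current predictions on unlabeled documents are squared, divided by a soft class frequency, and renormalized to form sharper training targets, the target computation might look like the following. The function name `sharpen_targets` and the example probabilities are hypothetical and are not taken from this page.

```python
def sharpen_targets(probs):
    """Build sharpened soft targets from predicted class probabilities.

    probs: list of rows, one per unlabeled document, each a list of
    class probabilities that sums to 1.
    Assumed scheme (not stated in the abstract): square each probability,
    divide by the soft frequency of its class, then renormalize per row.
    """
    num_classes = len(probs[0])
    # Soft frequency of each class: total probability mass it receives.
    class_freq = [sum(row[j] for row in probs) for j in range(num_classes)]
    targets = []
    for row in probs:
        # A class with zero total mass has p == 0 in every row, so its weight is 0.
        weights = [(p * p / f) if f > 0 else 0.0 for p, f in zip(row, class_freq)]
        total = sum(weights)
        targets.append([w / total for w in weights])
    return targets


# Hypothetical predictions for four unlabeled documents and two classes.
predictions = [[0.6, 0.4], [0.9, 0.1], [0.4, 0.6], [0.1, 0.9]]
soft_targets = sharpen_targets(predictions)
```

Under this assumption, the classifier would then be trained to match `soft_targets` (for example with a KL divergence loss), and the targets would be recomputed from the updated predictions at intervals, so that confident predictions are reinforced while no labeled documents are used.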

Comments:	EMNLP 2020. (Code: this https URL)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2010.07245 [cs.CL]
	(or arXiv:2010.07245v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2010.07245

Submission history

From: Yu Meng
[v1] Wed, 14 Oct 2020 17:06:41 UTC (2,700 KB)
