TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names

Jiaming Shen; Wenda Qiu; Yu Meng; Jingbo Shang; Xiang Ren; Jiawei Han

doi:10.18653/v1/2021.naacl-main.335

Abstract

Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy. Most existing HMTC methods train classifiers on massive collections of human-labeled documents, which are often too costly to obtain in real-world applications. In this paper, we explore conducting HMTC using only class surface names as supervision signals. We observe that, to perform HMTC, human experts typically first pinpoint the few most essential classes for a document as its “core classes”, and then check the core classes’ ancestor classes to ensure coverage. To mimic human experts, we propose a novel HMTC framework named TaxoClass. Specifically, TaxoClass (1) calculates document-class similarities using a textual entailment model, (2) identifies a document’s core classes and uses confident core classes to train a taxonomy-enhanced classifier, and (3) generalizes the classifier via multi-label self-training. Our experiments on two challenging datasets show that TaxoClass can achieve around 0.71 Example-F1 using only class names, outperforming the best previous method by 25%.
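To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (a) expanding a document's core classes with their ancestors in a class hierarchy and (b) computing Example-F1. The function names, the `parents` mapping, and the toy taxonomy are all hypothetical. The Example-F1 formula used here (the per-document score 2|T ∩ P| / (|T| + |P|), averaged over documents) is a common definition and is an assumption, since this page does not state the formula.

```python
from typing import Dict, Iterable, List, Sequence, Set


def with_ancestors(core: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the core classes plus every ancestor reachable via `parents`.

    `parents` maps a class name to its direct parent classes (hypothetical
    representation; a class may have several parents in a DAG taxonomy).
    """
    labels: Set[str] = set()
    stack = list(core)
    while stack:
        cls = stack.pop()
        if cls in labels:
            continue  # already visited; also guards against cycles
        labels.add(cls)
        stack.extend(parents.get(cls, []))
    return labels


def example_f1(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Average per-document F1 between true and predicted label sets.

    Assumed definition: mean over documents of 2|T & P| / (|T| + |P|).
    A document with empty true and predicted sets scores 1.0 (a choice
    made for this sketch only).
    """
    scores = []
    for true, pred in zip(true_sets, pred_sets):
        denom = len(true) + len(pred)
        scores.append(2 * len(true & pred) / denom if denom else 1.0)
    return sum(scores) / len(scores)


# Toy taxonomy (hypothetical): laptops -> computers -> electronics
toy_parents = {"laptops": ["computers"], "computers": ["electronics"]}
expanded = with_ancestors(["laptops"], toy_parents)
score = example_f1([{"laptops", "computers", "electronics"}], [expanded])
```

In this toy case the expanded label set matches the true set exactly, so `score` is 1.0. Predicting only a subset of the true labels lowers the per-document score proportionally.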

Anthology ID: 2021.naacl-main.335
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Editors: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 4239–4249
URL: https://aclanthology.org/2021.naacl-main.335
DOI: 10.18653/v1/2021.naacl-main.335
Cite (ACL): Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, and Jiawei Han. 2021. TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4239–4249, Online. Association for Computational Linguistics.
Cite (Informal): TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names (Shen et al., NAACL 2021)
PDF: https://aclanthology.org/2021.naacl-main.335.pdf
Video: https://aclanthology.org/2021.naacl-main.335.mp4
