ProtoRectifier: A Prototype Rectification Framework for Efficient Cross-Domain Text Classification with Limited Labeled Samples
DOI:
https://doi.org/10.1609/icwsm.v18i1.31426
Abstract
During the past few years, with the advent of large-scale pre-trained language models (PLMs), there has been significant progress in cross-domain text classification with limited labeled samples. However, most existing approaches still suffer from excessive computation overhead. While some non-pretrained language models can reduce this overhead, their performance can drop sharply. To solve few-shot learning problems on resource-limited devices while maintaining satisfactory performance, we propose a prototype rectification framework, ProtoRectifier, based on pre-trained model distillation and an episodic meta-learning strategy. Specifically, a representation refactor based on DistilBERT is developed to mine text semantics. Meanwhile, a novel prototype rectification approach (i.e., Mean Shift Rectification) is put forward that makes full use of pseudo-labeled query samples, so that the prototype of each category can be updated during the meta-training phase without introducing additional time overhead. Experiments on multiple real-world datasets demonstrate that ProtoRectifier outperforms state-of-the-art baselines, not only achieving high cross-domain classification accuracy but also significantly reducing computation overhead.
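The core idea described in the abstract, shifting each class prototype toward the mean of the query samples pseudo-labeled as that class, can be illustrated with a minimal sketch. The function name, the interpolation weight alpha, and the use of nearest-prototype pseudo-labeling are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of prototype rectification with pseudo-labeled query
# samples, in the spirit of Mean Shift Rectification. Names and the
# interpolation scheme are assumptions for illustration only.
import torch

def rectify_prototypes(prototypes: torch.Tensor,
                       query_emb: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Shift each class prototype toward the mean of the query embeddings
    pseudo-labeled as that class.

    prototypes: (C, D) initial per-class means of support embeddings
    query_emb:  (Q, D) embeddings of unlabeled query samples
    alpha:      assumed interpolation weight (hyperparameter)
    """
    # Pseudo-label each query by its nearest prototype (Euclidean
    # distance), as in standard prototypical networks.
    dists = torch.cdist(query_emb, prototypes)   # (Q, C)
    pseudo_labels = dists.argmin(dim=1)          # (Q,)

    rectified = prototypes.clone()
    for c in range(prototypes.size(0)):
        members = query_emb[pseudo_labels == c]
        if members.numel() > 0:
            # Mean-shift the prototype toward the pseudo-labeled
            # cluster mean; no extra training passes are required.
            rectified[c] = (1 - alpha) * prototypes[c] + alpha * members.mean(dim=0)
    return rectified
```

Because the rectification reuses query embeddings already computed for classification, this kind of update adds essentially no extra forward passes, which is consistent with the abstract's claim of no additional time overhead.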
Published
2024-05-28
How to Cite
Zhao, S., Wang, Z., Yang, D., Li, X., Guo, B., & Yu, Z. (2024). ProtoRectifier: A Prototype Rectification Framework for Efficient Cross-Domain Text Classification with Limited Labeled Samples. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1792-1803. https://doi.org/10.1609/icwsm.v18i1.31426
Section
Full Papers