Transformer to CNN: Label-scarce distillation for efficient text classification

Chia, Yew Ken; Witteveen, Sam; Andrews, Martin

arXiv:1909.03508 (cs)

[Submitted on 8 Sep 2019]

Abstract: Significant advances have been made in Natural Language Processing (NLP) modelling since the beginning of 2018. The new approaches allow for accurate results, even when there is little labelled data, because these NLP models can benefit from training on both task-agnostic and task-specific unlabelled data. However, these advantages come with significant size and computational costs. This workshop paper outlines how our proposed convolutional student architecture, having been trained by a distillation process from a large-scale model, can achieve 300x inference speedup and 39x reduction in parameter count. In some cases, the student model performance surpasses its teacher on the studied tasks.

Comments:	Accepted paper for CDNNRIA workshop at NeurIPS 2018. (3 pages + references)
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Cite as:	arXiv:1909.03508 [cs.LG]
	(or arXiv:1909.03508v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.03508

Submission history

From: Martin Andrews
[v1] Sun, 8 Sep 2019 16:57:26 UTC (28 KB)
