Training with Quantization Noise for Extreme Model Compression

Fan, Angela; Stock, Pierre; Graham, Benjamin; Grave, Edouard; Gribonval, Remi; Jegou, Herve; Joulin, Armand

arXiv:2004.07320 (cs)

[Submitted on 15 Apr 2020 (v1), last revised 28 Feb 2021 (this version, v3)]

Abstract: We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training (QAT), where the weights are quantized during training and the gradients are approximated with the Straight-Through Estimator (STE). In this paper, we extend this approach beyond int8 fixed-point quantization to extreme compression methods, such as Product Quantization, where the approximations introduced by STE are severe. Our proposal is to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result, we establish new state-of-the-art compromises between accuracy and model size in both natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we can achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14MB and 80.0% top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3MB.
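The core mechanism described above can be illustrated with a minimal sketch: on each forward pass only a random subset of weights is quantized, and gradient updates are applied to the full-precision copy. This is not the authors' implementation. It assumes plain symmetric int8 fake quantization instead of Product Quantization, samples individual weights independently with a hypothetical noise rate `p`, and trains a toy linear model. The names `fake_quantize_int8`, `quant_noise`, and `train_step` are hypothetical.

```python
import random


def fake_quantize_int8(w, scale):
    """Symmetric int8 fake quantization: snap w to the nearest level
    k * scale with integer k clamped to [-128, 127], returned as a float."""
    k = max(-128, min(127, round(w / scale)))
    return k * scale


def quant_noise(weights, p, scale, rng):
    """Quantize each weight independently with probability p.

    Returns the weights to use in this forward pass and a boolean mask
    marking which entries were quantized on this pass."""
    mask = [rng.random() < p for _ in weights]
    noised = [fake_quantize_int8(w, scale) if m else w
              for w, m in zip(weights, mask)]
    return noised, mask


def train_step(weights, x, y, p, scale, lr, rng):
    """One SGD step on the squared loss of a linear model y_hat = sum(w_i * x_i)."""
    w_hat, _ = quant_noise(weights, p, scale, rng)
    y_hat = sum(w * xi for w, xi in zip(w_hat, x))
    dloss_dyhat = 2.0 * (y_hat - y)
    # dL/dw_hat_i = dL/dy_hat * x_i. Unquantized entries have w_hat_i == w_i,
    # so their gradient is exact; for quantized entries the STE treats the
    # rounding as the identity and passes the same gradient through.
    grads = [dloss_dyhat * xi for xi in x]
    # Updates are applied to the full-precision weights, not to w_hat.
    return [w - lr * g for w, g in zip(weights, grads)]


# Toy usage: recover a linear target whose weights lie on the int8 grid.
SCALE = 1 / 128
rng = random.Random(0)
true_w = [0.5, -0.25, 0.75]
w = [0.0, 0.0, 0.0]
for _ in range(2000):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    y = sum(a * b for a, b in zip(true_w, x))
    w = train_step(w, x, y, p=0.5, scale=SCALE, lr=0.05, rng=rng)

# At inference time every weight is quantized (equivalent to p = 1).
w_int8 = [fake_quantize_int8(v, SCALE) for v in w]
print(w_int8)
```

In this sketch, setting `p = 1` corresponds to standard QAT with the STE applied to every weight, while `p = 0` is ordinary full-precision training. Intermediate values expose only part of the network to quantization error on each pass.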

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2004.07320 [cs.LG]
	(or arXiv:2004.07320v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2004.07320

Submission history

From: Angela Fan
[v1] Wed, 15 Apr 2020 20:10:53 UTC (655 KB)
[v2] Fri, 17 Apr 2020 11:59:18 UTC (655 KB)
[v3] Sun, 28 Feb 2021 21:43:34 UTC (942 KB)