Once-for-All: Train One Network and Specialize it for Efficient Deployment on Diverse Hardware Platforms

Cai, Han; Gan, Chuang; Wang, Tianzhe; Zhang, Zhekai; Han, Song

arXiv:1908.09791v3 (cs)

[Submitted on 26 Aug 2019 (v1), revised 8 Mar 2020 (this version, v3), latest version 29 Apr 2020 (v5)]

Abstract: We address the challenging problem of deploying deep learning models efficiently across many devices and diverse constraints, from general-purpose hardware to specialized accelerators. Conventional approaches either manually design a specialized neural network or find one with neural architecture search (NAS), and then train it from scratch for each case. This is computationally prohibitive (causing as much $CO_2$ emission as five cars over their lifetimes) and therefore does not scale. To reduce the cost, our key idea is to decouple model training from architecture search. To this end, we propose to train a once-for-all (OFA) network that supports diverse architectural settings (depth, width, kernel size, and resolution). Given a deployment scenario, we can then quickly obtain a specialized sub-network by selecting from the OFA network without additional training. To prevent interference between the many sub-networks during training, we also propose a novel progressive shrinking algorithm, which can train a surprisingly large number of sub-networks ($> 10^{19}$) simultaneously. Extensive experiments on various hardware platforms (CPU, GPU, mCPU, mGPU, FPGA accelerator) show that OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3) while reducing GPU hours and $CO_2$ emission by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting ($<$600M FLOPs). Code and pre-trained models are released at this https URL.
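
To make the "$> 10^{19}$ sub-networks" figure concrete, the sketch below counts and samples architectures from one plausible search space. The specific choices (5 units, depth in {2, 3, 4}, kernel size in {3, 5, 7}, width expansion ratio in {3, 4, 6}) are assumptions taken from the paper's experimental setup rather than from the abstract, and the names `count_subnetworks` and `sample_subnetwork` are hypothetical helpers, not part of the released code.

```python
import random

# Assumed search space (from the paper's experimental setup, not the abstract).
NUM_UNITS = 5                 # sequential units in the network
DEPTHS = (2, 3, 4)            # layers kept per unit
KERNEL_SIZES = (3, 5, 7)      # per-layer kernel size choices
EXPAND_RATIOS = (3, 4, 6)     # per-layer width expansion ratio choices


def count_subnetworks():
    """Count architectures, ignoring the input-resolution dimension."""
    per_layer = len(KERNEL_SIZES) * len(EXPAND_RATIOS)
    # A unit with depth d has per_layer**d layer configurations.
    per_unit = sum(per_layer ** d for d in DEPTHS)
    return per_unit ** NUM_UNITS


def sample_subnetwork(rng):
    """Draw one sub-network: a list of units, each a list of (kernel, ratio)."""
    config = []
    for _ in range(NUM_UNITS):
        depth = rng.choice(DEPTHS)
        layers = [(rng.choice(KERNEL_SIZES), rng.choice(EXPAND_RATIOS))
                  for _ in range(depth)]
        config.append(layers)
    return config


total = count_subnetworks()
example = sample_subnetwork(random.Random(0))
print(f"sub-networks: {total:.2e}")
print(f"sampled unit depths: {[len(unit) for unit in example]}")
```

Under these assumptions the count is ((3 × 3)^2 + (3 × 3)^3 + (3 × 3)^4)^5, which is consistent with the abstract's lower bound. The sampler only draws a configuration; selecting the corresponding weights from a trained OFA network is not shown here.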

Comments:	ICLR 2020
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1908.09791 [cs.LG]
	(or arXiv:1908.09791v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1908.09791

Submission history

From: Han Cai
[v1] Mon, 26 Aug 2019 16:46:23 UTC (784 KB)
[v2] Sun, 5 Jan 2020 20:26:58 UTC (1,444 KB)
[v3] Sun, 8 Mar 2020 18:18:22 UTC (2,013 KB)
[v4] Sun, 26 Apr 2020 23:02:50 UTC (2,660 KB)
[v5] Wed, 29 Apr 2020 20:49:05 UTC (2,518 KB)
