NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search

Panda, Rameswar; Merler, Michele; Jaiswal, Mayoore; Wu, Hui; Ramakrishnan, Kandan; Finkler, Ulrich; Chen, Chun-Fu; Cho, Minsik; Kung, David; Feris, Rogerio; Bhattacharjee, Bishwaranjan

arXiv:2006.13314 (cs)

[Submitted on 23 Jun 2020 (v1), last revised 12 Feb 2021 (this version, v2)]

Abstract: Neural Architecture Search (NAS) is an open and challenging problem in machine learning. While NAS offers great promise, the prohibitive computational demand of most existing NAS methods makes it difficult to search architectures directly on large-scale tasks. The typical way of conducting large-scale NAS is to search for an architectural building block on a small dataset (either a proxy set drawn from the large dataset or a completely different small-scale dataset) and then transfer the block to a larger dataset. Despite a number of recent results that show the promise of transfer from proxy datasets, a comprehensive evaluation of different NAS methods that studies the impact of different source datasets has not yet been carried out. In this work, we analyze the architecture transferability of different NAS methods by performing a series of experiments on large-scale benchmarks such as ImageNet1K and ImageNet22K. We find that: (i) The size and domain of the proxy set do not seem to influence architecture performance on the target dataset. On average, architectures searched using completely different small datasets (e.g., CIFAR10) transfer about as well as architectures searched directly on proxy target datasets. However, the design of proxy sets has considerable impact on the rankings of different NAS methods. (ii) While different NAS methods show similar performance on a source dataset (e.g., CIFAR10), they differ significantly in their transfer performance on a large dataset (e.g., ImageNet1K). (iii) Even on large datasets, the random sampling baseline is very competitive, but choosing an appropriate combination of proxy set and search strategy can provide significant improvement over it. We believe that our extensive empirical analysis will prove useful for the future design of NAS algorithms.
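
The abstract refers to searching on "a proxy set from the large dataset" but does not say how such a set is drawn. As a rough illustration only, the sketch below builds a class-balanced proxy subset by sampling a few classes and a fixed number of images per class. The function name `build_proxy_set`, its parameters, and the toy labels are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def build_proxy_set(labels, num_classes, images_per_class, seed=0):
    """Return sorted indices of a class-balanced proxy subset.

    labels[i] is the class of sample i. This sampling scheme is an
    assumption made for illustration, not the paper's procedure.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Pick a random subset of classes, then up to a fixed number of
    # samples from each chosen class.
    chosen_classes = rng.sample(sorted(by_class), num_classes)
    proxy_indices = []
    for label in chosen_classes:
        candidates = by_class[label]
        k = min(images_per_class, len(candidates))
        proxy_indices.extend(rng.sample(candidates, k))
    return sorted(proxy_indices)


# Toy stand-in for a large labelled dataset: 10 classes, 50 samples each.
toy_labels = [i % 10 for i in range(500)]
proxy = build_proxy_set(toy_labels, num_classes=4, images_per_class=5)
print(len(proxy))  # 4 classes x 5 samples = 20
```

Varying `num_classes` and `images_per_class` changes the size of the proxy set, one of the two proxy-set properties (size and domain) whose effect the abstract reports on; a fixed `seed` keeps the subset reproducible across NAS methods being compared.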

Comments:	19 pages, 19 Figures, 6 Tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
MSC classes:	68T05
ACM classes:	I.2.6; I.4
Cite as:	arXiv:2006.13314 [cs.CV]
	(or arXiv:2006.13314v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2006.13314

Submission history

From: Michele Merler
[v1] Tue, 23 Jun 2020 20:28:42 UTC (1,969 KB)
[v2] Fri, 12 Feb 2021 02:55:35 UTC (332 KB)
