Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

Liu, Shiwei; Chen, Tianlong; Atashgahi, Zahra; Chen, Xiaohan; Sokar, Ghada; Mocanu, Elena; Pechenizkiy, Mykola; Wang, Zhangyang; Mocanu, Decebal Constantin

arXiv:2106.14568v3 (cs)

[Submitted on 28 Jun 2021 (v1), revised 2 Nov 2021 (this version, v3), latest version 7 Feb 2022 (v4)]

Abstract: Recent work on sparse neural networks has shown that a sparse subnetwork can be trained independently from scratch to match the performance of its corresponding dense network. However, identifying such sparse subnetworks (winning tickets) involves either a costly iterative train-prune-retrain process (e.g., the Lottery Ticket Hypothesis) or an over-extended training time (e.g., Dynamic Sparse Training). In this work, we draw a unique connection between sparse neural network training and deep ensembling, yielding a novel ensemble learning framework called FreeTickets. Instead of starting from a dense network, FreeTickets randomly initializes a sparse subnetwork and then trains it while dynamically adjusting its sparse mask, producing many diverse sparse subnetworks over the course of training. FreeTickets is defined as the ensemble of these sparse subnetworks, obtained for free during this one-pass, sparse-to-sparse training, which uses only a fraction of the computational resources required by vanilla dense training. Moreover, despite being an ensemble of models, FreeTickets has fewer parameters and training FLOPs than a single dense model: this seemingly counter-intuitive outcome is due to the high sparsity of each subnetwork. FreeTickets shows a significant all-round improvement over standard dense baselines in prediction accuracy, uncertainty estimation, robustness, and efficiency. On ImageNet with ResNet50, FreeTickets outperforms the naive deep ensemble while using only a quarter of the training FLOPs required by the latter. Our results provide insights into the strength of sparse neural networks and suggest that the benefits of sparsity go well beyond the usually expected inference efficiency.
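
The ideas in the abstract (a random sparse mask, a mask that changes during training, an ensemble of the resulting subnetworks, and a parameter budget below that of one dense model) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the mask is adjusted, so magnitude-based pruning with random regrowth is assumed here as one common dynamic sparse training rule. Averaging class probabilities is likewise an assumed way of combining ensemble members. All function names and the numbers in the demo (20 weights, density 0.2, 3 members) are hypothetical.

```python
import random

def random_mask(n, density, rng):
    """Return a 0/1 mask of length n with round(density * n) active entries."""
    k = round(density * n)
    active = set(rng.sample(range(n), k))
    return [1 if i in active else 0 for i in range(n)]

def prune_and_regrow(weights, mask, fraction, rng):
    """Assumed mask update: drop the smallest-magnitude active weights,
    then activate the same number of previously inactive positions at random,
    so the overall sparsity level stays constant."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(int(fraction * len(active)), len(inactive))
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    grown = rng.sample(inactive, k)
    new_weights, new_mask = list(weights), list(mask)
    for i in pruned:
        new_mask[i] = 0
        new_weights[i] = 0.0
    for i in grown:
        new_mask[i] = 1
        new_weights[i] = 0.0  # regrown connections start from zero
    return new_weights, new_mask

def ensemble_average(member_probs):
    """Average per-class probabilities over ensemble members (assumed rule)."""
    m = len(member_probs)
    num_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / m for c in range(num_classes)]

def ensemble_param_fraction(num_members, density):
    """Parameters stored by the ensemble, as a fraction of one dense model."""
    return num_members * density

# Illustrative numbers only; none of them come from the paper.
rng = random.Random(0)
mask = random_mask(20, 0.2, rng)
weights = [rng.uniform(-1.0, 1.0) if m else 0.0 for m in mask]
weights2, mask2 = prune_and_regrow(weights, mask, 0.5, rng)
fraction_of_dense = ensemble_param_fraction(3, 0.2)
```

With these illustrative settings, three subnetworks that each keep 20% of the weights store about 60% of the parameters of one dense model, which is the kind of arithmetic behind the claim that the ensemble can be smaller than a single dense network. The claim holds whenever the number of members times the density is below 1.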

Comments:	preprint version
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.14568 [cs.LG]
	(or arXiv:2106.14568v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.14568

Submission history

From: Shiwei Liu
[v1] Mon, 28 Jun 2021 10:48:20 UTC (158 KB)
[v2] Thu, 14 Oct 2021 04:52:51 UTC (1,570 KB)
[v3] Tue, 2 Nov 2021 03:27:36 UTC (1,570 KB)
[v4] Mon, 7 Feb 2022 12:21:13 UTC (1,975 KB)
