KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks

Pauloski, J. Gregory; Huang, Qi; Huang, Lei; Venkataraman, Shivaram; Chard, Kyle; Foster, Ian; Zhang, Zhao

doi:10.1145/3458817.3476152


arXiv:2107.01739 (cs)

[Submitted on 4 Jul 2021 (v1), last revised 20 Sep 2021 (this version, v2)]




Abstract: Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to improve performance and increase scalability. We quantify the tradeoffs between memory and communication cost and evaluate KAISA on large models, including ResNet-50, Mask R-CNN, U-Net, and BERT, on up to 128 NVIDIA A100 GPUs. Compared to the original optimizers, KAISA converges 18.1-36.3% faster across applications with the same global batch size. Under a fixed memory budget, KAISA converges 32.5% and 41.6% faster in ResNet-50 and BERT-Large, respectively. KAISA can balance memory and communication to achieve scaling efficiency equal to or better than the baseline optimizers. KAISA is open source and publicly available.
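
To make the memory-footprint concern in the abstract concrete, the following is a rough back-of-the-envelope sketch, not KAISA's actual accounting. It assumes the standard K-FAC setup in which a dense layer with `d_in` inputs and `d_out` outputs keeps two covariance factors, A (`d_in` x `d_in`) and G (`d_out` x `d_out`). It further assumes, purely for illustration, that an inverse of each factor is cached at the same size and that the bias term is ignored. The function names and the layer shape are hypothetical.

```python
def kfac_factor_elements(d_in: int, d_out: int, store_inverses: bool = True) -> int:
    """Count the elements held for one dense layer's Kronecker factors.

    A (input-activation covariance) is d_in x d_in;
    G (output-gradient covariance) is d_out x d_out.
    """
    factors = d_in * d_in + d_out * d_out
    # Assumption: each factor's inverse is cached at the same size as the factor.
    return 2 * factors if store_inverses else factors


def weight_elements(d_in: int, d_out: int) -> int:
    """Count the elements in the layer's weight matrix (bias ignored)."""
    return d_in * d_out


# Hypothetical layer shape, chosen only for illustration.
d_in, d_out = 1024, 4096
extra = kfac_factor_elements(d_in, d_out)
ratio = extra / weight_elements(d_in, d_out)
print(f"weights: {weight_elements(d_in, d_out):,} elements")
print(f"K-FAC factors + inverses: {extra:,} elements ({ratio:.1f}x the weights)")
```

Under these assumptions the extra state for this one layer is several times the size of its weights, which suggests why deciding how many workers hold a copy of that state becomes a tradeoff between memory and communication.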

Comments:	Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21)
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2107.01739 [cs.LG]
	(or arXiv:2107.01739v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2107.01739
Related DOI:	https://doi.org/10.1145/3458817.3476152

Submission history

From: J. Gregory Pauloski
[v1] Sun, 4 Jul 2021 21:34:22 UTC (1,067 KB)
[v2] Mon, 20 Sep 2021 14:47:11 UTC (1,067 KB)
