Just-in-time Quantization with Processing-In-Memory for Efficient ML Training

Ibrahim, Mohamed Assem; Aga, Shaizeen; Li, Ada; Pati, Suchita; Islam, Mahzabeen

Computer Science > Hardware Architecture

arXiv:2311.05034 (cs)

[Submitted on 8 Nov 2023]



Abstract: Data format innovations have been critical for machine learning (ML) scaling, which in turn fuels ground-breaking ML capabilities. However, even in the presence of low-precision formats, model weights are often stored in both high precision and low precision during training. Furthermore, with emerging directional data formats (e.g., MX9, MX6), multiple low-precision weight copies can be required. To lower the memory capacity needs of weights, we explore just-in-time quantization (JIT-Q), where we store only high-precision weights in memory and generate low-precision weights only when needed. To perform JIT-Q efficiently, in this work we evaluate emerging processing-in-memory (PIM) technology to execute quantization. With PIM, we can offload quantization to in-memory compute units, so that quantization is performed without incurring costly data movement and can run concurrently with accelerator computation. Our proposed PIM-offloaded quantization keeps up with GPU compute and delivers considerable capacity savings (up to 24%) at marginal throughput loss (up to 2.4%). These memory capacity savings can unlock several benefits, such as fitting a larger model in the same system, reducing the model parallelism requirement, and improving overall ML training efficiency.
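The just-in-time idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain symmetric per-tensor int8 quantization as a stand-in for formats such as MX9 or MX6, runs entirely on the host rather than on PIM units, and uses hypothetical names (`quantize_int8`, `JitQuantizedLayer`) chosen only for illustration. What it shows is the storage pattern: only the high-precision weights persist, and the low-precision copy is regenerated each time it is needed.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (assumed stand-in format)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid a zero scale when all weights are zero or the list is empty.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


class JitQuantizedLayer:
    """Hypothetical layer that stores only high-precision weights."""

    def __init__(self, master_weights: List[float]) -> None:
        self.master_weights = list(master_weights)

    def low_precision_weights(self) -> Tuple[List[int], float]:
        # Generated on demand and never cached, so no second
        # persistent copy of the weights is kept in memory.
        return quantize_int8(self.master_weights)

    def forward(self, x: List[float]) -> float:
        quantized, scale = self.low_precision_weights()
        # Dot product on quantized weights, rescaled to real units.
        return scale * sum(q * xi for q, xi in zip(quantized, x))

    def apply_update(self, grads: List[float], lr: float) -> None:
        # The optimizer step touches only the high-precision weights.
        self.master_weights = [
            w - lr * g for w, g in zip(self.master_weights, grads)
        ]


layer = JitQuantizedLayer([0.5, -1.0, 0.25])
output = layer.forward([0.0, 1.0, 0.0])  # uses a freshly quantized copy
layer.apply_update([1.0, 0.0, 0.0], lr=0.5)
```

In this sketch the cost of re-quantizing on every forward pass is paid on the host. The abstract's proposal is to offload that step to in-memory compute units so it overlaps with accelerator computation, which this example does not model.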

Subjects:	Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2311.05034 [cs.AR]
	(or arXiv:2311.05034v1 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2311.05034

Submission history

From: Mohamed Ibrahim
[v1] Wed, 8 Nov 2023 21:44:37 UTC (609 KB)
