HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, Pages 524–541, https://doi.org/10.1145/3627703.3629580
Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large model learning. ...
- research-article, November 2022
Accelerating large-scale distributed neural network training with SPMD parallelism
SoCC '22: Proceedings of the 13th Symposium on Cloud Computing, Pages 403–418, https://doi.org/10.1145/3542929.3563487
Deep neural networks (DNNs) with trillions of parameters have emerged, e.g., Mixture-of-Experts (MoE) models. Training models of this scale requires sophisticated parallelization strategies like the newly proposed SPMD parallelism, which shards each ...
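For readers unfamiliar with the technique both papers build on: in SPMD training, every device runs the same program on its own shard of the data and synchronizes through collectives. Below is a minimal sketch of that idea using JAX's pmap; it is an illustrative example only, not code from either paper, and the model, names, and learning rate are assumptions.

    # Hypothetical SPMD data-parallel training step (not from either paper).
    import functools
    import jax
    import jax.numpy as jnp

    def loss_fn(w, x, y):
        # Mean-squared error of a linear model on this device's shard.
        return jnp.mean((x @ w - y) ** 2)

    @functools.partial(jax.pmap, axis_name="devices")
    def train_step(w, x, y):
        # Every device runs this same program on its own data shard (SPMD).
        grads = jax.grad(loss_fn)(w, x, y)
        # Average gradients across devices with an all-reduce collective.
        grads = jax.lax.pmean(grads, axis_name="devices")
        return w - 0.1 * grads

    n_dev = jax.local_device_count()
    kx, ky = jax.random.split(jax.random.PRNGKey(0))
    x = jax.random.normal(kx, (n_dev, 8, 4))  # batch sharded over devices
    y = jax.random.normal(ky, (n_dev, 8, 1))
    w = jnp.zeros((n_dev, 4, 1))              # parameters replicated per device
    w = train_step(w, x, y)

On a single-device machine this degenerates to ordinary training (n_dev is 1); on a multi-GPU host, each GPU executes the identical step on its own slice of the batch, which is the homogeneous baseline that HAP extends to heterogeneous clusters.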