HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, Pages 524–541, https://doi.org/10.1145/3627703.3629580
Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large model learning. ...
- research-article, November 2022
Accelerating large-scale distributed neural network training with SPMD parallelism
SoCC '22: Proceedings of the 13th Symposium on Cloud Computing, Pages 403–418, https://doi.org/10.1145/3542929.3563487
Deep neural networks (DNNs) with trillions of parameters have emerged, e.g., Mixture-of-Experts (MoE) models. Training models of this scale requires sophisticated parallelization strategies like the newly proposed SPMD parallelism, which shards each ...
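For readers unfamiliar with the technique both papers build on: in SPMD training, every device runs the same program on its own shard of the data and synchronizes through collectives. Below is a minimal sketch of that idea using JAX's pmap; it is an illustrative example only, not code from either paper, and the model, names, and learning rate are assumptions.

    # Hypothetical SPMD data-parallel training step (not from either paper).
    import functools
    import jax
    import jax.numpy as jnp

    def loss_fn(w, x, y):
        # Mean-squared error of a linear model on this device's shard.
        return jnp.mean((x @ w - y) ** 2)

    @functools.partial(jax.pmap, axis_name="devices")
    def train_step(w, x, y):
        # Every device runs this same program on its own data shard (SPMD).
        grads = jax.grad(loss_fn)(w, x, y)
        # Average gradients across devices with an all-reduce collective.
        grads = jax.lax.pmean(grads, axis_name="devices")
        return w - 0.1 * grads

    n_dev = jax.local_device_count()
    kx, ky = jax.random.split(jax.random.PRNGKey(0))
    x = jax.random.normal(kx, (n_dev, 8, 4))  # batch sharded over devices
    y = jax.random.normal(ky, (n_dev, 8, 1))
    w = jnp.zeros((n_dev, 4, 1))              # parameters replicated per device
    w = train_step(w, x, y)

On a single-device machine this degenerates to ordinary training (n_dev is 1); on a multi-GPU host, each GPU executes the identical step on its own slice of the batch, which is the homogeneous baseline that HAP extends to heterogeneous clusters.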