- research-article, June 2020
FFT-based Gradient Sparsification for the Distributed Training of Deep Neural Networks
HPDC '20: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, Pages 113–124, https://doi.org/10.1145/3369583.3392681
The performance and efficiency of distributed training of Deep Neural Networks (DNN) highly depend on the performance of gradient averaging among participating processes, a step bound by communication costs. There are two major approaches to reduce ...
- research-article, June 2020
DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems
HPDC '20: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, Pages 101–112, https://doi.org/10.1145/3369583.3392674
As we approach the exascale era, the size and complexity of HPC systems continues to increase, raising concerns about their manageability and sustainability. For this reason, more and more HPC centers are experimenting with fine-grained monitoring ...