Nexus: A GPU cluster engine for accelerating DNN-based video analysis

H Shen, L Chen, Y Jin, L Zhao, B Kong, M Philipose, A Krishnamurthy, R Sundaram

Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019 • dl.acm.org

We address the problem of serving Deep Neural Networks (DNNs) efficiently from a cluster of GPUs. To realize the promise of very low-cost processing made by accelerators such as GPUs, it is essential to run them at sustained high utilization. Doing so requires cluster-scale resource management that performs detailed scheduling of GPUs, reasons about groups of DNN invocations that need to be co-scheduled, and moves from the conventional whole-DNN execution model to executing fragments of DNNs. Nexus is a fully implemented system that includes these innovations. In large-scale case studies on 16 GPUs, when required to stay within latency constraints at least 99% of the time, Nexus can process requests at rates 1.8 to 12.7 times higher than state-of-the-art systems can. A long-running multi-application deployment stays within 84% of optimal utilization and, on a 100-GPU cluster, violates latency SLOs on 0.27% of requests.
