Nexus: A GPU cluster engine for accelerating DNN-based video analysis

H Shen, L Chen, Y Jin, L Zhao, B Kong… - Proceedings of the 27th …, 2019 - dl.acm.org
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019
We address the problem of serving Deep Neural Networks (DNNs) efficiently from a cluster of GPUs. To realize the promise of very low-cost processing made by accelerators such as GPUs, it is essential to run them at sustained high utilization. Doing so requires cluster-scale resource management that performs detailed scheduling of GPUs, reasoning about groups of DNN invocations that need to be co-scheduled, and moving from the conventional whole-DNN execution model to executing fragments of DNNs. Nexus is a fully implemented system that includes these innovations. In large-scale case studies on 16 GPUs, when required to stay within latency constraints at least 99% of the time, Nexus can process requests at rates 1.8-12.7× higher than state-of-the-art systems can. A long-running multi-application deployment stays within 84% of optimal utilization and, on a 100-GPU cluster, violates latency SLOs on 0.27% of requests.
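
The abstract evaluates the system by how often requests stay within their latency constraints (at least 99% of the time) and by the share of requests that violate latency SLOs. As a minimal sketch of how such a figure might be computed from a list of per-request latencies, the following Python uses hypothetical helper names (`slo_attainment`, `meets_target`) and made-up sample values; it is not taken from Nexus and says nothing about how the authors measured their results.

```python
def slo_attainment(latencies_ms, slo_ms):
    """Return the fraction of requests whose latency is within the SLO.

    Hypothetical helper for illustration only; not part of Nexus.
    """
    if not latencies_ms:
        raise ValueError("no requests recorded")
    within = sum(1 for latency in latencies_ms if latency <= slo_ms)
    return within / len(latencies_ms)


def meets_target(latencies_ms, slo_ms, target=0.99):
    """Return True if at least `target` of the requests met the SLO."""
    return slo_attainment(latencies_ms, slo_ms) >= target


if __name__ == "__main__":
    # Made-up sample: 997 requests at 40 ms and 3 at 120 ms, with a 100 ms SLO.
    sample = [40.0] * 997 + [120.0] * 3
    attained = slo_attainment(sample, slo_ms=100.0)
    print(f"within SLO: {attained:.3%}, violations: {1 - attained:.3%}")
    print("meets 99% target:", meets_target(sample, slo_ms=100.0))
```

Under this definition, the violation rate is simply one minus the attainment fraction, so a deployment that violates its SLO on 0.27% of requests has an attainment of 99.73%.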