Enhancing server efficiency in the face of killer microseconds

A Mirhosseini, A Sriraman… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
A Mirhosseini, A Sriraman, TF Wenisch
2019 IEEE International Symposium on High Performance Computer …, 2019 - ieeexplore.ieee.org
We are entering an era of “killer microseconds” in data center applications. Killer microseconds are μs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or by brief idle times between requests in high-throughput microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support for hiding the latency of μs-scale stalls. Simultaneous Multithreading (SMT) is an efficient way to improve core utilization and increase server performance density. Unfortunately, scaling SMT to provision enough threads to hide frequent μs-scale stalls is prohibitive, and SMT co-location can often drastically increase the tail latency of cloud microservices. In this paper, we propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity provisions dyads (pairs) of two kinds of cores: master-cores, each of which primarily executes a single latency-critical master-thread, and lender-cores, which multiplex latency-insensitive throughput threads. When the master-thread stalls, the master-core borrows filler-threads from the lender-core to fill the μs-scale utilization holes of the microservice. We propose critical mechanisms, including separate memory paths for the master-thread and filler-threads, that enable master-cores to borrow filler-threads while protecting the master-thread's state from disruption. Duplexity facilitates fast master-thread restart when stalls resolve and minimizes the microservice's QoS violations. Our evaluation demonstrates that, on average, Duplexity achieves 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency than an SMT-based server design.
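The borrowing idea in the abstract can be illustrated with a toy utilization model. This is only a sketch under stated assumptions, not the paper's simulator or methodology: the `Phase` type, the `utilization` function, the `switch_us` overhead parameter, and all the numbers are hypothetical and chosen purely for illustration. It compares a master-core that idles during every μs-scale stall with one that runs borrowed filler-threads during those stalls.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One master-thread phase: a compute burst followed by a stall (microseconds)."""
    compute_us: float
    stall_us: float


def utilization(phases, borrow_fillers, switch_us=0.0):
    """Fraction of master-core time spent doing useful work.

    Without borrowing, the core idles for the whole of every stall.
    With borrowing, filler-threads run during each stall, minus a
    hypothetical cost of `switch_us` to swap them in and out.
    """
    total = sum(p.compute_us + p.stall_us for p in phases)
    busy = sum(p.compute_us for p in phases)
    if borrow_fillers:
        # Stalls shorter than the switch cost yield no filler work.
        busy += sum(max(0.0, p.stall_us - switch_us) for p in phases)
    return busy / total


# Illustrative (made-up) trace: three compute bursts, each followed by a stall.
phases = [Phase(2.0, 3.0), Phase(1.0, 4.0), Phase(5.0, 5.0)]

print(utilization(phases, borrow_fillers=False))                # idle during stalls
print(utilization(phases, borrow_fillers=True, switch_us=0.5))  # stalls filled
```

The model deliberately ignores everything that makes the real problem hard, such as interference between filler-threads and the master-thread's cached state, and the restart latency that determines tail latency. It only shows why filling μs-scale holes raises utilization when the switch cost is small relative to the stall length.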