
Treadmill: attributing the source of tail latency through precise load testing and statistical inference

Published: 18 June 2016

Abstract

Managing tail latency of requests has become one of the primary challenges for large-scale Internet services. Data centers are quickly evolving and service operators frequently desire to make changes to the deployed software and production hardware configurations. Such changes demand a confident understanding of the impact on one's service, in particular its effect on tail latency (e.g., 95th- or 99th-percentile response latency of the service). Evaluating the impact on the tail is challenging because of its inherent variability. Existing tools and methodologies for measuring these effects suffer from a number of deficiencies including poor load tester design, statistically inaccurate aggregation, and improper attribution of effects. As shown in the paper, these pitfalls can often result in misleading conclusions.
In this paper, we develop a methodology for statistically rigorous performance evaluation and performance factor attribution for server workloads. First, we find that careful design of the server load tester can ensure high quality performance evaluation, and empirically demonstrate the inaccuracy of load testers in previous work. Learning from the design flaws in prior work, we design and develop a modular load tester platform, Treadmill, that overcomes pitfalls of existing tools. Next, utilizing Treadmill, we construct measurement and analysis procedures that can properly attribute performance factors. We rely on statistically-sound performance evaluation and quantile regression, extending it to accommodate the idiosyncrasies of server systems. Finally, we use our augmented methodology to evaluate the impact of common server hardware features with Facebook production workloads on production hardware. We decompose the effects of these features on request tail latency and demonstrate that our evaluation methodology provides superior results, particularly in capturing complicated and counter-intuitive performance behaviors. By tuning the hardware features as suggested by the attribution, we reduce the 99th-percentile latency by 43% and its variance by 93%.
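
The first paragraph of the abstract singles out statistically inaccurate aggregation as a common pitfall when measuring tail latency. As a hedged illustration only (this is not code from the paper or from Treadmill, and the trial setup is invented), the Python sketch below contrasts naively pooling raw latency samples across runs with computing the 99th percentile per independent trial and reporting a confidence interval over those per-trial estimates.

```python
import numpy as np

def p99(samples):
    """99th-percentile latency of one trial's samples."""
    return np.percentile(samples, 99)

def summarize_tail(trials, z=1.96):
    """Aggregate per-trial p99 estimates into a mean and ~95% confidence interval.

    `trials` is a list of 1-D arrays, one array of latency samples per
    independent run; each run contributes a single p99 observation, so samples
    collected under different (correlated) conditions are not silently pooled.
    """
    estimates = np.array([p99(t) for t in trials])
    mean = estimates.mean()
    sem = estimates.std(ddof=1) / np.sqrt(len(estimates))  # std error of the mean
    return mean, (mean - z * sem, mean + z * sem)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic heavy-tailed (lognormal) latencies, 20 independent trials.
    trials = [rng.lognormal(mean=0.0, sigma=0.5, size=10_000) for _ in range(20)]

    pooled = np.percentile(np.concatenate(trials), 99)  # naive pooled estimate
    mean, ci = summarize_tail(trials)
    print(f"pooled p99    = {pooled:.3f}")
    print(f"per-trial p99 = {mean:.3f}")
    print(f"95% CI        = ({ci[0]:.3f}, {ci[1]:.3f})")
```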
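The second paragraph attributes tail-latency effects to individual factors via quantile regression. The sketch below is a minimal, hypothetical example of that general technique, not the paper's actual methodology or data: the binary hardware settings `turbo` and `dvfs`, the synthetic latencies, and the effect sizes are all invented, and the regression is fit with statsmodels at the 99th and 50th quantiles to show how factor effects can differ between the tail and the median.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4_000

# Hypothetical experiment matrix: each request is measured under a random
# combination of two binary hardware settings (names invented for this example).
turbo = rng.integers(0, 2, size=n)
dvfs = rng.integers(0, 2, size=n)

# Synthetic latencies: one setting mostly inflates the tail, the other adds a
# small constant shift, so median and p99 effects differ.
base = rng.lognormal(mean=0.0, sigma=0.4, size=n)
latency = base + 0.8 * turbo * (base > np.percentile(base, 95)) + 0.3 * dvfs

df = pd.DataFrame({"latency": latency, "turbo": turbo, "dvfs": dvfs})

# Quantile regression at the 99th percentile: each coefficient estimates how
# much the corresponding setting shifts p99 latency, holding the other fixed.
model = smf.quantreg("latency ~ turbo + dvfs", df)
print(model.fit(q=0.99).params)

# The same model at the median typically shows a much smaller `turbo` effect.
print(model.fit(q=0.50).params)
```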




Published In

ACM SIGARCH Computer Architecture News, Volume 44, Issue 3 (ISCA'16)
June 2016, 730 pages
ISSN: 0163-5964
DOI: 10.1145/3007787

  • ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture
    June 2016, 756 pages
    ISBN: 9781467389471
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2016
Published in SIGARCH Volume 44, Issue 3

Author Tags

  1. data center
  2. load testing
  3. tail latency

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months)88
  • Downloads (Last 6 weeks)11
Reflects downloads up to 12 Jan 2025

Citations

Cited By

  • (2024) DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks. Proceedings of the 53rd International Conference on Parallel Processing, pp. 701-711. DOI: 10.1145/3673038.3673091. Online publication date: 12-Aug-2024.
  • (2024) InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference With Spatio-Temporal Sharing. IEEE Transactions on Parallel and Distributed Systems, 35(10), pp. 1735-1748. DOI: 10.1109/TPDS.2024.3430063. Online publication date: 1-Oct-2024.
  • (2023) Async-Fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level. Proceedings of the VLDB Endowment, 16(5), pp. 1033-1045. DOI: 10.14778/3579075.3579079. Online publication date: 6-Mar-2023.
  • (2023) Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors. Proceedings of the 24th International Middleware Conference, pp. 220-233. DOI: 10.1145/3590140.3629117. Online publication date: 27-Nov-2023.
  • (2023) μManycore: A Cloud-Native CPU for Tail at Scale. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-15. DOI: 10.1145/3579371.3589068. Online publication date: 17-Jun-2023.
  • (2023) Turbo: SmartNIC-enabled Dynamic Load Balancing of µs-scale RPCs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1045-1058. DOI: 10.1109/HPCA56546.2023.10071135. Online publication date: Feb-2023.
  • (2023) Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 855-869. DOI: 10.1109/HPCA56546.2023.10070939. Online publication date: Feb-2023.
  • (2022) Lukewarm serverless functions. Proceedings of the 49th Annual International Symposium on Computer Architecture, pp. 757-770. DOI: 10.1145/3470496.3527390. Online publication date: 18-Jun-2022.
  • (2022) UFC2: User-Friendly Collaborative Cloud. IEEE Transactions on Parallel and Distributed Systems, 33(9), pp. 2163-2182. DOI: 10.1109/TPDS.2021.3132496. Online publication date: 1-Sep-2022.
  • (2022) Percentile Optimization Based on Classical Parallel Machine Scheduling Model. 2022 8th International Conference on Big Data Computing and Communications (BigCom), pp. 96-102. DOI: 10.1109/BigCom57025.2022.00020. Online publication date: Aug-2022.
  • Show More Cited By
