Retro: Targeted resource management in multi-tenant distributed systems

Authors:

Rodrigo Fonseca,

Madanlal Musuvathi

NSDI'15: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation

Pages 589 - 603

Published: 04 May 2015

Abstract

In distributed systems shared by multiple tenants, effective resource management is an important prerequisite to providing quality of service guarantees. Many systems deployed today lack performance isolation and experience contention, slowdown, and even outages caused by aggressive workloads or by improperly throttled maintenance tasks such as data replication. In this work we present Retro, a resource management framework for shared distributed systems. Retro monitors per-tenant resource usage both within and across distributed systems, and exposes this information to centralized resource management policies through a high-level API. A policy can shape the resources consumed by a tenant using Retro's control points, which enforce sharing and rate-limiting decisions. We demonstrate Retro through three policies providing bottleneck resource fairness, dominant resource fairness, and latency guarantees to high-priority tenants, and evaluate the system across five distributed systems: HBase, YARN, MapReduce, HDFS, and ZooKeeper. Our evaluation shows that Retro has low overhead and achieves the policies' goals: it accurately detects contended resources, throttles the tenants responsible for slowdown and overload, and fairly distributes the remaining cluster capacity.
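To make the idea of a rate-limiting control point concrete, the following is a minimal sketch and not Retro's actual API. It assumes a token bucket per tenant, which is one common rate-limiting technique; the abstract does not say which mechanism Retro uses. The names `TokenBucket`, `ControlPoint`, `admit`, and `throttle` are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Tokens refill at `rate` per second, capped at `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst          # start with a full bucket
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, rate: float) -> None:
        self._refill()               # settle tokens earned at the old rate
        self.rate = rate

    def try_acquire(self, cost: float = 1.0) -> bool:
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class ControlPoint:
    """Hypothetical control point: one token bucket per tenant."""

    def __init__(self, default_rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.default_rate = default_rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def _bucket(self, tenant: str) -> TokenBucket:
        if tenant not in self.buckets:
            self.buckets[tenant] = TokenBucket(
                self.default_rate, self.burst, self.clock)
        return self.buckets[tenant]

    def admit(self, tenant: str, cost: float = 1.0) -> bool:
        """Return True if the tenant's request may proceed now."""
        return self._bucket(tenant).try_acquire(cost)

    def throttle(self, tenant: str, rate: float) -> None:
        """Called by an assumed central policy to change a tenant's rate."""
        self._bucket(tenant).set_rate(rate)
```

In this sketch a central policy would call `throttle` on the tenant it holds responsible for contention, while request paths call `admit` before consuming the resource. The injectable `clock` is there only so the behavior can be checked deterministically.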

Published In

NSDI'15: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation

May 2015

620 pages

ISBN: 9781931971218

Program Chairs:
Paul Barham (Google),
Arvind Krishnamurthy (University of Washington)

Sponsors

VMware
NSF: National Science Foundation
Google Inc.
Microsoft Research
CISCO

Publisher

USENIX Association

United States
