research-article
Open access

Achievable Stability in Redundancy Systems

Published: 30 November 2020

Abstract

We consider a system with $N$ parallel servers where each incoming job is immediately replicated to, say, $d$ servers. Each of the $N$ servers has its own queue and follows an FCFS discipline. As soon as the first replica of a job completes, the remaining replicas are abandoned. We investigate the achievable stability region for a fairly general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations that may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand, we show that for New-Better-than-Used (NBU) distributed speed variations, no replication $(d=1)$ gives a strictly larger stability region than replication $(d>1)$. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. For non-observable job types, we show that for New-Worse-than-Used (NWU) distributed speed variations, full replication $(d=N)$ gives a larger stability region than no replication $(d=1)$.
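The role of the NBU and NWU conditions can be illustrated with a small closed-form calculation. The sketch below is not the paper's model or proof. It assumes, purely for illustration, that the $d$ replicas of a job start at the same time and have i.i.d. Weibull service times, so that queueing, job types, and server heterogeneity are ignored. A Weibull distribution with shape $k \ge 1$ is NBU and one with shape $k \le 1$ is NWU, and the minimum of $d$ i.i.d. Weibull variables is again Weibull with the scale multiplied by $d^{-1/k}$. The function names are hypothetical.

```python
import math


def weibull_mean(shape, scale=1.0):
    """Mean of a Weibull(shape, scale) random variable."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def min_of_d_mean(shape, d, scale=1.0):
    """Mean of the minimum of d i.i.d. Weibull(shape, scale) variables.

    The minimum is Weibull with the same shape and scale * d**(-1/shape).
    """
    return weibull_mean(shape, scale * d ** (-1.0 / shape))


def total_work_ratio(shape, d):
    """Expected total server time with d simultaneous replicas,
    relative to running a single copy (assumption: i.i.d. replicas,
    all cancelled when the first one finishes)."""
    return d * min_of_d_mean(shape, d) / weibull_mean(shape)


if __name__ == "__main__":
    # shape < 1: NWU, shape = 1: exponential, shape > 1: NBU
    for shape in (0.5, 1.0, 2.0):
        ratios = [round(total_work_ratio(shape, d), 3) for d in (1, 2, 4)]
        print(f"shape={shape}: total-work ratio for d=1,2,4 -> {ratios}")
```

Under these assumptions the ratio equals $d^{1-1/k}$: replication consumes more total server time than a single copy in the NBU case ($k>1$) and less in the NWU case ($k<1$), which matches the direction of the stability comparisons stated in the abstract.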


Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 4, Issue 3
POMACS
December 2020
345 pages
EISSN:2476-1249
DOI:10.1145/3440131
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 November 2020
Published in POMACS Volume 4, Issue 3


Author Tags

  1. parallel-server system
  2. redundancy
  3. stability

Qualifiers

  • Research-article


Cited By

  • (2024) Approximations to Study the Impact of the Service Discipline in Systems with Redundancy. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:1 (1-33). DOI: 10.1145/3639040. Online publication date: 21-Feb-2024
  • (2024) Editorial introduction: second part of the special issue on product forms, stochastic matching, and redundancy. Queueing Systems 107:3-4 (199-203). DOI: 10.1007/s11134-024-09922-1. Online publication date: 9-Aug-2024
  • (2024) Editorial introduction: special issue on product forms, stochastic matching, and redundancy. Queueing Systems: Theory and Applications 106:3-4 (193-198). DOI: 10.1007/s11134-024-09908-z. Online publication date: 1-Apr-2024
  • (2024) Efficient scheduling in redundancy systems with general service times. Queueing Systems: Theory and Applications 106:3-4 (333-372). DOI: 10.1007/s11134-024-09904-3. Online publication date: 1-Apr-2024
  • (2022) Achievable Stability in Redundancy Systems. ACM SIGMETRICS Performance Evaluation Review 49:1 (27-28). DOI: 10.1145/3543516.3456267. Online publication date: 7-Jun-2022
  • (2022) Correlation in redundancy systems. Queueing Systems: Theory and Applications 100:3-4 (197-199). DOI: 10.1007/s11134-022-09829-9. Online publication date: 1-Apr-2022
  • (2022) Replication vs speculation for load balancing. Queueing Systems: Theory and Applications 100:3-4 (389-391). DOI: 10.1007/s11134-022-09809-z. Online publication date: 1-Apr-2022
  • (2021) Achievable Stability in Redundancy Systems. Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems (27-28). DOI: 10.1145/3410220.3456267. Online publication date: 31-May-2021
  • (2021) Stability and Optimization of Speculative Queueing Networks. IEEE/ACM Transactions on Networking 30:2 (911-922). DOI: 10.1109/TNET.2021.3128778. Online publication date: 24-Nov-2021
  • (2021) Service Rate Region: A New Aspect of Coded Distributed System Design. IEEE Transactions on Information Theory 67:12 (7940-7963). DOI: 10.1109/TIT.2021.3117695. Online publication date: Dec-2021
