research-article

Delay Asymptotics and Bounds for Multi-Task Parallel Jobs

Published: 25 January 2019
  • Abstract

    We study the delay of jobs that consist of multiple parallel tasks, which is a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, a tight analysis remains largely open, since analyzing job delay requires characterizing the complicated correlation among task delays.
    We first consider an asymptotic regime where the number of servers, n, goes to infinity, and the number of tasks in a job, k(n), is allowed to increase with n. We establish the asymptotic independence of any k(n) queues under the condition k(n) = o(n^{1/4}). This greatly generalizes the asymptotic-independence type of results in the literature where asymptotic independence is shown only for a fixed constant number of queues. As a consequence of our independence result, the job delay converges to the maximum of independent task delays.
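    To make "the maximum of independent task delays" concrete, the following is a minimal sketch and not the paper's model or proof. It assumes, purely for illustration, that each task delay is exponentially distributed with a common rate (as it would be in a stable M/M/1 queue); the function names and the `rate` parameter are hypothetical. Under that assumption the mean of the maximum of k independent delays is the k-th harmonic number divided by the rate, and a small Monte Carlo run can be compared against it.

```python
import random


def harmonic(k: int) -> float:
    """Return the k-th harmonic number H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))


def mean_job_delay_independent(k: int, rate: float) -> float:
    """Mean of the maximum of k i.i.d. Exp(rate) task delays: H_k / rate.

    The exponential task-delay assumption is illustrative only.
    """
    return harmonic(k) / rate


def simulate_job_delay(k: int, rate: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the mean job delay by sampling k independent task delays
    per job and taking their maximum (a job finishes with its last task)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(rate) for _ in range(k))
    return total / trials


if __name__ == "__main__":
    k, rate = 4, 1.0
    print("closed form:", mean_job_delay_independent(k, rate))
    print("simulated:  ", simulate_job_delay(k, rate))
```

    The sketch only covers the independent case. The abstract's contribution is showing when the real, correlated system behaves like this one.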
    We next consider the non-asymptotic regime. Here we prove that independence yields a stochastic upper bound on job delay for any n and any k(n) with k(n) ≤ n. The key component of our proof is a new technique we develop, called "Poisson oversampling". Our approach converts the job delay problem into a corresponding balls-and-bins problem. However, in contrast with typical balls-and-bins problems where there is a negative correlation among bins, we prove that our variant exhibits positive correlation. A full version of this paper with all proofs appears in [28].
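    The negative correlation in a typical balls-and-bins problem can be checked numerically. The sketch below is an illustration of that classical baseline only; it does not implement Poisson oversampling or the paper's positively correlated variant. It assumes balls are thrown independently and uniformly at random, in which case the bin counts are multinomial and the covariance between two distinct bins is -m/n^2 for m balls and n bins. All function names are hypothetical.

```python
import random


def exact_covariance(n_bins: int, n_balls: int) -> float:
    """Covariance between the counts of two distinct bins when n_balls
    are thrown uniformly at random into n_bins (multinomial counts)."""
    return -n_balls / n_bins ** 2


def estimated_covariance(n_bins: int, n_balls: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the covariance between the counts of
    bin 0 and bin 1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(trials):
        c0 = c1 = 0
        for _ in range(n_balls):
            b = rng.randrange(n_bins)  # uniform bin choice for each ball
            if b == 0:
                c0 += 1
            elif b == 1:
                c1 += 1
        xs.append(c0)
        ys.append(c1)
    mean_x = sum(xs) / trials
    mean_y = sum(ys) / trials
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / trials


if __name__ == "__main__":
    print("exact:    ", exact_covariance(4, 8))
    print("estimated:", estimated_covariance(4, 8))
```

    With a fixed number of balls, a ball landing in one bin cannot land in another, which is why the covariance is negative here. The abstract's point is that the variant arising from the job delay problem behaves differently.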

    References

    [1]
    F. Baccelli. Two parallel queues created by arrivals with two demands: the M/G/2 symmetrical case. Technical Report RR-0426, INRIA, July 1985.
    [2]
    F. Baccelli, A. M. Makowski, and A. Shwartz. The fork-join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Probab., 21:629–660, 1989.
    [3]
    M. Bramson, Y. Lu, and B. Prabhakar. Asymptotic independence of queues under randomized load balancing. Queueing Syst., 71(3):247–292, July 2012.
    [4]
    Y. Chen, S. Alspaugh, and R. Katz. Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads. Proc. VLDB Endow., 5(12):1802–1813, Aug. 2012. ISSN 2150-8097.
    [5]
    J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. USENIX Conf. Operating Systems Design and Implementation (OSDI), pages 10–10, San Francisco, CA, 2004.
    [6]
    J. D. Esary, F. Proschan, and D. W. Walkup. Association of random variables, with applications. Ann. Math. Statist., 38(5):1466–1474, Oct. 1967.
    [7]
    L. Flatto and S. Hahn. Two parallel queues created by arrivals with two demands I. SIAM J. Appl. Math., 44(5):1041–1053, 1984.
    [8]
    K. Gardner, M. Harchol-Balter, and A. Scheller-Wolf. A better model for job redundancy: Decoupling server slowdown and job size. In IEEE Int. Symp. Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 1–10, London, United Kingdom, Sept. 2016.
    [9]
    K. Gardner, M. Harchol-Balter, A. Scheller-Wolf, and B. Van Houdt. A better model for job redundancy: Decoupling server slowdown and job size. IEEE/ACM Trans. Netw., 25(6):3353–3367, Dec. 2017.
    [10]
    K. Gardner, M. Harchol-Balter, A. Scheller-Wolf, M. Velednitsky, and S. Zbarsky. Redundancy-d: The power of d choices for redundancy. Oper. Res., 65(4):1078–1094, 2017.
    [11]
    K. Joag-Dev and F. Proschan. Negative association of random variables with applications. Ann. Statist., 11(1):286–295, Mar. 1983.
    [12]
    G. Joshi, Y. Liu, and E. Soljanin. Coding for fast content download. In Proc. Ann. Allerton Conf. Communication, Control and Computing, pages 326–333, Monticello, IL, Oct. 2012.
    [13]
    A. Kumar and R. Shorey. Performance analysis and scheduling of stochastic fork-join jobs in a multicomputer system. IEEE Trans. Parallel Distrib. Syst., 4(10):1147–1164, Oct. 1993.
    [14]
    K. Lee, N. B. Shah, L. Huang, and K. Ramchandran. The MDS queue: Analysing the latency performance of erasure codes. IEEE Trans. Inf. Theory, 63(5):2822–2842, May 2017.
    [15]
    B. Li, A. Ramamoorthy, and R. Srikant. Mean-field-analysis of coding versus replication in cloud storage systems. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM), pages 1–9, San Francisco, CA, Apr. 2016.
    [16]
    S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Probab., 25(3):518–548, 1993.
    [17]
    B. Moseley, A. Dasgupta, R. Kumar, and T. Sarlós. On scheduling in Map-Reduce and flow-shops. In Proc. Ann. ACM Symp. Parallelism in Algorithms and Architectures (SPAA), pages 289–298, San Jose, CA, 2011.
    [18]
    R. Nelson and A. N. Tantawi. Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput., 37(6):739–743, June 1988.
    [19]
    R. Nelson, D. Towsley, and A. N. Tantawi. Performance analysis of parallel processing systems. IEEE Trans. Softw. Eng., 14(4):532–540, Apr. 1988.
    [20]
    A. Rizk, F. Poloczek, and F. Ciucu. Stochastic bounds in fork–join queueing systems under full and partial mapping. Queueing Syst., 83(3):261–291, Aug. 2016.
    [21]
    N. B. Shah, K. Lee, and K. Ramchandran. When do redundant requests reduce latency? In Proc. Ann. Allerton Conf. Communication, Control and Computing, pages 731–738, Monticello, IL, Oct. 2013.
    [22]
    V. Shah, A. Bouillard, and F. Baccelli. Delay comparison of delivery and coding policies in data clusters. In Proc. Ann. Allerton Conf. Communication, Control and Computing, pages 397–404, Monticello, IL, Oct. 2017.
    [23]
    Y. Sun, C. E. Koksal, and N. B. Shroff. Near delay-optimal scheduling of batch jobs in multi-server systems. Technical report, The Ohio State University, 2017.
    [24]
    J. Tan, X. Meng, and L. Zhang. Delay tails in MapReduce scheduling. In Proc. ACM SIGMETRICS/PERFORMANCE Jt. Int. Conf. Measurement and Modeling of Computer Systems, pages 5–16, London, United Kingdom, 2012.
    [25]
    A. Thomasian. Analysis of fork/join and related queueing systems. ACM Comput. Surv., 47(2):17:1–17:71, Aug. 2014.
    [26]
    A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is less: Reducing latency via redundancy. In Proc. ACM Workshop Hot Topics in Networks (HotNets), pages 13–18, Redmond, WA, Oct. 2012.
    [27]
    W. Wang, K. Zhu, L. Ying, J. Tan, and L. Zhang. MapTask scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality. IEEE/ACM Trans. Netw., 24:190–203, Feb. 2016.
    [28]
    W. Wang, M. Harchol-Balter, H. Jiang, A. Scheller-Wolf, and R. Srikant. Delay asymptotics and bounds for multi-task parallel jobs. arXiv:1710.00296 [cs.PF], 2018.
    [29]
    Y. Xiang, T. Lan, V. Aggarwal, and Y.-F. R. Chen. Joint latency and cost optimization for erasure-coded data center storage. IEEE/ACM Trans. Netw., 24(4):2443–2457, Aug. 2016.
    [30]
    Q. Xie and Y. Lu. Priority algorithm for near-data scheduling: Throughput and heavy-traffic optimality. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM), pages 963–972, Hong Kong, China, Apr. 2015.
    [31]
    Q. Xie, X. Dong, Y. Lu, and R. Srikant. Power of d choices for large-scale bin packing: A loss model. In Proc. ACM SIGMETRICS Int. Conf. Measurement and Modeling of Computer Systems, pages 321–334, Portland, OR, 2015.
    [32]
    L. Ying, R. Srikant, and X. Kang. The power of slightly more than one sample in randomized load balancing. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM), pages 1131–1139, Kowloon, Hong Kong, Apr. 2015.
    [33]
    Y. Zheng, N. B. Shroff, and P. Sinha. A new analytical technique for designing provably efficient MapReduce schedulers. In Proc. IEEE Int. Conf. Computer Communications (INFOCOM), pages 1600–1608, Turin, Italy, Apr. 2013.

    Published In

    ACM SIGMETRICS Performance Evaluation Review  Volume 46, Issue 3
    December 2018
    174 pages
    ISSN:0163-5999
    DOI:10.1145/3308897
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. association of random variables
    2. asymptotic independence
    3. large systems
    4. parallel jobs

    Qualifiers

    • Research-article

    Cited By

    • (2024) Investigation of the Fork–Join System with Markovian Arrival Process Arrivals and Phase-Type Service Time Distribution Using Machine Learning Methods. Mathematics 12:5 (659). DOI: 10.3390/math12050659. Online publication date: 23-Feb-2024
    • (2023) The Delay Time Profile of Multistage Networks with Synchronization. Mathematics 11:14 (3232). DOI: 10.3390/math11143232. Online publication date: 23-Jul-2023
    • (2021) Modeling and analysis of distributed schedulers in data center cluster networks. Cluster Computing 24:4 (3351-3366). DOI: 10.1007/s10586-021-03343-y. Online publication date: 1-Dec-2021
    • (2020) Delay-optimal Policies in Partial Fork-Join Systems with Redundancy and Random Slowdowns. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4:1 (1-49). DOI: 10.1145/3379468. Online publication date: 5-Jun-2020
    • (2020) Tail Latency in Datacenter Networks. Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (254-272). DOI: 10.1007/978-3-030-68110-4_17. Online publication date: 17-Nov-2020
    • (2019) Performance Analysis of Workload Dependent Load Balancing Policies. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3:2 (1-35). DOI: 10.1145/3341617.3326150. Online publication date: 19-Jun-2019
    • (2019) Performance Analysis of Workload Dependent Load Balancing Policies. Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems (7-8). DOI: 10.1145/3309697.3331504. Online publication date: 20-Jun-2019
    • (2019) TTLoC: Taming Tail Latency for Erasure-Coded Cloud Storage Systems. IEEE Transactions on Network and Service Management 16:4 (1609-1623). DOI: 10.1109/TNSM.2019.2916877. Online publication date: Dec-2019
