Delay tails in MapReduce scheduling

Published: 11 June 2012

Abstract

MapReduce/Hadoop production clusters exhibit heavy-tailed job processing times. These heavy tails result from both the workload characteristics and the scheduling algorithms in use. An analytical understanding of the delays under different MapReduce schedulers can therefore guide the design and deployment of large Hadoop clusters. The map and reduce tasks of a MapReduce job differ fundamentally yet depend tightly on each other, which complicates the analysis. This dependence also causes an interesting starvation problem in the widely used Fair Scheduler, owing to its greedy approach to launching reduce tasks. To address this issue, we design and implement Coupling Scheduler, which launches a job's reduce tasks gradually as its map tasks make progress. Experiments on real clusters demonstrate improvements in job response times of up to an order of magnitude.
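
To make the coupling idea concrete, the sketch below ties a job's reduce-task quota to its map-phase progress, so reduce slots are claimed gradually rather than greedily. The Job class, the linear ramp, and all names are illustrative assumptions for exposition only, not the scheduler's actual implementation.

    # Minimal sketch (assumed, not the paper's code): a job's reduce tasks are
    # launched in proportion to its map progress. A greedy scheduler would
    # instead hand out all reduce tasks immediately, pinning reduce slots that
    # sit idle waiting for map output -- the starvation problem noted above.
    from dataclasses import dataclass

    @dataclass
    class Job:
        job_id: str
        num_map_tasks: int
        num_reduce_tasks: int
        maps_finished: int = 0
        reduces_launched: int = 0

        @property
        def map_progress(self) -> float:
            """Fraction of this job's map tasks that have completed, in [0, 1]."""
            return self.maps_finished / self.num_map_tasks

    def reduce_tasks_to_launch(job: Job) -> int:
        """How many additional reduce tasks the job may launch right now.

        The quota ramps linearly with map progress (a hypothetical choice);
        only the shortfall between the quota and the reduces already launched
        is granted.
        """
        quota = int(job.map_progress * job.num_reduce_tasks)
        return max(0, quota - job.reduces_launched)

    # Example: 40 of 100 maps done => at most 4 of the 10 reduces may run.
    job = Job("job_001", num_map_tasks=100, num_reduce_tasks=10, maps_finished=40)
    print(reduce_tasks_to_launch(job))  # -> 4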
Based on extensive measurements and source-code investigation, we propose analytical models for the default FIFO Scheduler, for Fair Scheduler, and for our Coupling Scheduler. For a class of heavy-tailed map service time distributions, namely those that are regularly varying of index -a, we derive the distribution tail of the job processing delay under each of the three schedulers. The default FIFO Scheduler causes the delay to be regularly varying of index -a+1. Interestingly, we discover a criticality phenomenon for Fair Scheduler: depending on the maximum number of reduce tasks of a job, its delay tail can change from regularly varying of index -a to index -a+1, and other, more subtle behaviors also arise. In contrast, under some conditions the delay distribution tail under Coupling Scheduler can be one order lower than under Fair Scheduler, implying better performance.
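
In standard notation (with alpha standing for the abstract's index a, and with ell denoting a slowly varying function), the tail claims above read as follows; this is only a restatement of the abstract's statements, not an additional result.

    % Regular variation of index -\alpha: a power-law tail up to a slowly
    % varying factor \ell, i.e. \ell(tx)/\ell(x) -> 1 as x -> \infty, t > 0.
    \[
      \Pr[B > x] \;=\; \ell(x)\, x^{-\alpha}
    \]
    % FIFO claim: the job processing delay T is regularly varying of index
    % -\alpha + 1, i.e. one order heavier than the map service tail:
    \[
      \Pr[T_{\mathrm{FIFO}} > x] \;=\; \tilde{\ell}(x)\, x^{-(\alpha - 1)}
    \]
    % Fair Scheduler claim: the delay tail index is either -\alpha or
    % -(\alpha - 1), depending on the job's maximum number of reduce tasks.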

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1, June 2012. 433 pages. ISSN 0163-5999. DOI: 10.1145/2318857
  • Also appears in: SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, June 2012. 450 pages. ISBN 9781450310970. DOI: 10.1145/2254756

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. MapReduce
  2. coupling scheduler
  3. fair scheduler
  4. first in first out
  5. hadoop
  6. heavy-tails
  7. processor sharing

Cited By

  • (2023) Nonlinear approximation of characteristics of a fork–join queueing system with Pareto service as a model of parallel structure of data processing. Mathematics and Computers in Simulation, 214:409–428, December 2023. DOI: 10.1016/j.matcom.2023.07.029
  • (2021) Fork and Join Queueing Networks with Heavy Tails: Scaling Dimension and Throughput Limit. Journal of the ACM, 68(3):1–30, May 2021. DOI: 10.1145/3448213
  • (2021) Non-asymptotic delay bounds for (k, l) fork-join systems and multi-stage fork-join networks. IEEE INFOCOM 2016: The 35th Annual IEEE International Conference on Computer Communications, pages 1–9. DOI: 10.1109/INFOCOM.2016.7524362
  • (2020) MaRe: Processing Big Data with application containers on Apache Spark. GigaScience, 9(5), May 2020. DOI: 10.1093/gigascience/giaa042
  • (2020) MapReduce scheduling algorithms: a review. The Journal of Supercomputing, 76(7):4915–4945, July 2020. DOI: 10.1007/s11227-018-2719-5
  • (2019) A systematic literature review on MapReduce scheduling methods. Intelligent Decision Technologies, 13(1):1–21, January 2019. DOI: 10.3233/IDT-190363
  • (2019) Heterogeneity Aware Workload Management in Distributed Sustainable Datacenters. IEEE Transactions on Parallel and Distributed Systems, 30(2):375–387, February 2019. DOI: 10.1109/TPDS.2018.2865927
  • (2019) A data locality based scheduler to enhance MapReduce performance in heterogeneous environments. Future Generation Computer Systems, 90:423–434, January 2019. DOI: 10.1016/j.future.2018.07.043
  • (2018) POSUM: A Portfolio Scheduler for MapReduce Workloads. 2018 IEEE International Conference on Big Data (Big Data), pages 351–357, December 2018. DOI: 10.1109/BigData.2018.8622215
  • (2018) PP: Popularity-based Proactive Data Recovery for HDFS RAID systems. Future Generation Computer Systems, 86:1146–1153, September 2018. DOI: 10.1016/j.future.2017.03.032
