Delay tails in MapReduce scheduling

Published: 11 June 2012

Abstract

MapReduce/Hadoop production clusters exhibit heavy-tailed job processing times. These heavy tails result from both the workload characteristics and the scheduling algorithms in use. An analytical understanding of the delays under different MapReduce schedulers can therefore guide the design and deployment of large Hadoop clusters. The map and reduce tasks of a MapReduce job differ fundamentally yet depend tightly on each other, which complicates the analysis. This dependence also causes an interesting starvation problem in the widely used Fair Scheduler, owing to its greedy approach to launching reduce tasks. To address this issue, we design and implement Coupling Scheduler, which launches a job's reduce tasks gradually as its map tasks make progress. Experiments on real clusters demonstrate improvements in job response times of up to an order of magnitude.
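
To make the coupling idea concrete, the sketch below ties a job's reduce-task quota to its map-phase progress, so reduce slots are claimed gradually rather than greedily. The Job class, the linear ramp, and all names are illustrative assumptions for exposition only, not the scheduler's actual implementation.

    # Minimal sketch (assumed, not the paper's code): a job's reduce tasks are
    # launched in proportion to its map progress. A greedy scheduler would
    # instead hand out all reduce tasks immediately, pinning reduce slots that
    # sit idle waiting for map output -- the starvation problem noted above.
    from dataclasses import dataclass

    @dataclass
    class Job:
        job_id: str
        num_map_tasks: int
        num_reduce_tasks: int
        maps_finished: int = 0
        reduces_launched: int = 0

        @property
        def map_progress(self) -> float:
            """Fraction of this job's map tasks that have completed, in [0, 1]."""
            return self.maps_finished / self.num_map_tasks

    def reduce_tasks_to_launch(job: Job) -> int:
        """How many additional reduce tasks the job may launch right now.

        The quota ramps linearly with map progress (a hypothetical choice);
        only the shortfall between the quota and the reduces already launched
        is granted.
        """
        quota = int(job.map_progress * job.num_reduce_tasks)
        return max(0, quota - job.reduces_launched)

    # Example: 40 of 100 maps done => at most 4 of the 10 reduces may run.
    job = Job("job_001", num_map_tasks=100, num_reduce_tasks=10, maps_finished=40)
    print(reduce_tasks_to_launch(job))  # -> 4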
Based on extensive measurements and source-code investigation, we propose analytical models for the default FIFO Scheduler, for Fair Scheduler, and for our Coupling Scheduler. For a class of heavy-tailed map service time distributions, namely those that are regularly varying of index -a, we derive the distribution tail of the job processing delay under each of the three schedulers. The default FIFO Scheduler causes the delay to be regularly varying of index -a+1. Interestingly, we discover a criticality phenomenon for Fair Scheduler: depending on the maximum number of reduce tasks of a job, its delay tail can change from regularly varying of index -a to index -a+1, and other, more subtle behaviors also arise. In contrast, under some conditions the delay distribution tail under Coupling Scheduler can be one order lower than under Fair Scheduler, implying better performance.
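
In standard notation (with alpha standing for the abstract's index a, and with ell denoting a slowly varying function), the tail claims above read as follows; this is only a restatement of the abstract's statements, not an additional result.

    % Regular variation of index -\alpha: a power-law tail up to a slowly
    % varying factor \ell, i.e. \ell(tx)/\ell(x) -> 1 as x -> \infty, t > 0.
    \[
      \Pr[B > x] \;=\; \ell(x)\, x^{-\alpha}
    \]
    % FIFO claim: the job processing delay T is regularly varying of index
    % -\alpha + 1, i.e. one order heavier than the map service tail:
    \[
      \Pr[T_{\mathrm{FIFO}} > x] \;=\; \tilde{\ell}(x)\, x^{-(\alpha - 1)}
    \]
    % Fair Scheduler claim: the delay tail index is either -\alpha or
    % -(\alpha - 1), depending on the job's maximum number of reduce tasks.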

Published In

ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1, June 2012. 433 pages. ISSN 0163-5999. DOI: 10.1145/2318857
  • Also appears in: SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, June 2012. 450 pages. ISBN 9781450310970. DOI: 10.1145/2254756

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. MapReduce
  2. coupling scheduler
  3. fair scheduler
  4. first in first out
  5. hadoop
  6. heavy-tails
  7. processor sharing

Cited By

  • (2023) Nonlinear approximation of characteristics of a fork–join queueing system with Pareto service as a model of parallel structure of data processing. Mathematics and Computers in Simulation, 214:409–428, December 2023. DOI: 10.1016/j.matcom.2023.07.029
  • (2021) Fork and Join Queueing Networks with Heavy Tails: Scaling Dimension and Throughput Limit. Journal of the ACM, 68(3):1–30, May 2021. DOI: 10.1145/3448213
  • (2021) Non-asymptotic delay bounds for (k, l) fork-join systems and multi-stage fork-join networks. IEEE INFOCOM 2016: The 35th Annual IEEE International Conference on Computer Communications, pages 1–9. DOI: 10.1109/INFOCOM.2016.7524362
  • (2020) MaRe: Processing Big Data with application containers on Apache Spark. GigaScience, 9(5), May 2020. DOI: 10.1093/gigascience/giaa042
  • (2020) MapReduce scheduling algorithms: a review. The Journal of Supercomputing, 76(7):4915–4945, July 2020. DOI: 10.1007/s11227-018-2719-5
  • (2019) A systematic literature review on MapReduce scheduling methods. Intelligent Decision Technologies, 13(1):1–21, January 2019. DOI: 10.3233/IDT-190363
  • (2019) Heterogeneity Aware Workload Management in Distributed Sustainable Datacenters. IEEE Transactions on Parallel and Distributed Systems, 30(2):375–387, February 2019. DOI: 10.1109/TPDS.2018.2865927
  • (2019) A data locality based scheduler to enhance MapReduce performance in heterogeneous environments. Future Generation Computer Systems, 90:423–434, January 2019. DOI: 10.1016/j.future.2018.07.043
  • (2018) POSUM: A Portfolio Scheduler for MapReduce Workloads. 2018 IEEE International Conference on Big Data (Big Data), pages 351–357, December 2018. DOI: 10.1109/BigData.2018.8622215
  • (2018) PP: Popularity-based Proactive Data Recovery for HDFS RAID systems. Future Generation Computer Systems, 86:1146–1153, September 2018. DOI: 10.1016/j.future.2017.03.032
