DOI: 10.1109/CCGRID.2017.70
tutorial

Techniques for Handling Error in User-estimated Execution Times During Resource Management on Systems Processing MapReduce Jobs

Published: 14 May 2017

Abstract

In our previous work, we described a resource allocation and scheduling technique for processing an open stream of MapReduce jobs with SLAs (each characterized by an earliest start time, an execution time, and a deadline), called the Hadoop Constraint Programming based Resource Management technique (HCP-RM). Because HCP-RM uses the user-estimated job execution times to perform resource allocation and scheduling, errors or inaccuracies in those estimates can hinder its ability to make effective scheduling decisions, leading to a degradation in system performance. This paper focuses on improving the robustness of HCP-RM by introducing a mechanism to handle errors and inaccuracies in the user estimates of job execution times that are submitted as part of each job's SLA. A Prescheduling Error Handling technique (PSEH) is devised to adjust the user-estimated execution times of the jobs, making them more accurate before they are used by the resource management algorithm. Results of experiments conducted on a Hadoop cluster deployed on Amazon EC2 demonstrate the effectiveness of the PSEH technique in improving system performance.
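
The abstract does not say how PSEH computes its adjustment, so the following is only an illustrative sketch of the general idea of correcting an estimate before scheduling. It assumes a simple history-based rule: scale each new estimate by the mean ratio of actual to estimated run time over previously completed jobs. The names `JobSLA`, `correction_factor`, and `adjust_estimate`, the use of seconds as the unit, and the scaling rule itself are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class JobSLA:
    """The three SLA fields named in the abstract (seconds are an assumption)."""
    earliest_start: float
    estimated_exec_time: float
    deadline: float


def correction_factor(history):
    """Mean ratio of actual to estimated run time over completed jobs.

    `history` is a list of (estimated, actual) pairs. With no usable
    history the factor is 1.0, i.e. the user estimate is left unchanged.
    This rule is an assumption, not the PSEH rule from the paper.
    """
    ratios = [actual / est for est, actual in history if est > 0]
    return mean(ratios) if ratios else 1.0


def adjust_estimate(job, history):
    """Return a copy of `job` whose estimate is scaled by the factor."""
    factor = correction_factor(history)
    return replace(job, estimated_exec_time=job.estimated_exec_time * factor)


# Hypothetical history: users overestimated each past job by 25%.
history = [(100.0, 80.0), (200.0, 160.0), (50.0, 40.0)]
job = JobSLA(earliest_start=0.0, estimated_exec_time=300.0, deadline=600.0)
adjusted = adjust_estimate(job, history)
print(adjusted.estimated_exec_time)
```

In this sketch the adjusted job, rather than the original one, would be handed to the scheduler. The earliest start time and deadline are left untouched, since only the execution-time estimate is described as being adjusted.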


Cited By

  • (2020) SLA Management for Big Data Analytical Applications in Clouds. ACM Computing Surveys 53:3 (1-40). DOI: 10.1145/3383464. Online publication date: 12-Jun-2020

Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
May 2017
1167 pages
ISBN:9781509066100

Publisher

IEEE Press

Author Tags

  1. Handling error in user-estimated job execution times
  2. MapReduce with SLAs
  3. Resource allocation and scheduling

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CCGrid '17