DOI: 10.1145/3368474.3368490
research-article

Effect of an Incentive Implementation for Specifying Accurate Walltime in Job Scheduling

Published: 15 January 2020 Publication History

Abstract

Backfill is a widely adopted scheduling technique in shared large-scale systems. Because backfill uses the estimated walltime of each job for its scheduling decisions, accurate walltime estimates benefit both the users and the operators of such systems. However, prior accuracy analyses report that these estimates are very inaccurate, which causes low utilization and long wait times. To address this, we propose building incentives for requesting accurate walltime into the scheduling policy. We introduce a measure named WRSA (Walltime Request Specification Accuracy), which represents the accuracy of each user's requested walltime, and propose WRSA-aware backfill, in which jobs submitted by users with high WRSA are prioritized in scheduling. Through simulation with synthetic and real workloads, we confirm that utilization improves by up to 30% and that the incentive for specifying accurate walltime is stronger than with existing methods.
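
The prioritization idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the WRSA formula, so the score below (the per-user mean of actual runtime divided by requested walltime, capped at 1.0) is an assumed stand-in and not the paper's definition. The names `Job`, `accuracy_scores`, and `prioritize` are hypothetical, and the sketch covers only queue ordering, not a full backfill scheduler.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    requested: float     # requested walltime in seconds
    actual: float = 0.0  # actual runtime in seconds (known only after the job ends)


def accuracy_scores(history):
    """Per-user mean of actual/requested, capped at 1.0.

    ASSUMPTION: this is a stand-in for WRSA, not the formula from the paper.
    """
    ratios = defaultdict(list)
    for job in history:
        ratios[job.user].append(min(job.actual / job.requested, 1.0))
    return {user: sum(r) / len(r) for user, r in ratios.items()}


def prioritize(queue, scores, default=0.0):
    """Order waiting jobs so that users with higher scores come first.

    sorted() is stable, so jobs whose users have equal scores keep arrival order.
    Users with no history get the (assumed) default score.
    """
    return sorted(queue, key=lambda job: -scores.get(job.user, default))


# Finished jobs: alice requests close to what she uses, bob over-requests heavily.
history = [
    Job("alice", requested=3600, actual=3300),
    Job("alice", requested=1800, actual=1700),
    Job("bob", requested=86400, actual=600),
]
queue = [Job("bob", requested=7200), Job("alice", requested=3600)]

scores = accuracy_scores(history)
ordered = prioritize(queue, scores)
print([job.user for job in ordered])  # ['alice', 'bob']
```

In this toy input, alice's waiting job moves ahead of bob's even though bob submitted first, which is the kind of incentive the abstract describes: requesting walltime close to actual runtime earns a better queue position.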


Published In

HPCAsia '20: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
January 2020
247 pages
ISBN: 9781450372367
DOI: 10.1145/3368474
Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPCAsia2020

Acceptance Rates

Overall Acceptance Rate 69 of 143 submissions, 48%

Cited By

  • (2023) BEASY: Making EASY Backfilling Renewable-Only. 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 57-67. DOI: 10.1109/SBAC-PAD59825.2023.00015. Online publication date: 17-Oct-2023
  • (2022) Analyzing Power Decisions in Data Center Powered by Renewable Sources. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 305-314. DOI: 10.1109/SBAC-PAD55451.2022.00041. Online publication date: Nov-2022
  • (2022) Evaluation of Heuristics to Manage a Data Center Under Power Constraints. 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC), 1-8. DOI: 10.1109/IGSC55832.2022.9969362. Online publication date: 24-Oct-2022
  • (2022) Mixing Offline and Online Electrical Decisions in Data Centers Powered by Renewable Sources. IECON 2022 – 48th Annual Conference of the IEEE Industrial Electronics Society, 1-6. DOI: 10.1109/IECON49645.2022.9968999. Online publication date: 17-Oct-2022
  • (2021) Simulation vs Actual Walltime Correction in a Real Production Resource-Constrained HPC. Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions, 1-7. DOI: 10.1145/3437359.3465583. Online publication date: 17-Jul-2021
