AccaSim: a customizable workload management simulator for job dispatching research in HPC systems

Galleguillos, Cristian; Kiziltan, Zeynep; Netti, Alessio; Soto, Ricardo

doi:10.1007/s10586-019-02905-5

Published: 01 February 2019

Cluster Computing, Volume 23, pages 107–122 (2020)

Cristian Galleguillos¹,² (ORCID: orcid.org/0000-0001-9460-8719),
Zeynep Kiziltan²,
Alessio Netti² &
Ricardo Soto¹

Abstract

We present AccaSim, a simulator for workload management in HPC systems. Thanks to AccaSim’s scalability to large workload datasets, support for easy customization, and practical automated tools to aid experimentation, users can easily represent various real HPC systems, develop novel advanced dispatchers and evaluate them in a convenient way across different workload sources. AccaSim is thus an attractive tool for conducting job dispatching research in HPC systems.

Acknowledgements

C. Galleguillos is supported by Postgraduate Grant PUCV 2018. A. Netti is supported by a research fellowship from the Oprecomp-Open Transprecision Computing project. R. Soto is supported by Grant CONICYT/FONDECYT/REGULAR/1160455. We are grateful to Åke Sandgren, Motoyoshi Kurokawa, and the Czech National Grid Infrastructure MetaCentrum for providing, respectively, the Seth, RICC and MetaCentrum workload datasets. We thank Alina Sîrbu for fruitful discussions on the work presented here. Finally, we appreciate the valuable comments of the reviewers, which helped improve the paper significantly. We especially thank Millian Poquet for signing his review and giving us the possibility to interact during the revision of the paper.

Author information

Authors and Affiliations

Pontificia Universidad Católica de Valparaíso, 2362807, Valparaiso, Chile
Cristian Galleguillos & Ricardo Soto
University of Bologna, 40126, Bologna, Italy
Cristian Galleguillos, Zeynep Kiziltan & Alessio Netti

Corresponding author

Correspondence to Cristian Galleguillos.

About this article

Cite this article

Galleguillos, C., Kiziltan, Z., Netti, A. et al. AccaSim: a customizable workload management simulator for job dispatching research in HPC systems. Cluster Comput 23, 107–122 (2020). https://doi.org/10.1007/s10586-019-02905-5

Received: 14 February 2018
Revised: 15 July 2018
Accepted: 05 January 2019
Published: 01 February 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10586-019-02905-5