short-paper

HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems

Authors:

Gonzalo Pedro Rodrigo Álvarez,

Lavanya Ramakrishnan

HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing

Pages 57 - 60

https://doi.org/10.1145/2749246.2749270

Published: 15 June 2015

Abstract

High-performance computing (HPC) centers have traditionally served monolithic MPI applications. In recent years, however, many large scientific computations have come to include high-throughput and data-intensive jobs. HPC systems have mostly relied on batch queue schedulers to place these workloads on appropriate resources. To support the diverse scientific workloads now found in HPC centers, future scheduling scenarios need to be better understood. In this paper, we analyze the workloads on two systems, Hopper and Carver, at the National Energy Research Scientific Computing Center (NERSC). Specifically, we present a trend analysis aimed at understanding how the workload evolved over the lifetime of the two systems.

Index Terms

HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published In

HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing

June 2015

296 pages

ISBN: 9781450335508

DOI: 10.1145/2749246

General Chair: Thilo Kielmann (VU University Amsterdam, The Netherlands)

Program Chairs: Dean Hildebrand (IBM Research Almaden), Michela Taufer (University of Delaware)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

U.S. Department of Energy
Swedish Government's strategic effort eSSENCE
Swedish Research Council (VR)
European Union's Seventh Framework Programme

Conference

HPDC'15

Sponsor:

University of Arizona
SIGARCH

HPDC'15: The 24th International Symposium on High-Performance Parallel and Distributed Computing

June 15 - 19, 2015

Portland, Oregon, USA

Acceptance Rates

HPDC '15 Paper Acceptance Rate 19 of 116 submissions, 16%;

Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics

Total Citations: 17
Total Downloads: 664
Downloads (Last 12 months): 70
Downloads (Last 6 weeks): 12

Reflects downloads up to 01 Sep 2024

Cited By

Zhou Z, Sun J, Sun G (2024). Automated HPC Workload Generation Combining Statistical Modeling and Autoregressive Analysis. Benchmarking, Measuring, and Optimizing, 153-170. DOI: 10.1007/978-981-97-0316-6_10. Online publication date: 14-Feb-2024.
https://doi.org/10.1007/978-981-97-0316-6_10

Park J, Huang X, Lee C (2023). Analyzing and predicting job failures from HPC system log. The Journal of Supercomputing, 80:1, 435-462. DOI: 10.1007/s11227-023-05482-y. Online publication date: 24-Jun-2023.
https://doi.org/10.1007/s11227-023-05482-y

Paul A, Choi J, Karimi A, Wang F, Weissman J, Chandra A, Gavrilovska A, Tiwari D (2022). Machine Learning Assisted HPC Workload Trace Generation for Leadership Scale Storage Systems. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, 199-212. DOI: 10.1145/3502181.3531457. Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3502181.3531457

Patel T, Liu Z, Kettimuthu R, Rich P, Allcock W, Tiwari D, Cuicchi C, Qualters I, Kramer W (2020). Job characteristics on large-scale systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-17. DOI: 10.5555/3433701.3433812. Online publication date: 9-Nov-2020.
https://dl.acm.org/doi/10.5555/3433701.3433812

Patel T, Liu Z, Kettimuthu R, Rich P, Allcock W, Tiwari D (2020). Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification, and Implications. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-17. DOI: 10.1109/SC41405.2020.00088. Online publication date: Nov-2020.
https://doi.org/10.1109/SC41405.2020.00088

Domke J, Matsuoka S, Ivanov I, Tsushima Y, Yuki T, Nomura A, Miura S, McDonald N, Floyd D, Dubé N, Taufer M, Balaji P, Peña A (2019). HyperX topology. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-23. DOI: 10.1145/3295500.3356140. Online publication date: 17-Nov-2019.
https://dl.acm.org/doi/10.1145/3295500.3356140

Aupy G, Gainaru A, Honore V, Raghavan P, Robert Y, Sun H (2019). Reservation Strategies for Stochastic Jobs. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 166-175. DOI: 10.1109/IPDPS.2019.00027. Online publication date: May-2019.
https://doi.org/10.1109/IPDPS.2019.00027

Yang W, Yang Z, Zhou Y, Wang F, Chen C, Wang Y (2019). A Comprehensive Analysis of User Job Data on a Petascale Supercomputer Dedicated to CFD. 2019 IEEE 5th International Conference on Computer and Communications (ICCC), 86-91. DOI: 10.1109/ICCC47050.2019.9064094. Online publication date: Dec-2019.
https://doi.org/10.1109/ICCC47050.2019.9064094

Nie B, Yang L, Jog A, Smirni E, Oskin M, Inoue K (2018). Fault site pruning for practical reliability analysis of GPGPU applications. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 749-761. DOI: 10.1109/MICRO.2018.00066. Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00066

Feng J, Liu G, Zhang J, Zhang Z, Yu J, Zhang Z (2018). Workload Characterization and Evolutionary Analyses of Tianhe-1A Supercomputer. Computational Science – ICCS 2018, 578-585. DOI: 10.1007/978-3-319-93698-7_44. Online publication date: 12-Jun-2018.
https://doi.org/10.1007/978-3-319-93698-7_44
