research-article

An early prototype of an autonomic performance environment for exascale

Authors:

Hartmut Kaiser,

Allan Porterfield,

Ron Brightwell

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers

Article No.: 8, Pages 1 - 8

https://doi.org/10.1145/2491661.2481434

Published: 10 June 2013

Abstract

Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation in these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first- and third-person observation, in situ analysis, and introspection across stack layers, and that serves online dynamic feedback and adaptation. In this paper we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack (called OpenX) that supports the ParalleX execution model and is targeted towards future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using the current TAU performance technology, and results are presented that highlight the challenges of highly integrative observation and runtime analysis.

Published In

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers

June 2013

75 pages

ISBN:9781450321464

DOI:10.1145/2491661

Conference Chairs: Torsten Hoefler (ETH Zurich, Switzerland), Kamil Iskra (Argonne National Laboratory)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Office of Cyberinfrastructure
U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (and Basic Energy Sciences/Biological and Environmental Research/High Energy Physics/Fusion Energy Sciences/Nuclear Physics)

Conference

ICS'13

Sponsor:

SIGARCH

ICS'13: International Conference on Supercomputing

June 10, 2013

Eugene, Oregon

Acceptance Rates

ROSS '13 Paper Acceptance Rate 9 of 18 submissions, 50%;

Overall Acceptance Rate 58 of 169 submissions, 34%
