An early prototype of an autonomic performance environment for exascale

Research article. DOI: 10.1145/2491661.2481434

Published: 10 June 2013

Abstract

Extreme-scale computing requires a new perspective on the role of performance observation in the Exascale system software stack. Because of the anticipated high concurrency and dynamic operation of these systems, it is no longer reasonable to expect that a post-mortem performance measurement and analysis methodology will suffice. Rather, there is a strong need for performance observation that merges first- and third-person observation, in situ analysis, and introspection across stack layers, and that serves online dynamic feedback and adaptation. In this paper we describe the DOE-funded XPRESS project and the role of autonomic performance support in Exascale systems. XPRESS will build an integrated Exascale software stack, called OpenX, that supports the ParalleX execution model and targets future Exascale platforms. An initial version of an autonomic performance environment called APEX has been developed for OpenX using current TAU performance technology, and we present results that highlight the challenges of highly integrated observation and runtime analysis.
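To make the contrast with post-mortem analysis concrete, here is a minimal Python sketch of the observe, analyze, and adapt loop that the abstract describes. It is not the APEX or TAU interface: every class and method name, the task grain-size tunable, and both thresholds (mean task time below 10 times an assumed scheduling overhead, workers idle more than 25% of the time) are hypothetical. First-person observations come from instrumented tasks reporting their own durations, and third-person observations come from an external sampler of worker idle time. Both are summarized in situ with Welford's running mean and variance, so no trace is kept for later analysis, and a policy consults the summaries while the program runs.

```python
"""Hypothetical observe-analyze-adapt loop; not the APEX or TAU API."""
from dataclasses import dataclass, field


@dataclass
class OnlineStats:
    """Running mean and variance (Welford's algorithm); no samples are stored."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two samples have arrived.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


@dataclass
class Introspector:
    """Merges first-person and third-person observations (hypothetical)."""
    task_time: OnlineStats = field(default_factory=OnlineStats)
    idle_fraction: OnlineStats = field(default_factory=OnlineStats)

    def record_task(self, seconds: float) -> None:
        # First-person: an instrumented task reports its own duration.
        self.task_time.update(seconds)

    def sample_idle(self, fraction: float) -> None:
        # Third-person: an external sampler observes how idle the workers are.
        self.idle_fraction.update(fraction)


@dataclass
class GrainSizePolicy:
    """Hypothetical feedback policy that tunes task grain size online."""
    grain: int = 1024
    min_grain: int = 64
    max_grain: int = 65536
    overhead_s: float = 1e-5  # assumed per-task scheduling overhead

    def adapt(self, obs: Introspector) -> int:
        tasks, idle = obs.task_time, obs.idle_fraction
        if tasks.count and tasks.mean < 10 * self.overhead_s:
            # Tasks are short relative to overhead: coarsen the grain.
            self.grain = min(self.grain * 2, self.max_grain)
        elif idle.count and idle.mean > 0.25:
            # Workers are often idle: refine the grain to expose parallelism.
            self.grain = max(self.grain // 2, self.min_grain)
        return self.grain


if __name__ == "__main__":
    obs = Introspector()
    policy = GrainSizePolicy()
    for t in (2e-5, 3e-5, 4e-5):
        obs.record_task(t)
    print(policy.adapt(obs))  # prints 2048: mean task time 3e-5 s < 1e-4 s
```

Checking for short tasks before checking idle time is an arbitrary ordering chosen for this sketch; an autonomic environment working across stack layers would weigh many more signals before adapting.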

Published In

ROSS '13: Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
June 2013
75 pages
ISBN: 9781450321464
DOI: 10.1145/2491661
Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

  • Office of Cyberinfrastructure
  • U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (and Basic Energy Sciences/Biological and Environmental Research/High Energy Physics/Fusion Energy Sciences/Nuclear Physics)

Conference

ICS '13

Acceptance Rates

ROSS '13 paper acceptance rate: 9 of 18 submissions, 50%
Overall acceptance rate: 58 of 169 submissions, 34%

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 2
Reflects downloads up to 01 Sep 2024.

Cited By

  • (2024) SOMA: Observability, monitoring, and in situ analytics for exascale applications. Concurrency and Computation: Practice and Experience 36:19. DOI: 10.1002/cpe.8141. Online publication date: 2-Jun-2024.
  • (2022) From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types. 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 10-19. DOI: 10.1109/ESPM256814.2022.00007. Online publication date: Nov-2022.
  • (2021) SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 35-45. DOI: 10.1109/IPDPS49936.2021.00013. Online publication date: May-2021.
  • (2019) GRIDHPC: A decentralized environment for high performance computing. Concurrency and Computation: Practice and Experience 32:10. DOI: 10.1002/cpe.5320. Online publication date: 16-May-2019.
  • (2018) A taxonomy of task-based parallel programming technologies for high-performance computing. The Journal of Supercomputing 74:4, pp. 1422-1434. DOI: 10.1007/s11227-018-2238-4. Online publication date: 1-Apr-2018.
  • (2018) A Taxonomy of Task-Based Technologies for High-Performance Computing. Parallel Processing and Applied Mathematics, pp. 264-274. DOI: 10.1007/978-3-319-78054-2_25. Online publication date: 23-Mar-2018.
  • (2017) MPI performance engineering with the MPI tool interface. Proceedings of the 24th European MPI Users' Group Meeting, pp. 1-11. DOI: 10.1145/3127024.3127036. Online publication date: 25-Sep-2017.
  • (2017) Extending Skel to Support the Development and Optimization of Next Generation I/O Systems. 2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 563-571. DOI: 10.1109/CLUSTER.2017.30. Online publication date: Sep-2017.
  • (2016) Using Intrinsic Performance Counters to Assess Efficiency in Task-Based Parallel Applications. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1692-1701. DOI: 10.1109/IPDPSW.2016.115. Online publication date: May-2016.
  • (2015) Higher-level parallelization for local and distributed asynchronous task-based programming. Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, pp. 29-37. DOI: 10.1145/2832241.2832244. Online publication date: 15-Nov-2015.
