OpenHPC: A Cohesive and
Comprehensive System Software Stack
The Time is Right
IDC HPC User Forum
April 19, 2017
Dr. Robert W. Wisniewski
Chief Software Architect Extreme Scale Computing
Senior Principal Engineer, Intel
1
2
Legal Notices and Disclaimers
• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or
service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products. For more complete information visit
http://www.intel.com/performance.
• Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified
circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does
not guarantee any costs or cost reduction.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• Intel, the Intel logo and others are trademarks of Intel Corporation in the U.S. and/or other countries.
• *Other names and brands may be claimed as the property of others.
• © 2017 Intel Corporation.
Agenda
•Trends and Challenges
•OpenHPC Vision
•OpenHPC Architecture and Implementation of Stack
•Intel® HPC Orchestrator
•Questions
3
Trends and Challenges
• Talk is software focused, but
–Hardware: scale, power, reliability, network bandwidth and latency, memory
bandwidth and latency
• Complexity of software stack
–Growth in classical HPC computing
–Richer environments (e.g., Python)
–New models: uncertainty quantification (UQ), workflows
–Introduction of big data and analytics
–Big Data and Extreme-scale Computing (BDEC)
–Multi-tenancy
–Need for new frameworks
–AI: machine learning and deep learning (ML/DL)
–Cloud
4
Trends and Challenges → Needs
• Talk is software focused, but
–Hardware: Value of co-designing and integrating cores, network, and memory
•Complexity drives the need to integrate and provide a coherent and
comprehensive system software stack rather than a bag of parts
–More components
–More components lead to a greater potential for incompatibilities
–Co-design applies within system software also
–Need to test and continuously integrate
–Over time, fewer organizations are able to assemble the whole stack
–Increasing time goes to just standing up the stack
–versus focusing on mission needs
5
Overview
Status
Governance:
 Technical steering committee active
since June 2016
 Technical submission process
published
 Currently there are 30 official members
in Platinum, Silver, Academic and
Technical Committees
 Held the first post-formation face-to-face
meeting in June 2016 at ISC, another at
SC 2016, and one planned for ISC 2017
Goals
 Provide a common SW platform to the HPC
community that works across multiple
segments and on which end-users can
collaborate and innovate
 Simplify the complexity of installation,
configuration, and ongoing maintenance of a
custom software stack
 Receive contributions and feedback from
community to drive innovation
 Enable developers to focus on their
differentiation and unique value, rather than
spending effort developing, testing, and
maintaining a core stack
 Deliver integrated hardware and software
innovations to ease the path to exascale
6
Courtesy of OpenHPC*
*Other names and brands may be claimed as the property of others.
Background: Motivation for the Community Effort
7
Courtesy of OpenHPC*
• Many sites spend considerable effort aggregating a large suite of open-
source projects to provide a capable HPC environment for their users:
–necessary to build/deploy HPC-focused packages that are either absent or do not keep
pace with distro providers
–local packaging or customization frequently tries to give software versioning access to
users (e.g., via modules or a similar mechanism)
–hierarchical packaging necessary for multiple compiler/MPI families (see the sketch after this list)
• On the developer front, many successful projects must engage in continual
triage and debugging regarding configuration and installation issues on HPC
systems
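To make the hierarchical-packaging point concrete, below is a minimal Python sketch (illustrative only) of an Lmod-style hierarchy in which compiler-dependent and MPI-dependent builds are nested under their parent families, so users only see mutually compatible modules. The /opt/ohpc/pub prefix, the moduledeps directory, and the gnu7/openmpi family names are assumptions modeled on common OpenHPC conventions, not an authoritative layout.

```python
# Illustrative sketch only: the /opt/ohpc/pub prefix, the "moduledeps" directory,
# and the family names below are assumptions modeled on common OpenHPC/Lmod
# conventions; consult a real installation for the actual layout.
import os

MODULE_ROOT = "/opt/ohpc/pub"  # assumed install prefix

def modulefile_dir(package, compiler=None, mpi=None):
    """Directory where a package's modulefiles would sit in a hierarchical layout.

    Compiler-independent tools live at the top level; compiler-dependent builds
    are nested under the compiler family, and MPI-dependent builds under the
    compiler+MPI pair, so `module load` only exposes compatible combinations.
    """
    if compiler is None:
        return os.path.join(MODULE_ROOT, "modulefiles", package)
    if mpi is None:
        return os.path.join(MODULE_ROOT, "moduledeps", compiler, package)
    return os.path.join(MODULE_ROOT, "moduledeps", f"{compiler}-{mpi}", package)

# Example: a PETSc build for the GNU compiler family with OpenMPI
print(modulefile_dir("petsc", compiler="gnu7", mpi="openmpi"))
# -> /opt/ohpc/pub/moduledeps/gnu7-openmpi/petsc
```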
*Other names and brands may be claimed as the property of others.
What is OpenHPC?
8
Courtesy of OpenHPC*
• OpenHPC is a community effort endeavoring to:
–provide collection(s) of pre-packaged components that can be used to
help install and manage flexible HPC systems throughout their lifecycle
–leverage the standard Linux delivery model
to retain admin familiarity (i.e., package repos); a minimal install sketch follows this list
–allow and promote multiple system configuration recipes that leverage
community reference
designs and best practices
–implement integration testing to
gain validation confidence
–provide additional distribution
mechanism for groups releasing
open-source software
–provide a stable platform for
new R&D initiatives
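As a minimal sketch of the standard package-repo delivery model mentioned above (not the official recipe): the example below assumes a CentOS master host with yum and the ohpc-base / ohpc-warewulf / ohpc-slurm-server meta-package names that appear in the OpenHPC install recipes; the location of the ohpc-release RPM is left as a placeholder to be taken from the install guide.

```python
# Minimal, hedged sketch of a repo-based install, not the official recipe.
# Assumptions: CentOS master (SMS) host with yum, and meta-package names
# matching those used in the OpenHPC install guides. OHPC_RELEASE_RPM is a
# placeholder; take the real URL/path from the install guide.
import subprocess

OHPC_RELEASE_RPM = "<path-or-URL-to-ohpc-release-RPM-from-the-install-guide>"

def yum_install(*packages):
    """Install packages with yum, raising if the command fails."""
    subprocess.run(["yum", "-y", "install", *packages], check=True)

if __name__ == "__main__":
    # Enable the OpenHPC repository on the master (SMS) node...
    yum_install(OHPC_RELEASE_RPM)
    # ...then pull base tools, the provisioner, and the resource-manager
    # server side from that repository as ordinary packages.
    yum_install("ohpc-base", "ohpc-warewulf", "ohpc-slurm-server")
```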
Typical cluster architecture (figure from the OpenHPC Install Guide, CentOS 7.1 version, v1.0).
Figure 1: Overview of physical cluster architecture — a master (SMS) node on the data center
network, compute nodes reached over a high-speed network and TCP networking to the compute
eth and BMC interfaces, and a Lustre* storage system.
*Other names and brands may be claimed as the property of others.
OpenHPC: Mission and Vision
9
Courtesy of OpenHPC*
• Mission: to provide a reference collection of open-source HPC software
components and best practices, lowering barriers to deployment, advancement,
and use of modern HPC methods and tools.
• Vision: OpenHPC components and best practices will enable and accelerate
innovation and discoveries by broadening access to state-of-the-art, open-source
HPC methods and tools in a consistent environment, supported by a collaborative,
worldwide community of HPC users, developers, researchers, administrators, and
vendors.
*Other names and brands may be claimed as the property of others.
OpenHPC: Project Members
10
Courtesy of OpenHPC*
Project member participation interest?
Contact Kevlin Husser or Jeff ErnstFriedman
jernstfriedman@linuxfoundation.org
Mixture of Academics, Labs, OEMs, and ISVs/OSVs
Member organizations include Argonne National Laboratory, the Center for Research in
Extreme Scale Technologies (Indiana University), the University of Cambridge, and others
30 Members
OpenHPC is a Linux Foundation project
initiated by Intel that gained wide
participation right away
The goal is to collaboratively advance
the state of the software ecosystem
The governing board is composed of
Platinum members (Intel, Dell, HPE,
SUSE) plus representatives from the Silver,
Academic, and Technical committees
WWW.OpenHPC.Community
*Other names and brands may be claimed as the property of others.
Repository server metrics: monthly visitors
11
Courtesy of OpenHPC*
[Chart: "Build Server Access: Unique Visitors" — number of unique visitors per month from
Jul-15 through Mar-17, ranging from 0 to roughly 2000, with release markers for v1.0,
v1.0.1, v1.1, v1.1.1, v1.2, v1.2.1, and v1.3.]
*Other names and brands may be claimed as the property of others.
Intel® HPC Orchestrator Modular View
• Intra-stack APIs to allow for customization/differentiation (enabling OEMs)
• Defined external APIs for consistency across versions (for ISVs)
Stack layers and functional blocks (bottom to top): hardware; node-specific OS kernel(s);
Linux* distro runtime libraries; overlay & pub-sub networks, identity; user-space utilities;
SW development toolchain; compiler & programming model runtimes; high-performance parallel
libraries; scalable debugging & performance analysis tools; optimized I/O libraries; I/O
services; data collection and system monitors; workload manager; resource management
runtimes; DB schema; scalable DB; system management (config, inventory); provisioning;
system diagnostics; fabric management; operator interface; applications (not part of the
initial stack); and ISV applications.
12
*Other names and brands may be claimed as the property of others.
OpenHPC v1.3 - Current S/W components
13
Courtesy of OpenHPC*
Functional Areas / Components
Base OS: CentOS 7.3, SLES12 SP2
Architecture: x86_64, aarch64 (Tech Preview)
Administrative Tools: Conman, Ganglia, Lmod, LosF, Nagios, pdsh, prun, EasyBuild, ClusterShell, mrsh, Genders, Shine, Spack, test-suite
Provisioning: Warewulf
Resource Mgmt.: SLURM, Munge, PBS Professional
Runtimes: OpenMP, OCR
I/O Services: Lustre client (community version)
Numerical/Scientific Libraries: Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, SuperLU_Dist, Mumps, OpenBLAS, ScaLAPACK
I/O Libraries: HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios
Compiler Families: GNU (gcc, g++, gfortran)
MPI Families: MVAPICH2, OpenMPI, MPICH
Development Tools: Autotools (autoconf, automake, libtool), Valgrind, R, SciPy/NumPy
Performance Tools: PAPI, IMB, mpiP, pdtoolkit, TAU, Scalasca, Score-P, SIONLib
Notes:
• Additional dependencies
that are not provided by
the base OS or community
repos (e.g., EPEL) are also
included
• Third-party libraries are built
for each compiler/MPI
family (see the sketch below)
• The resulting repositories
currently comprise
~300 RPMs
Future additions approved
for inclusion:
• BeeGFS client
• hwloc
• Singularity
• xCAT
*Other names and brands may be claimed as the property of others.
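To illustrate the note that third-party libraries are built per compiler/MPI family, here is a purely illustrative Python sketch that enumerates hypothetical per-family build names for a handful of the libraries in the table above. The <lib>-<compiler>-<mpi>-ohpc naming pattern and the family lists are assumptions for illustration, not an authoritative package inventory.

```python
# Purely illustrative: enumerates hypothetical per-family build names to show
# how compiler x MPI combinations multiply package counts. The naming pattern
# "<lib>-<compiler>-<mpi>-ohpc" is an assumption, not an authoritative list.
from itertools import product

compilers = ["gnu7"]                          # GNU family from the table
mpis = ["mvapich2", "openmpi", "mpich"]       # MPI families from the table
libraries = ["boost", "petsc", "trilinos", "hypre", "superlu_dist"]

builds = [f"{lib}-{comp}-{mpi}-ohpc"
          for lib, comp, mpi in product(libraries, compilers, mpis)]

print(f"{len(builds)} builds for {len(libraries)} libraries "
      f"across {len(compilers)} compiler and {len(mpis)} MPI families")
for name in builds[:3]:
    print(name)   # e.g. boost-gnu7-mvapich2-ohpc
```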
Intel® HPC Orchestrator Framework
14
PROJECT (community side): upstream source communities (GNU, Linux, parallel file system,
resource manager, and others) each feed a relevant and reliable version ("RRV") of their
component into a continuous integration environment (build environment & source control,
bug tracking, user & dev forums, collaboration tools, validation environment). The project
integrates and tests HPC stacks and makes them available as open source on a cadence of
roughly 6-12 months, as a base HPC stack plus OEM and university stacks. Contributors
include Intel, OEMs, ISVs, labs, and academia.
PRODUCT (Intel® HPC Orchestrator): a core HPC stack delivered as a supported HPC stack with
premium features, advanced integration testing, testing at scale, validated updates, and
Level 3 support across the stack, plus OEM stacks.
*Other names and brands may be claimed as the property of others.
Base Stack and Derivatives
All offerings share a common core (the same across all offerings). Additions targeting the
volume market provide a "TURNKEY" offering; additions targeting the high-end market provide
an "ADVANCED" offering; and additions targeting the Top500 and verticals provide a "CUSTOM"
offering. Benefits called out across the tiers include sufficient performance and scalability,
ease of install, performance & scalability, energy efficiency, ease of use & administration,
auto-configuration, and ease of administration across multiple tiers in the same data center.
15
Conclusion
•Trends and challenges have led to the need for a cohesive and
comprehensive system software stack for HPC
•OpenHPC provides a vehicle that facilitates collaboration,
removes duplicated work, and provides a more efficient
ecosystem
•Intel® HPC Orchestrator provides a supported version of
OpenHPC, analogous to the relationship between CentOS and RHEL,
with three tiers for different computing needs
•OpenHPC is gaining momentum with increased contributions
16
*Other names and brands may be claimed as the property of others.
