On the scalability of the clusters-booster concept: a critical assessment of the DEEP architecture

Authors:

Damian Alvarez Mallon,

Norbert Eicker,

Maria Elena Innocenti,

Giovanni Lapenta,

Thomas Lippert,

Estela Suarez

FutureHPC '12: Proceedings of the Future HPC Systems: the Challenges of Power-Constrained Performance

Article No.: 3, Pages 1 - 10

https://doi.org/10.1145/2322156.2322159

Published: 25 June 2012

Abstract

Cluster computers dominate high performance computing (HPC) today. The success of this architecture rests on the fact that it profits from the improvements in mainstream computing commonly summarized under the label of Moore's law. But reaching Exascale within this decade might require additional effort beyond riding this technology wave. To find possible directions for the future, we review Amdahl's and Gustafson's thoughts on scalability. Based on this analysis, we propose an advanced architecture combining a Cluster with a so-called Booster element consisting of accelerators interconnected by a high performance fabric. We argue that this architecture provides significant advantages compared to today's accelerated clusters and might pave the way for clusters into the era of Exascale computing. The DEEP project, which aims to implement this concept, is presented. Six applications from fields with the potential to exploit Exascale systems will be ported to DEEP. We analyze one application in detail and explore the consequences of the constraints of the DEEP system on its scalability.
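As a rough illustration of the two scalability views the abstract refers to, the sketch below evaluates the standard textbook forms of Amdahl's law (fixed problem size) and Gustafson's law (problem size scaled with the machine). It is not taken from the paper: the function names, the parallel fraction of 0.95, and the worker counts are assumptions chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Fixed-size speedup, S = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= p <= 1 and N >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled speedup, S = (1 - p) + p * N."""
    if not 0.0 <= parallel_fraction <= 1.0 or workers < 1:
        raise ValueError("need 0 <= p <= 1 and N >= 1")
    return (1.0 - parallel_fraction) + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.95  # hypothetical parallel fraction, for illustration only
    for n in (1, 16, 256, 4096):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

With a fixed problem size the speedup saturates at 1 / (1 - p) no matter how many workers are added, whereas the scaled speedup keeps growing linearly in the number of workers. That contrast is the usual reason for keeping the poorly scaling part of a code on a few fast cores and moving the highly parallel part to many accelerator cores.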

Published In

FutureHPC '12: Proceedings of the Future HPC Systems: the Challenges of Power-Constrained Performance

June 2012

31 pages

ISBN:9781450314534

DOI:10.1145/2322156

Organizing Chairs:
Sanzio Bassini, CINECA, Italy
Adolfy Hoisie, Pacific Northwest National Laboratory, USA
Darren J. Kerbyson, Pacific Northwest National Laboratory, USA
Dirk Pleiter, Jülich Supercomputing Centre and University of Regensburg, Germany
Sebastiano Fabio Schifano, University of Ferrara, Italy

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Seventh Framework Programme

Conference

ICS'12

Sponsor:

SIGARCH

ICS'12: International Conference on Supercomputing

June 25, 2012

Venezia, Italy

Cited By

Neuwirth S (2022). Assessment of the I/O and Storage Subsystem in Modular Supercomputing Architectures. 2022 IEEE International Conference on Cluster Computing (CLUSTER), pp. 589-596. Online publication date: Sep-2022.
https://doi.org/10.1109/CLUSTER51413.2022.00077
Christou M, Christoudias T, Morillo J, Alvarez D, Merx H (2016). Earth system modelling on system-level heterogeneous architectures: EMAC (version 2.42) on the Dynamical Exascale Entry Platform (DEEP). Geoscientific Model Development 9:9, pp. 3483-3491. Online publication date: 29-Sep-2016.
https://doi.org/10.5194/gmd-9-3483-2016
Sainz F, Bellon J, Beltran V, Labarta J (2015). Collective Offload for Heterogeneous Clusters. Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp. 376-385. Online publication date: 16-Dec-2015.
https://dl.acm.org/doi/10.1109/HiPC.2015.20
Neuwirth S, Frey D, Nuessle M, Bruening U (2015). Scalable communication architecture for network-attached accelerators. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 627-638. Online publication date: Feb-2015.
https://doi.org/10.1109/HPCA.2015.7056068
Innocenti M, Beck A, Ponweiser T, Markidis S, Lapenta G (2015). Introduction of temporal sub-stepping in the Multi-Level Multi-Domain semi-implicit Particle-In-Cell code Parsek2D-MLMD. Computer Physics Communications 189, pp. 47-59. Online publication date: Apr-2015.
https://doi.org/10.1016/j.cpc.2014.12.004
Eicker N, Lippert T, Moschny T, Suarez E (2015). The DEEP Project An alternative approach to heterogeneous cluster-computing in the many-core era. Concurrency and Computation: Practice and Experience 28:8, pp. 2394-2411. Online publication date: 27-Jul-2015.
https://doi.org/10.1002/cpe.3562
Prabhakaran S, Iqbal M, Rinke S, Wolf F (2013). A Dynamic Resource Management System for Network-Attached Accelerator Clusters. Proceedings of the 2013 42nd International Conference on Parallel Processing, pp. 773-782. Online publication date: 1-Oct-2013.
https://dl.acm.org/doi/10.1109/ICPP.2013.91
Eicker N, Lippert T, Moschny T, Suarez E (2013). The DEEP Project - Pursuing Cluster-Computing in the Many-Core Era. Proceedings of the 2013 42nd International Conference on Parallel Processing, pp. 885-892. Online publication date: 1-Oct-2013.
https://dl.acm.org/doi/10.1109/ICPP.2013.105
Rinke S, Prabhakaran S, Wolf F (2013). Efficient Offloading of Parallel Kernels Using MPI_Comm_Spawn. Proceedings of the 2013 42nd International Conference on Parallel Processing, pp. 877-884. Online publication date: 1-Oct-2013.
https://dl.acm.org/doi/10.1109/ICPP.2013.104
