In the past couple of decades, the massive computational power provided by the most modern supercomputers has made it possible to simulate higher-order computational chemistry methods that were previously considered intractable. As system sizes continue to increase, the computational chemistry domain continues this trend using parallel programming models such as the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Global Arrays. The ever-increasing scale of these supercomputers comes at the cost of reduced Mean Time Between Failures (MTBF), currently on the order of days and projected to be on the order of hours for upcoming extreme-scale systems. While traditional disk-based checkpointing methods are ubiquitous for storing intermediate solutions, they suffer from the high overhead of writing and recovering from checkpoints; in practice, checkpointing itself often brings the system down. Clearly, methods beyond checkpointing are imperative for coping with the worsening problem of decreasing MTBF. In this paper, we address this challenge by designing and implementing an efficient fault-tolerant version of the Coupled Cluster (CC) method within NWChem, using in-memory data redundancy. We present the challenges associated with our design, including an efficient data storage model, maintenance of at least one consistent data copy, and the recovery process. Our performance evaluation without faults shows that the current design exhibits a small overhead. In the presence of a simulated fault, the proposed design incurs negligible overhead in comparison to the state-of-the-art implementation without faults.
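The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch of the general idea of in-memory data redundancy, not the NWChem/Global Arrays implementation. It assumes Python with mpi4py and NumPy: each rank mirrors its block of distributed data on a partner rank, so a lost block can be restored from the surviving in-memory copy instead of from a disk checkpoint.

```python
# Minimal sketch of in-memory (partner) redundancy for one block of a
# distributed array; NOT the NWChem/Global Arrays implementation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
nxt, prv = (rank + 1) % size, (rank - 1) % size

# Each rank owns one block of the distributed data.
block = np.full(4, float(rank))

# Replicate: send my block to the next rank and keep a shadow copy of the
# previous rank's block, so every block exists in two places in memory.
shadow = np.empty_like(block)
comm.Sendrecv(block, dest=nxt, recvbuf=shadow, source=prv)

# Simulated fault: this rank loses its block ...
block[:] = np.nan

# ... and restores it from the in-memory copy held by the next rank,
# without touching a disk checkpoint.
recovered = np.empty_like(block)
comm.Sendrecv(shadow, dest=prv, recvbuf=recovered, source=nxt)
block[:] = recovered
assert np.all(block == float(rank))
```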
InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. The Fat Tree is a primary interconnection topology for building large-scale InfiniBand clusters. Instead of using a shared-bus approach, InfiniBand employs an arbitrary switched point-to-point topology. In order to manage the subnet, InfiniBand specifies a basic management infrastructure responsible for discovering, configuring, and maintaining the active state of the network. In the literature, simulation studies have characterized the subnet management mechanism on irregular topologies. However, there is no study that models the subnet management mechanism on regular topologies using actual implementations. In this paper, we take up the challenge of modeling the subnet management mechanism for Fat Tree InfiniBand networks using a popular subnet manager, OpenSM. We present the timings for the various subnet management phases, namely topology discovery, path computation ...
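OpenSM's actual discovery, configuration, and path-computation logic is far more involved than anything that fits here. Purely as an illustration of the first phase, the toy sketch below (plain Python, an assumption, and not derived from OpenSM) builds a small two-level fat tree and performs a breadth-first "discovery" sweep, counting the port probes a subnet manager would have to issue.

```python
# Toy sketch of the topology-discovery phase only: a breadth-first sweep
# over a small 2-level fat tree, counting probe hops. Purely illustrative;
# OpenSM's real discovery and path computation are much more involved.
from collections import deque

def two_level_fat_tree(num_leaf, num_spine, hosts_per_leaf):
    """Build an adjacency list for a small 2-level fat tree."""
    adj = {}
    def link(a, b):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    for l in range(num_leaf):
        for s in range(num_spine):
            link(('leaf', l), ('spine', s))
        for h in range(hosts_per_leaf):
            link(('leaf', l), ('host', l, h))
    return adj

def discover(adj, root):
    """BFS from the subnet manager's node; returns visit order and probe count."""
    seen, order, probes = {root}, [root], 0
    q = deque([root])
    while q:
        node = q.popleft()
        for nbr in adj[node]:
            probes += 1                  # one 'probe' per traversed port
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                q.append(nbr)
    return order, probes

adj = two_level_fat_tree(num_leaf=4, num_spine=2, hosts_per_leaf=2)
order, probes = discover(adj, ('host', 0, 0))
print(len(order), "nodes discovered with", probes, "probes")
```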
ABSTRACT The Cray Gemini Interconnect has recently been introduced as a next-generation network architecture for building multi-petaflop supercomputers. Cray XE6 systems including LANL Cielo, NERSC Hopper, and the proposed NCSA Blue-Waters, as well as the Cray XK6 ORNL Titan, leverage the Gemini Interconnect as their primary interconnection network. At the same time, programming models such as the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) and Co-Array Fortran (CAF) have become available on these systems. Global Arrays is a popular PGAS model used in a variety of application domains including hydrodynamics, chemistry, and visualization. Global Arrays uses the Aggregate Remote Memory Copy Interface (ARMCI) as its communication runtime system for Remote Memory Access (RMA) communication. This paper presents the design, implementation, and performance evaluation of a scalable and high-performance ARMCI on the Cray Gemini. The design space is explored, and the time-space complexities of communication protocols for one-sided communication primitives such as contiguous and uniformly non-contiguous datatypes, atomic memory operations (AMOs), and memory synchronization are presented. An implementation of the proposed design (referred to as ARMCI-Gemini) demonstrates its efficacy on communication primitives, application kernels such as LU decomposition, and applications such as Smooth Particle Hydrodynamics (SPH).
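ARMCI-Gemini itself is not shown in the abstract. As a rough analogue, the sketch below uses standard MPI one-sided windows via mpi4py (an assumption, not the paper's implementation) to illustrate the kinds of primitives discussed: a contiguous put and get, an atomic accumulate (AMO), and fence-based memory synchronization.

```python
# Illustration (not ARMCI-Gemini) of one-sided RMA primitives analogous to
# those discussed above, using standard MPI windows via mpi4py.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank exposes a small buffer for remote access.
local = np.zeros(8)
win = MPI.Win.Create(local, comm=comm)

src = np.full(8, float(rank))
dst = np.empty(8)
target = (rank + 1) % size

win.Fence()                                   # open an access epoch
win.Put([src, MPI.DOUBLE], target)            # contiguous one-sided put
win.Fence()                                   # complete the puts everywhere
win.Get([dst, MPI.DOUBLE], target)            # contiguous one-sided get
win.Fence()                                   # complete the gets
win.Accumulate([src, MPI.DOUBLE], target,
               op=MPI.SUM)                    # atomic memory operation (AMO)
win.Fence()                                   # final memory synchronization
win.Free()
```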
On behalf of the Technical Program Committee, it is our great pleasure to welcome you to the Second International Workshop on Parallel Programming Models and Systems Software for High-End Computing. The workshop is held in conjunction with the ...
Abstract There has been a massive increase in computing requirements for parallel applications. These parallel applications and supporting cluster services often need to share system-wide resources. The coordination of these applications is typically managed by a ...
ABSTRACT Three types of systems dominate the current High Performance Computing landscape: the Cray XE6, the IBM Blue Gene, and commodity clusters using InfiniBand. These systems have quite different characteristics, making the choice for a particular deployment difficult. The XE6 uses Cray's proprietary Gemini 3-D torus interconnect with two nodes at each network endpoint. The latest IBM Blue Gene/Q uses a single socket integrating processor and communication in a 5-D torus network. InfiniBand provides the flexibility of using nodes from many vendors connected in many possible topologies. The performance characteristics of each vary vastly, along with their utilization models. In this work we compare the performance of these three systems using a combination of micro-benchmarks and a set of production applications. We also discuss the causes of variability in performance across the systems and quantify where performance is lost using a combination of measurements and models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.
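The abstract does not name the micro-benchmarks used. The following is a generic ping-pong latency sketch (Python with mpi4py and NumPy, an assumption) of the kind of micro-benchmark commonly used for such cross-system comparisons.

```python
# Generic ping-pong micro-benchmark sketch, not the paper's benchmark suite.
# Run with at least two MPI ranks; ranks beyond the first two stay idle.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
reps, nbytes = 1000, 8
buf = np.zeros(nbytes, dtype='b')

if size >= 2 and rank < 2:
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    t1 = MPI.Wtime()
    if rank == 0:
        # Half of the average round-trip time approximates one-way latency.
        print("%d-byte latency: %.2f us" % (nbytes, (t1 - t0) / (2 * reps) * 1e6))
```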
Abstract In this work we consider a novel application-centric approach for saving energy on large-scale parallel systems. By using a priori information on the expected application behavior, we identify points at which processor cores will wait for incoming data and thus ...
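The abstract is truncated here, but the stated idea is to exploit predicted waiting points to save energy. One common way to act on such a prediction, shown purely as a hypothetical sketch below and not necessarily the paper's mechanism, is to lower the core frequency around a communication wait and restore it afterwards. The set_cpu_frequency() helper and the frequency values are invented placeholders for whatever DVFS interface the target system exposes.

```python
# Illustrative sketch only: lower core frequency around a communication wait
# that is predicted (a priori) to be long, then restore it afterwards.
import numpy as np
from mpi4py import MPI

LOW_FREQ, HIGH_FREQ = 1.2e9, 2.6e9      # assumed frequency levels, in Hz

def set_cpu_frequency(hz):
    """Hypothetical DVFS hook; a real version would use cpufreq or a vendor API."""
    pass

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.empty(1)

if comm.Get_size() >= 2:
    if rank == 0:
        # A priori application knowledge predicts a long wait here.
        set_cpu_frequency(LOW_FREQ)      # the core is only going to be waiting
        comm.Recv(buf, source=1)         # blocking wait for incoming data
        set_cpu_frequency(HIGH_FREQ)     # restore full speed for computation
    elif rank == 1:
        buf[:] = 42.0
        comm.Send(buf, dest=0)
```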
ABSTRACT The database community has a long history of considering fault tolerance; in HPC, fault tolerance has been limited to simple checkpoint/restart strategies. On emerging extreme-scale compute facilities these simple techniques are no longer feasible and the ...
Overview This workshop explored innovative ways of integrating COTS software into software systems for purposes often unimagined by their original designers. It emphasized tools and techniques for plugging COTS into software systems safely and predictably. The past had ...
