In the past couple of decades, the massive computational power provided by the most modern supercomputers has made it possible to simulate higher-order computational chemistry methods that were previously considered intractable. As system sizes continue to increase, the computational chemistry domain continues this trend using parallel programming models such as the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Global Arrays. The ever-increasing scale of these supercomputers comes at the cost of reduced Mean Time Between Failures (MTBF), currently on the order of days and projected to be on the order of hours for upcoming extreme-scale systems. While traditional disk-based checkpointing methods are ubiquitous for storing intermediate solutions, they suffer from the high overhead of writing and recovering from checkpoints; in practice, checkpointing itself often brings the system down. Clearly, methods beyond checkpointing are imperative for coping with the worsening problem of decreasing MTBF. In this paper, we address this challenge by designing and implementing an efficient fault-tolerant version of the Coupled Cluster (CC) method within NWChem, using in-memory data redundancy. We present the challenges associated with our design, including an efficient data storage model, maintenance of at least one consistent data copy, and the recovery process. Our performance evaluation without faults shows that the current design exhibits a small overhead. In the presence of a simulated fault, the proposed design incurs negligible overhead in comparison to the state-of-the-art implementation without faults.
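The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch of the general idea of in-memory data redundancy, not the NWChem/Global Arrays implementation. It assumes Python with mpi4py and NumPy: each rank mirrors its block of distributed data on a partner rank, so a lost block can be restored from the surviving in-memory copy instead of from a disk checkpoint.

```python
# Minimal sketch of in-memory (partner) redundancy for one block of a
# distributed array; NOT the NWChem/Global Arrays implementation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
nxt, prv = (rank + 1) % size, (rank - 1) % size

# Each rank owns one block of the distributed data.
block = np.full(4, float(rank))

# Replicate: send my block to the next rank and keep a shadow copy of the
# previous rank's block, so every block exists in two places in memory.
shadow = np.empty_like(block)
comm.Sendrecv(block, dest=nxt, recvbuf=shadow, source=prv)

# Simulated fault: this rank loses its block ...
block[:] = np.nan

# ... and restores it from the in-memory copy held by the next rank,
# without touching a disk checkpoint.
recovered = np.empty_like(block)
comm.Sendrecv(shadow, dest=prv, recvbuf=recovered, source=nxt)
block[:] = recovered
assert np.all(block == float(rank))
```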
InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. The Fat Tree is a primary interconnection topology for building large-scale InfiniBand clusters. Instead of using a shared-bus approach, InfiniBand employs an arbitrary switched point-to-point topology. In order to manage the subnet, InfiniBand specifies a basic management infrastructure responsible for discovering, configuring, and maintaining the active state of the network. In the literature, simulation studies have characterized the subnet management mechanism on irregular topologies. However, there is no study that models the subnet management mechanism on regular topologies using actual implementations. In this paper, we take up the challenge of modeling the subnet management mechanism for Fat Tree InfiniBand networks using a popular subnet manager, OpenSM. We present the timings for the various subnet management phases, namely topology discovery, path computation ...
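OpenSM's actual discovery, configuration, and path-computation logic is far more involved than anything that fits here. Purely as an illustration of the first phase, the toy sketch below (plain Python, an assumption, and not derived from OpenSM) builds a small two-level fat tree and performs a breadth-first "discovery" sweep, counting the port probes a subnet manager would have to issue.

```python
# Toy sketch of the topology-discovery phase only: a breadth-first sweep
# over a small 2-level fat tree, counting probe hops. Purely illustrative;
# OpenSM's real discovery and path computation are much more involved.
from collections import deque

def two_level_fat_tree(num_leaf, num_spine, hosts_per_leaf):
    """Build an adjacency list for a small 2-level fat tree."""
    adj = {}
    def link(a, b):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    for l in range(num_leaf):
        for s in range(num_spine):
            link(('leaf', l), ('spine', s))
        for h in range(hosts_per_leaf):
            link(('leaf', l), ('host', l, h))
    return adj

def discover(adj, root):
    """BFS from the subnet manager's node; returns visit order and probe count."""
    seen, order, probes = {root}, [root], 0
    q = deque([root])
    while q:
        node = q.popleft()
        for nbr in adj[node]:
            probes += 1                  # one 'probe' per traversed port
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                q.append(nbr)
    return order, probes

adj = two_level_fat_tree(num_leaf=4, num_spine=2, hosts_per_leaf=2)
order, probes = discover(adj, ('host', 0, 0))
print(len(order), "nodes discovered with", probes, "probes")
```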
ABSTRACT The Cray Gemini Interconnect has recently been introduced as a next-generation network architecture for building multi-petaflop supercomputers. Cray XE6 systems including LANL Cielo, NERSC Hopper, and the proposed NCSA Blue-Waters, as well as the Cray XK6 ORNL Titan, leverage the Gemini Interconnect as their primary interconnection network. At the same time, programming models such as the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) and Co-Array Fortran (CAF) have become available on these systems. Global Arrays is a popular PGAS model used in a variety of application domains including hydrodynamics, chemistry, and visualization. Global Arrays uses the Aggregate Remote Memory Copy Interface (ARMCI) as its communication runtime system for Remote Memory Access (RMA) communication. This paper presents the design, implementation, and performance evaluation of a scalable and high-performance ARMCI on the Cray Gemini. The design space is explored, and the time-space complexities of communication protocols for one-sided communication primitives such as contiguous and uniformly non-contiguous datatypes, atomic memory operations (AMOs), and memory synchronization are presented. An implementation of the proposed design (referred to as ARMCI-Gemini) demonstrates its efficacy on communication primitives, application kernels such as LU decomposition, and applications such as Smooth Particle Hydrodynamics (SPH).
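ARMCI-Gemini itself is not shown in the abstract. As a rough analogue, the sketch below uses standard MPI one-sided windows via mpi4py (an assumption, not the paper's implementation) to illustrate the kinds of primitives discussed: a contiguous put and get, an atomic accumulate (AMO), and fence-based memory synchronization.

```python
# Illustration (not ARMCI-Gemini) of one-sided RMA primitives analogous to
# those discussed above, using standard MPI windows via mpi4py.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank exposes a small buffer for remote access.
local = np.zeros(8)
win = MPI.Win.Create(local, comm=comm)

src = np.full(8, float(rank))
dst = np.empty(8)
target = (rank + 1) % size

win.Fence()                                   # open an access epoch
win.Put([src, MPI.DOUBLE], target)            # contiguous one-sided put
win.Fence()                                   # complete the puts everywhere
win.Get([dst, MPI.DOUBLE], target)            # contiguous one-sided get
win.Fence()                                   # complete the gets
win.Accumulate([src, MPI.DOUBLE], target,
               op=MPI.SUM)                    # atomic memory operation (AMO)
win.Fence()                                   # final memory synchronization
win.Free()
```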
On behalf of the Technical Program Committee, it is our great pleasure to welcome you to the Second International Workshop on Parallel Programming Models and Systems Software for High-End Computing. The workshop is held in conjunction with the ...
Abstract There has been a massive increase in computing requirements for parallel applications. These parallel applications and supporting cluster services often need to share system-wide resources. The coordination of these applications is typically managed by a ...
ABSTRACT Three types of systems dominate the current High Performance Computing landscape: the Cray XE6, the IBM Blue Gene, and commodity clusters using InfiniBand. These systems have quite different characteristics, making the choice for a particular deployment difficult. The XE6 uses Cray's proprietary Gemini 3-D torus interconnect with two nodes at each network endpoint. The latest IBM Blue Gene/Q uses a single socket integrating processor and communication in a 5-D torus network. InfiniBand provides the flexibility of using nodes from many vendors connected in many possible topologies. The performance characteristics of each vary vastly, along with their utilization models. In this work we compare the performance of these three systems using a combination of micro-benchmarks and a set of production applications. We also discuss the causes of variability in performance across the systems and quantify where performance is lost using a combination of measurements and models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.
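The abstract does not name the micro-benchmarks used. The following is a generic ping-pong latency sketch (Python with mpi4py and NumPy, an assumption) of the kind of micro-benchmark commonly used for such cross-system comparisons.

```python
# Generic ping-pong micro-benchmark sketch, not the paper's benchmark suite.
# Run with at least two MPI ranks; ranks beyond the first two stay idle.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
reps, nbytes = 1000, 8
buf = np.zeros(nbytes, dtype='b')

if size >= 2 and rank < 2:
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    t1 = MPI.Wtime()
    if rank == 0:
        # Half of the average round-trip time approximates one-way latency.
        print("%d-byte latency: %.2f us" % (nbytes, (t1 - t0) / (2 * reps) * 1e6))
```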
Abstract In this work we consider a novel application-centric approach for saving energy on large-scale parallel systems. By using a priori information on the expected application behavior, we identify points at which processor cores will wait for incoming data and thus ...
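The abstract is truncated here, but the stated idea is to exploit predicted waiting points to save energy. One common way to act on such a prediction, shown purely as a hypothetical sketch below and not necessarily the paper's mechanism, is to lower the core frequency around a communication wait and restore it afterwards. The set_cpu_frequency() helper and the frequency values are invented placeholders for whatever DVFS interface the target system exposes.

```python
# Illustrative sketch only: lower core frequency around a communication wait
# that is predicted (a priori) to be long, then restore it afterwards.
import numpy as np
from mpi4py import MPI

LOW_FREQ, HIGH_FREQ = 1.2e9, 2.6e9      # assumed frequency levels, in Hz

def set_cpu_frequency(hz):
    """Hypothetical DVFS hook; a real version would use cpufreq or a vendor API."""
    pass

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.empty(1)

if comm.Get_size() >= 2:
    if rank == 0:
        # A priori application knowledge predicts a long wait here.
        set_cpu_frequency(LOW_FREQ)      # the core is only going to be waiting
        comm.Recv(buf, source=1)         # blocking wait for incoming data
        set_cpu_frequency(HIGH_FREQ)     # restore full speed for computation
    elif rank == 1:
        buf[:] = 42.0
        comm.Send(buf, dest=0)
```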
ABSTRACT The database community has a long history of considering fault tolerance; in HPC, fault tolerance has been limited to simple checkpoint/restart strategies. On emerging extreme-scale compute facilities these simple techniques are no longer feasible and the ...
Overview This workshop explored innovative ways of integrating COTS software into software systems for purposes often unimagined by their original designers. It emphasized tools and techniques for plugging COTS into software systems safely and predictably. The past had ...
