Alistair Rendell

The Australian National University, Engineering and Computer Science, Faculty Member

Papers by Alistair Rendell

For moral support and broader intellectual contributions, I would like to thank

and the good times ahead. Acknowledgments If you were successful, somebody along the line gave yo... more

The design of MPI based distributed shared memory systems to support OpenMP on clusters

2007 IEEE International Conference on Cluster Computing, 2007

Micro-benchmarks for Cluster OpenMP Implementations: Memory Consistency Costs

Lecture Notes in Computer Science, 2008

Use of Cluster OpenMP with the Gaussian Quantum Chemistry Code: A Preliminary Performance Analysis

Lecture Notes in Computer Science, 2009

The SCore Cluster Enabled OpenMP Environment: Performance Prospects for Computational Science

Lecture Notes in Computer Science, 2005

Non-threaded and Threaded Approaches to MultiRail Communication with uDAPL

2009 Sixth IFIP International Conference on Network and Parallel Computing, 2009

Managing Complexity in the Parallel Sparse Grid Combination Technique

J. W. Larson, P. E. Strazdins, M. Hegland, B. Harding, S. Roberts, L. Stals, A. P. Rendell, Md.... more

Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10

Journal of Computational Chemistry, Jan 30, 2014

The use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. Because X10 is a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, exploiting both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of the integral calculation using X10's work-stealing runtime, and report performance results for long-range HF energy calculations with large molecules and high-quality basis sets running on up to 1024 cores of a high-performance cluster.

Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport

High Performance Computing - HiPC 2006, 2006

Region-Based Prefetch Techniques for Software Distributed Shared Memory Systems

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010

Cache Oblivious Matrix Transposition: Simulation and Experiment

Lecture Notes in Computer Science, 2004

Performance models for Cluster-enabled OpenMP implementations

2008 13th Asia-Pacific Computer Systems Architecture Conference, 2008

Profiling Directed NUMA Optimization on Linux Systems: A Case Study of the Gaussian Computational Chemistry Code

2011 IEEE International Parallel & Distributed Processing Symposium, 2011

The parallel performance of applications running on Non-Uniform Memory Access (NUMA) platforms is strongly influenced by the placement of memory pages relative to the threads that access them. As a consequence, Linux provides application programming interfaces (APIs) to control this placement. For large parallel codes it can, however, be difficult to determine how and when to use these APIs. In this paper we introduce the NUMAgrind profiling tool, which can be used to simplify this process. It extends the Valgrind binary translation framework with a model that incorporates cache coherency, memory locality domains and interconnect traffic for arbitrary NUMA topologies. Using NUMAgrind, cache misses can be mapped to memory locality domains, page access modes determined, and pages that are referenced by multiple threads quickly identified. We show how the NUMAgrind tool can be used to guide the use of Linux memory and thread placement APIs in the Gaussian computational chemistry code. The performance of the code before and after use of these APIs is also presented for three different commodity NUMA platforms.