Pawsey Supercomputing Centre is one of two national high performance computing centres in Australia providing 'tier 1' HPC services to researchers. Pawsey Supercomputing Centre is operated by a joint venture made up of partner universities and CSIRO, with operational and capital investment from the WA State Government and the Australian Federal Government. Pawsey is currently in the early stages of the design and procurement of a new generation of computational infrastructure, backed by a new $70m investment made available in mid-2018. Rapid and emerging change in the demands made on the infrastructure by an increasingly diverse research community requires specific technical solutions to be developed during this process. We will discuss this rapid change, where Pawsey is now, and how we are addressing evolving requirements with our current infrastructure. We will also discuss where HPC will be challenged in the coming period and how we expect to meet these challenges.
In this report, we analyze the readiness of the code development and execution environment for adaptive supercomputers, where a processing node is composed of heterogeneous computing and memory architectures. Current instances of such a system are the Cray XK6 and XK7 compute nodes, which combine x86_64 CPUs and NVIDIA GPU devices with DDR3 and GDDR5 memories, respectively. Specifically, we focus on the integration of the CPU and accelerator programming environments, tools, MPI, and numerical libraries, as well as operational features such as resource monitoring, system maintainability, and upgradability. We highlight portable, platform-independent technologies that exist for the Cray XE, XK, and XC30 platforms and discuss dependencies in the CPU, GPU, and network tool chains that lead to current challenges for integrated solutions. This discussion enables us to formulate requirements for a future, adaptive supercomputing platform, which could contain a diverse set of node architectures...
Recently, MPI implementations have been extended to support accelerator devices, namely the Intel Many Integrated Core (MIC) architecture and NVIDIA GPUs. This has been accomplished by changes to different levels of the software stacks and MPI implementations. In order to evaluate the performance and scalability of accelerator-aware MPI libraries, we developed portable micro-benchmarks to identify factors that influence the efficiency of primitive MPI point-to-point and collective operations. These benchmarks have been implemented in OpenACC, CUDA, and OpenCL. On the Intel MIC platform, existing MPI benchmarks can be executed with appropriate mapping onto the MIC and CPU cores. Our results demonstrate that the MPI operations are highly sensitive to the memory and I/O bus configurations on the node. The current implementation of the MIC on-node communication interface exhibits additional limitations on the placement of the card and data transfers over the memory bus.
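The benchmark source is not included in the abstract; as a rough illustration of the kind of accelerator-aware point-to-point micro-benchmark it describes, the sketch below times an MPI ping-pong between two ranks using a GPU-resident buffer. It assumes a CUDA-aware MPI library that accepts device pointers directly; the message size and iteration count are arbitrary, and error checking is omitted.

    // Minimal sketch of a CUDA-aware MPI ping-pong micro-benchmark (assumes
    // an MPI build that accepts device pointers and at least two ranks).
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int nbytes = 1 << 20;   // 1 MiB message (arbitrary choice)
        const int iterations = 100;   // arbitrary repetition count

        void* dev_buf = nullptr;
        cudaMalloc(&dev_buf, nbytes); // message lives in GPU memory

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iterations; ++i) {
            if (rank == 0) {
                MPI_Send(dev_buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(dev_buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(dev_buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(dev_buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double elapsed = MPI_Wtime() - t0;

        if (rank == 0) {
            // Each iteration moves 2 * nbytes across the network.
            double gbps = 2.0 * nbytes * iterations / elapsed / 1e9;
            std::printf("avg round trip %g s, bandwidth %g GB/s\n",
                        elapsed / iterations, gbps);
        }
        cudaFree(dev_buf);
        MPI_Finalize();
        return 0;
    }

Varying the buffer placement (host, device, or MIC card) and the node's PCIe and memory configuration in such a harness is what exposes the sensitivities the abstract reports.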
In this era of diverse and heterogeneous computer architectures, programmability issues such as productivity and portable efficiency are crucial to software development and algorithm design. One way to approach the problem is to step away from traditional sequential programming languages and move toward domain-specific programming environments that balance expressivity and efficiency. In order to demonstrate this principle, we developed a domain-specific C++ generic library for stencil computations, such as PDE solvers. The library features high-level constructs to specify computation and allows the development of parallel stencil computations with very limited effort. The high-level abstraction constructs (such as do_all and do_reduce) make the program shorter and cleaner, with increased contextual information for better performance exploitation. The results show good performance across platforms, from Windows multicore machines to HPC clusters and machines with accelerators such as GPUs.
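The abstract names high-level constructs such as do_all and do_reduce without showing their interface; the sketch below is a hypothetical, sequential C++ illustration of what a do_all-style stencil construct could look like, not the library's actual API.

    // Hypothetical do_all-style construct: apply a user-supplied stencil
    // functor to every interior point of a width x height grid.
    #include <vector>
    #include <cstddef>

    template <typename T, typename Stencil>
    void do_all(const std::vector<T>& in, std::vector<T>& out,
                std::size_t width, std::size_t height, Stencil stencil) {
        for (std::size_t y = 1; y + 1 < height; ++y)
            for (std::size_t x = 1; x + 1 < width; ++x)
                out[y * width + x] = stencil(in, x, y, width);
    }

    int main() {
        const std::size_t w = 64, h = 64;
        std::vector<double> u(w * h, 0.0), v(w * h, 0.0);
        u[(h / 2) * w + w / 2] = 1.0;   // point source in the middle

        // 5-point Jacobi relaxation step expressed as a lambda.
        auto jacobi = [](const std::vector<double>& g, std::size_t x,
                         std::size_t y, std::size_t stride) {
            return 0.25 * (g[y * stride + x - 1] + g[y * stride + x + 1] +
                           g[(y - 1) * stride + x] + g[(y + 1) * stride + x]);
        };

        do_all(u, v, w, h, jacobi);   // one stencil sweep over the grid
        return 0;
    }

In a library of the kind described, the same user-level expression would be mapped to threads, a cluster, or a GPU backend rather than the plain loop shown here.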
Recent developments in programming for multicore processors and accelerators using C++11, OpenCL, and Domain Specific Languages (DSLs) have prompted us to look into tools that offer compilers and both static and runtime analysis toolchains to complement the Cray Programming Environment capabilities. In this paper we report our preliminary experiences with the Clang/LLVM framework on a hybrid Cray XC30, performing tasks such as generating NVIDIA PTX code from C++ and OpenCL in a portable and flexible manner. Specifically, we investigate how to overcome some of the limitations currently imposed by the standard tools, such as the complete lack of C++11 support in CUDA C and outdated 32-bit versions of OpenCL. We also demonstrate how Clang/LLVM tools, for example the static analyzer, can bring additional capabilities to the Cray environment. Finally, we describe how Clang/LLVM integrates with the standard Cray Programming Environment (PE), for instance Cray MPI, perftools, and libraries...
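The paper's build setup is not reproduced here; as a hedged illustration of the PTX-generation path it refers to, the snippet below is a small C++11 function with a comment showing one plausible Clang invocation targeting the NVPTX backend. The exact target triple and flags vary between Clang releases and may differ from what was used on the XC30.

    // saxpy.cpp: a small C++11 function that Clang's NVPTX backend can lower
    // to PTX, e.g. (flags are illustrative and version-dependent):
    //   clang++ -std=c++11 -S -target nvptx64-nvidia-cuda saxpy.cpp -o saxpy.ptx
    // The 'auto' local is the kind of C++11 feature that CUDA C of that era
    // did not accept in device code.
    void saxpy(float a, const float* x, const float* y, float* out, int n) {
        for (int i = 0; i < n; ++i) {
            auto v = a * x[i] + y[i];
            out[i] = v;
        }
    }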
Real-time interaction is a necessary part of the modern high performance computing (HPC) environment, used for tasks such as development, debugging, visualization, and experimentation. However, HPC systems are remote by nature, and current solutions for remote user interaction generally rely on remote desktop software or bespoke client-server implementations combined with an existing user interface. This can be an inhibiting factor for a domain scientist looking to incorporate simple remote interaction into their research software. Furthermore, there are very few solutions that allow the user to interact via the web, which is fast becoming a crucial platform for accessible scientific HPC software. To address this, we present a framework to support remote interaction with HPC software through web-based technologies. This lightweight framework is intended to allow HPC developers to expose remote procedure calls and data streaming to application users through a web browser, and allow rea...
IEEE Transactions on Visualization and Computer Graphics, 2000
Simulation and computation in chemistry studies have improved as computational power has increased over the decades. Many types of chemistry simulation results are available, from atomic-level bonding to volumetric representations of electron density. However, tools for the visualization of the results from quantum chemistry computations are still limited to showing atomic bonds and isosurfaces or isocontours corresponding to certain isovalues. In this work, we study the volumetric representations of the results from quantum chemistry computations, and evaluate and visualize the representations directly on the GPU without resampling the results onto grid structures. Our visualization tool handles the direct evaluation of the approximated wavefunctions, which are described as combinations of Gaussian-like primitive basis functions. For visualization, we use a slice-based volume rendering technique with a 2D transfer function, volume clipping, and illustrative rendering in order to reveal and enhance the quantum chemistry structure. Since there is no need to resample the volume from the functional representations, two issues, data transfer and resampling resolution, can be avoided; as a result, it is possible to interactively explore a large amount of different information in the computation results.
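As a rough, simplified illustration of the per-sample evaluation the abstract describes (performed inside the GPU renderer in the actual tool), the C++ sketch below sums isotropic Gaussian-like primitives centred on atoms to produce a scalar value at an arbitrary sample point; contraction and angular-momentum details of real basis sets are omitted, and all numeric values are made up.

    // Simplified sketch: evaluate a scalar field built from Gaussian-like
    // primitives, sum_i c_i * exp(-alpha_i * |r - r_i|^2), at a sample point.
    #include <cmath>
    #include <vector>
    #include <cstdio>

    struct Primitive {
        double cx, cy, cz;   // centre (e.g. an atom position)
        double alpha;        // Gaussian exponent
        double coeff;        // weighting coefficient
    };

    double evaluate(const std::vector<Primitive>& prims,
                    double x, double y, double z) {
        double value = 0.0;
        for (const Primitive& p : prims) {
            const double dx = x - p.cx, dy = y - p.cy, dz = z - p.cz;
            value += p.coeff * std::exp(-p.alpha * (dx * dx + dy * dy + dz * dz));
        }
        return value;
    }

    int main() {
        // Two hypothetical primitives; the parameters are illustrative only.
        std::vector<Primitive> prims = {
            {0.0, 0.0, 0.0, 1.0, 0.8},
            {1.4, 0.0, 0.0, 0.5, 0.6},
        };
        std::printf("value at midpoint: %f\n", evaluate(prims, 0.7, 0.0, 0.0));
        return 0;
    }

Evaluating the field analytically at each sample in this way is what removes the need for a pre-resampled voxel grid, along with its transfer-time and resolution limits.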