- research-article, June 2022
Preparing for performance analysis at exascale
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No. 34, Pages 1–13. https://doi.org/10.1145/3524059.3532397
Performance tools for emerging heterogeneous exascale platforms must address two principal challenges when analyzing execution measurements. First, measurement of large-scale executions may record mountains of performance data. Second, performance ...
- research-article, June 2022
Low overhead and context sensitive profiling of GPU-accelerated applications
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No. 1, Pages 1–13. https://doi.org/10.1145/3524059.3532388
As we near the end of Moore's law scaling, the next-generation computing platforms are increasingly exploring heterogeneous processors for acceleration. Graphics Processing Units (GPUs) are the most widely used accelerators. Meanwhile, applications are ...
- research-article, June 2020
Tools for top-down performance analysis of GPU-accelerated applications
ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, Article No. 26, Pages 1–12. https://doi.org/10.1145/3392717.3392752
This paper describes extensions to Rice University's HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's ...
- research-article, June 2018
Automated Analysis of Time Series Data to Understand Parallel Program Behaviors
ICS '18: Proceedings of the 2018 International Conference on Supercomputing, Pages 240–251. https://doi.org/10.1145/3205289.3205308
Traditionally, performance analysis tools have focused on collecting measurements, attributing them to program source code, and presenting them; responsibility for analysis and interpretation of measurement data falls to application developers. While ...
- proceeding, June 2015
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
Welcome to the 29th ACM International Conference on Supercomputing (ICS), June 8-11, 2015 at Newport Beach, CA. ICS is well known as the premier technical forum where researchers present their latest results and share with colleagues their perspectives ...
- research-article, June 2014
Author retrospective: compilation techniques for block-cyclic distributions
ACM International Conference on Supercomputing 25th Anniversary Volume, Pages 29–31. https://doi.org/10.1145/2591635.2591651
Compilers for data-parallel languages use data distribution specifications to guide code generation for distributed-memory machines. Our 1994 paper described how to generate efficient code for programs that employ block-cyclic data distributions. In ...
- research-article, June 2014
Author retrospective for PTRAN's analysis and optimization techniques
ACM International Conference on Supercomputing 25th Anniversary Volume, Pages 1–3. https://doi.org/10.1145/2591635.2591638
The PTRAN (Parallel Translator) system at IBM had as its goal the analysis and optimization of sequential programs for parallel architectures. In this paper, we give our perspective on what has changed since PTRAN, and what is still relevant.
- research-article, June 2013
A new approach for performance analysis of OpenMP programs
ICS '13: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, Pages 69–80. https://doi.org/10.1145/2464996.2465433
The number of hardware threads is growing with each new generation of multicore chips; thus, one must effectively use threads to fully exploit emerging processors. OpenMP is a popular directive-based programming model that helps programmers exploit ...
- research-article, May 2011
Scalable fine-grained call path tracing
ICS '11: Proceedings of the International Conference on Supercomputing, Pages 63–74. https://doi.org/10.1145/1995896.1995908
Applications must scale well to make efficient use of even medium-scale parallel systems. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of performance ...
- research-article, June 2009
Chunking parallel loops in the presence of synchronization
ICS '09: Proceedings of the 23rd International Conference on Supercomputing, Pages 181–192. https://doi.org/10.1145/1542275.1542304
Modern languages for shared-memory parallelism are moving from a bulk-synchronous Single Program Multiple Data (SPMD) execution model to lightweight Task Parallel execution models for improved productivity. This shift is intended to encourage ...
- research-article, June 2008
Phasers: a unified deadlock-free construct for collective and point-to-point synchronization
ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, Pages 277–288. https://doi.org/10.1145/1375527.1375568
Coordination and synchronization of parallel tasks is a major source of complexity in parallel programming. These constructs take many forms in practice including mutual exclusion in accesses to shared resources, termination detection of child tasks, ...
- Article, June 2007
Scalability analysis of SPMD codes using expectations
ICS '07: Proceedings of the 21st Annual International Conference on Supercomputing, Pages 13–22. https://doi.org/10.1145/1274971.1274976
We present a new technique for identifying scalability bottlenecks in executions of single-program, multiple-data (SPMD) parallel programs, quantifying their impact on performance, and associating this information with the program source code. Our ...
- Article, June 2006
Profitable loop fusion and tiling using model-driven empirical search
ICS '06: Proceedings of the 20th Annual International Conference on Supercomputing, Pages 249–258. https://doi.org/10.1145/1183401.1183437
Loop fusion and tiling are both recognized as effective transformations for improving memory performance of scientific applications. However, because of their sensitivity to the underlying cache architecture and their interaction with each other, it is ...
- Article, June 2005
Low-overhead call path profiling of unmodified, optimized code
ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing, Pages 81–90. https://doi.org/10.1145/1088149.1088161
Call path profiling associates resource consumption with the calling context in which resources were consumed. We describe the design and implementation of a low-overhead call path profiler based on stack sampling. The profiler uses a novel sample-...
- Article, June 2002
Experiences tuning SMG98: a semicoarsening multigrid benchmark based on the hypre library
ICS '02: Proceedings of the 16th International Conference on Supercomputing, Pages 305–314. https://doi.org/10.1145/514191.514233
LLNL's hypre library is an object-oriented library for the solution of sparse linear systems on parallel computers. While hypre facilitates rapid-prototyping of complex parallel applications, our experience is that without careful attention to temporal ...
- Article, June 2001
Tools for application-oriented performance tuning
ICS '01: Proceedings of the 15th International Conference on Supercomputing, Pages 154–165. https://doi.org/10.1145/377792.377826
Application performance tuning is a complex process that requires assembling various types of information and correlating it with source code to pinpoint the causes of performance bottlenecks. Existing performance tools don't adequately support this ...
- Article, June 2001
Optimizing strategies for telescoping languages: procedure strength reduction and procedure vectorization
ICS '01: Proceedings of the 15th International Conference on Supercomputing, Pages 92–101. https://doi.org/10.1145/377792.377812
At Rice University, we have undertaken a project to construct a framework for generating high-level problem solving languages that can achieve high performance on a variety of platforms. The underlying strategy, called telescoping languages, builds ...
- Article, May 2000
Fast greedy weighted fusion
ICS '00: Proceedings of the 14th International Conference on Supercomputing, Pages 131–140. https://doi.org/10.1145/335231.335244
Loop fusion is important to optimizing compilers because it is an important tool in managing the memory hierarchy. By fusing loops that use the same data elements, we can reduce the distance between accesses to the same datum and avoid costly cache ...