- research-article, April 2023
DrGPU: A Top-Down Profiler for GPU Applications
ICPE '23: Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, April 2023, Pages 43–53. https://doi.org/10.1145/3578244.3583736
GPUs have become common in HPC systems to accelerate scientific computing and machine learning applications. Efficiently mapping these applications to rapid evolutions of GPU architectures for high performance is a well-known challenge. Various ...
- research-article, October 2022
qprof: A gprof-Inspired Quantum Profiler
ACM Transactions on Quantum Computing (TQC), Volume 4, Issue 1, Article No.: 4, Pages 1–28. https://doi.org/10.1145/3529398
We introduce qprof, a new and extensible quantum program profiler able to generate profiling reports of quantum circuits written using various quantum computing frameworks. We describe the internal structure and working of qprof and provide practical ...
- research-article, June 2021
NumaPerf: predictive NUMA profiling
ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing, June 2021, Pages 52–62. https://doi.org/10.1145/3447818.3460361
It is extremely challenging to achieve optimal performance of parallel applications on a NUMA architecture, which necessitates the assistance of profiling tools. However, existing NUMA-profiling tools share some similar shortcomings, such as portability, ...
- demonstration, October 2020
A profiler for the matching process of henshin
MODELS '20: Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, October 2020, Article No.: 3, Pages 1–5. https://doi.org/10.1145/3417990.3422000
Model transformations are essential operations in Model-Driven Software Engineering (MDSE). Due to the increasing size and complexity of software systems developed with the help of MDSE, the input models for transformations are also getting bigger. In ...
- research-article, June 2020
Tools for top-down performance analysis of GPU-accelerated applications
ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, June 2020, Article No.: 26, Pages 1–12. https://doi.org/10.1145/3392717.3392752
This paper describes extensions to Rice University's HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's ...
- short-paper, April 2020
GAPP: A Fast Profiler for Detecting Serialization Bottlenecks in Parallel Linux Applications
ICPE '20: Proceedings of the ACM/SPEC International Conference on Performance Engineering, April 2020, Pages 257–264. https://doi.org/10.1145/3358960.3379136
We present a parallel profiling tool, GAPP, that identifies serialization bottlenecks in parallel Linux applications arising from load imbalance or contention for shared resources. It works by tracing kernel context switch events using kernel probes ...
- poster, February 2020
A tool for top-down performance analysis of GPU-accelerated applications
PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2020, Pages 415–416. https://doi.org/10.1145/3332466.3374534
To support performance measurement and analysis of GPU-accelerated applications, we extended the HPCToolkit performance tools with several novel features. To support efficient monitoring of accelerated applications, HPCToolkit employs a new wait-free ...
- research-article, December 2019
Profiling Dynamic Data Access Patterns with Controlled Overhead and Quality
Middleware '19: Proceedings of the 20th International Middleware Conference Industrial Track, December 2019, Pages 1–7. https://doi.org/10.1145/3366626.3368125
Modern workloads tend to have huge working sets and low locality. Despite this trend, the capacity of DRAM has not been increased enough to accommodate such huge working sets. Therefore, memory management mechanisms optimized for such modern workloads ...
- research-article, September 2019
Profiling Halide DSL with CPU Performance Events for Schedule Optimization
SBLP '19: Proceedings of the XXIII Brazilian Symposium on Programming Languages, September 2019, Pages 38–45. https://doi.org/10.1145/3355378.3355381
Halide is a domain-specific language (DSL) for image processing that enforces a separation of the algorithm and the execution schedule, allowing the generation of specialized code for distinct computer architectures by rewriting only the execution ...
- research-article, November 2015
DAGViz: a DAG visualization tool for analyzing task-parallel program traces
VPA '15: Proceedings of the 2nd Workshop on Visual Performance Analysis, November 2015, Article No.: 3, Pages 1–8. https://doi.org/10.1145/2835238.2835241
In task-based parallel programming, programmers can expose logical parallelism of their programs by creating fine-grained tasks at arbitrary places in their code. All other burdens in the parallel execution of these tasks such as thread management, task ...
- research-article, August 2015
JITProf: pinpointing JIT-unfriendly JavaScript code
ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, August 2015, Pages 357–368. https://doi.org/10.1145/2786805.2786831
Most modern JavaScript engines use just-in-time (JIT) compilation to translate parts of JavaScript code into efficient machine code at runtime. Despite the overall success of JIT compilers, programmers may still write code that uses the dynamic ...
- research-article, February 2014
A tool to analyze the performance of multithreaded programs on NUMA architectures
PPoPP '14: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 2014, Pages 259–272. https://doi.org/10.1145/2555243.2555271
Almost all of today's microprocessors contain memory controllers and directly attach to memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is faster for a microprocessor to access memory that is directly attached than it ...
Also Published in:
ACM SIGPLAN Notices: Volume 49 Issue 8, August 2014
- Article, October 2013
High-Level GPU Multi-purpose Profiler
3PGCIC '13: Proceedings of the 2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, October 2013, Pages 549–553. https://doi.org/10.1109/3PGCIC.2013.94
The graphics processing units (GPUs) have become an integral part of today's computing systems. They have risen and evolved over the last years, becoming a platform for parallel computation with a large number of scalar processors and abundant memory ...
- research-article, September 2012
Visualizing transactional memory
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, September 2012, Pages 159–170. https://doi.org/10.1145/2370816.2370842
This paper presents TMProf, a transactional memory (TM) profiler, based on three visualization principles. These principles are (i) the precise graphical representation of transaction interactions including cross-correlated information and source code, (...
- research-article, February 2012
The profiling method in multicore processor for effective performance improvement
Seung Hyun Yoon, Kyung Min Lee, Yong Seok Kim, Seong Jin Cho, Dong Won Choi, Key Ho Kwon, Kil Jae Kim, Jong Hyun Park, Jae Wook Jeon
ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, February 2012, Article No.: 82, Pages 1–4. https://doi.org/10.1145/2184751.2184848
Today, multi-core processors are being used widely in mobile environments in addition to the existing PC-based environment. In order to use a multi-core processor efficiently, parallel programming skills are required. However, incorrect parallelization ...
- poster, February 2011
Kremlin: like gprof, but for parallelization
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, February 2011, Pages 293–294. https://doi.org/10.1145/1941553.1941595
This paper overviews Kremlin, a software profiling tool designed to assist the parallelization of serial programs. Kremlin accepts a serial source code, profiles it, and provides a list of regions that should be considered in parallelization. Unlike a ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 8, August 2011
- Article, September 2010
Generated Cycle-Accurate Profiler for C Language
DSD '10: Proceedings of the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, September 2010, Pages 263–268. https://doi.org/10.1109/DSD.2010.39
Application-specific instruction set processors used in embedded systems are highly optimized for a given task. On this type of processors runs a specific application. Therefore, the designer should have a tool which helps him or her in the task of ...
- poster, October 2009
The observer effect of profiling on dynamic Java optimizations
OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, October 2009, Pages 757–758. https://doi.org/10.1145/1639950.1640000
We show that the bytecode injection approach used in common Java profilers, such as HPROF and JProfiler, disables some program optimizations that are performed when the same program is running without a profiler. This behavior is present in both the ...
- research-article, October 2008
Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 2008, Pages 112–121. https://doi.org/10.1145/1454115.1454133
I/O prefetching has been employed in the past as one of the mechanisms to hide large disk latencies. However, I/O prefetching in parallel applications is problematic when multiple CPUs share the same set of disks due to the possibility that prefetches ...
- Article, April 1997
Knowledge discovery from users Web-page navigation
RIDE '97: Proceedings of the 7th International Workshop on Research Issues in Data Engineering (RIDE '97) High Performance Database Management for Large-Scale Applications, April 1997, Page 20.
The authors propose to detect users' navigation paths to the advantage of Web site owners. First, they explain the design and implementation of a profiler which captures a client's selected links and page order, accurate page viewing time and cache ...