Analyzing and tuning memory performance in sequential and parallel programs
Publisher:
  • Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States
Order Number: UMI Order No. GAX94-22104
Abstract

Recent architecture and technology trends have led to a significant gap between processor and main memory speeds. Responding to this gap, architects have introduced cache memories that are placed between processors and memories to mask high latencies. If cache misses are common, however, memory stalls can still significantly degrade execution time. To help identify and fix such memory bottlenecks, this work presents techniques to efficiently collect detailed information about program memory performance and effectively organize the data collected. These techniques help guide programmers or compilers to memory bottlenecks. They apply to both sequential and parallel applications and are embodied in the MemSpy performance monitoring system.

Experience performance-tuning several programs has driven this research, leading to the following conclusions. First, this thesis contends that the natural interrelationship between program memory bottlenecks and program data structures mandates the use of data-oriented statistics, a novel approach that associates program performance information with application data structures. Data-oriented statistics, viewed alone or paired with traditional code-oriented statistics, offer a powerful new dimension for performance analysis. The dissertation develops techniques for aggregating statistics on similarly used data structures and for extracting intuitive source-code names for statistics.
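
To make the idea concrete, the following C sketch shows one way data-oriented statistics can be kept: each cache miss is charged to the application data structure that owns the referenced address, rather than to the code location of the access. The range table, registration interface, and names here are invented for exposition and are not MemSpy's actual implementation.

#include <stdio.h>
#include <stdint.h>

typedef struct {
    uintptr_t   base;    /* start address of the data structure */
    size_t      size;    /* extent in bytes                     */
    const char *name;    /* source-level name, e.g. "grid"      */
    long        misses;  /* misses charged to this structure    */
} DataStat;

#define MAX_STATS 64
static DataStat stats[MAX_STATS];
static int nstats = 0;

/* Register a data structure, e.g. at allocation time. */
static void register_data(uintptr_t base, size_t size, const char *name) {
    stats[nstats++] = (DataStat){ base, size, name, 0 };
}

/* Charge one observed cache miss at `addr` to its owning structure. */
static void charge_miss(uintptr_t addr) {
    for (int i = 0; i < nstats; i++)
        if (addr >= stats[i].base && addr < stats[i].base + stats[i].size) {
            stats[i].misses++;
            return;
        }
}

int main(void) {
    static double grid[1024];
    register_data((uintptr_t)grid, sizeof grid, "grid");
    charge_miss((uintptr_t)&grid[17]);  /* one simulated miss */
    for (int i = 0; i < nstats; i++)
        printf("%-8s %ld misses\n", stats[i].name, stats[i].misses);
    return 0;
}

Registering structures when they are allocated is what lets the final report speak in source-level terms such as "grid" rather than raw addresses.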

Second, this thesis argues that detailed statistics on the frequency and causes of cache misses are crucial to understanding memory bottlenecks. Common memory performance bugs are most easily distinguished by noting the causes of their resulting cache misses. Offering such information, MemSpy's performance profiles have been invaluable in analyzing memory bottlenecks in several applications.
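
As a sketch of what classifying miss causes involves (under assumed, invented parameters, not MemSpy's model), the following C fragment distinguishes cold misses, where a block is touched for the first time, from replacement misses, where a previously cached block has been evicted. A parallel-machine version would additionally attribute invalidation misses caused by coherence traffic.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define LINE 32                 /* bytes per cache line (assumed)  */
#define SETS 1024               /* direct-mapped: one line per set */
#define NBLOCKS (1 << 20)

static uintptr_t tags[SETS];
static bool      valid[SETS];
static bool      seen[NBLOCKS]; /* crude "ever referenced" bitmap  */

typedef enum { HIT, COLD_MISS, REPLACEMENT_MISS } Outcome;

static Outcome access_line(uintptr_t addr) {
    uintptr_t block = addr / LINE;
    unsigned  set   = block % SETS;
    if (valid[set] && tags[set] == block) return HIT;
    Outcome cause = seen[block % NBLOCKS] ? REPLACEMENT_MISS : COLD_MISS;
    seen[block % NBLOCKS] = true;
    tags[set]  = block;
    valid[set] = true;
    return cause;
}

int main(void) {
    long n[3] = { 0, 0, 0 };
    for (int pass = 0; pass < 2; pass++)          /* traverse 48 KB twice */
        for (uintptr_t a = 0; a < 48 * 1024; a += LINE)
            n[access_line(a)]++;
    printf("hits %ld  cold %ld  replacement %ld\n",
           n[HIT], n[COLD_MISS], n[REPLACEMENT_MISS]);
    return 0;
}

On this trace the second traversal hits where the working set fits in the cache and takes replacement misses where direct-mapped conflicts evicted blocks: the distinction a tuner needs, since replacement misses point at capacity or layout problems while cold misses are largely unavoidable.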

Third, since collecting such detailed information seems, at first glance, to require large execution-time slowdowns, this dissertation also evaluates techniques to improve the performance of MemSpy's simulation-based monitoring. The first optimization, hit bypassing, improves simulation performance by specializing the processing of cache hits. The second optimization, reference trace sampling, improves performance by simulating only sampled portions of the full reference trace. Together, these optimizations reduce simulation time by nearly an order of magnitude. Overall, experience using MemSpy to tune several applications demonstrates that it generates effective memory performance profiles at speeds competitive with previous, less detailed approaches.
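
A minimal C sketch of both optimizations, assuming invented window sizes and cache parameters: hit bypassing gives every hit a cheap inline tag check so that only misses fall through to the expensive bookkeeping, and reference trace sampling skips references falling outside periodic sample windows altogether.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define LINE        32
#define SETS        1024
#define SAMPLE_ON   100000   /* references simulated per window (assumed) */
#define SAMPLE_OFF  900000   /* references skipped between windows        */

static uintptr_t tags[SETS];
static bool      valid[SETS];
static long      hits, misses;

/* Expensive path: a full tool would update detailed per-data-structure
 * statistics here; this stub just counts and refills the line. */
static void full_miss_processing(uintptr_t addr) {
    misses++;
    unsigned set = (addr / LINE) % SETS;
    tags[set]  = addr / LINE;
    valid[set] = true;
}

/* Called on every memory reference in the trace. */
static void monitor(uintptr_t addr, long ref_index) {
    /* Reference trace sampling: ignore references outside sample windows. */
    if (ref_index % (SAMPLE_ON + SAMPLE_OFF) >= SAMPLE_ON) return;

    /* Hit bypassing: a cheap inline tag check retires hits immediately,
     * so the expensive miss machinery runs only on actual misses. */
    unsigned set = (addr / LINE) % SETS;
    if (valid[set] && tags[set] == addr / LINE) { hits++; return; }

    full_miss_processing(addr);
}

int main(void) {
    for (long i = 0; i < 5000000; i++)          /* synthetic trace */
        monitor((uintptr_t)((i * 64) % (1 << 22)), i);
    printf("sampled: %ld hits, %ld misses\n", hits, misses);
    return 0;
}

The two optimizations compose naturally: sampling bounds how many references the simulator sees at all, and hit bypassing makes the common case among those references cheap.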

Contributors
  • Princeton University
