Gprof: A Call Graph Execution Profiler
Susan L. Graham
Peter B. Kessler
Marshall K. McKusick
ABSTRACT
RETROSPECTIVE
In the early 1980s, a group of us at the University of California at
Berkeley were involved in a project to build compiler
construction tools [1]. We were, more or less simultaneously,
rewriting pieces of the UNIX operating system [2]. For many of
us, these were the largest and most complex programs on which
we had ever worked. Of course we were interested in squeezing
the last bits of performance out of these programs.
The UNIX system comes with a profiling tool, prof [3], which
we had found adequate up until then. The profiler consists of
three parts: a kernel module that maintains a histogram of the
program counter as it is observed at every clock tick; a runtime
routine, a call to which is inserted by the compilers at the head of
every function compiled with a profiling option; and a post-processing
program that aggregates and presents the data. The
program counter histogram provides statistical sampling of where
time is spent during execution. The runtime routine gathers
precise call counts. These two sources of information are
combined by post-processing to produce a table listing, for each
function, the number of times it was called, the time spent in it, and
the average time per call.
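
The post-processing step described above can be sketched roughly as follows. This is an illustration only, not the actual prof implementation: the function name flat_profile, the symbol-table layout, the sample data, and the 1/60 second tick interval are all assumptions introduced for the example.

```python
from bisect import bisect_right

# Assumed clock-tick interval; the text does not state one.
TICK_SECONDS = 1 / 60


def flat_profile(symbols, pc_samples, call_counts, tick=TICK_SECONDS):
    """Combine a program-counter histogram with exact call counts.

    symbols:     list of (start_address, name), sorted by address
                 (hypothetical stand-in for a symbol table)
    pc_samples:  program-counter values, one observed per clock tick
    call_counts: dict mapping function name to its exact call count
    Returns rows of (name, calls, seconds, seconds_per_call),
    sorted by time spent, largest first.
    """
    starts = [addr for addr, _ in symbols]
    ticks = {name: 0 for _, name in symbols}

    # Charge each sample to the function whose start address is the
    # closest one at or below the sampled program counter.
    for pc in pc_samples:
        i = bisect_right(starts, pc) - 1
        if i >= 0:
            ticks[symbols[i][1]] += 1

    rows = []
    for _, name in symbols:
        calls = call_counts.get(name, 0)
        seconds = ticks[name] * tick
        # Average is undefined for a function that was never called.
        per_call = seconds / calls if calls else None
        rows.append((name, calls, seconds, per_call))

    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical data: three functions, six clock-tick samples.
example_symbols = [(0x1000, "main"), (0x1100, "parse"), (0x1400, "emit")]
example_samples = [0x1004, 0x1120, 0x1130, 0x1150, 0x1410, 0x1420]
example_calls = {"main": 1, "parse": 3, "emit": 2}

for row in flat_profile(example_symbols, example_samples, example_calls):
    print(row)
```

Note that the times in such a table are statistical estimates, since they come from sampling, while the call counts are exact. Each function is also charged only for the samples that fall within its own code.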
As our programs became more complex, and as we became
better at structuring them into shared, reusable pieces, we noticed
that the profiles were becoming more diffuse and less useful. We
observed two sources of confusion: as we partitioned operations
across several functions to make them more general, the time for
an operation spread across the several functions; and as the
functions became more useful, they were used from many places,
so it wasn't always clear why a function was being called as many
times as it was. The difficulty we were having was that we
wanted to understand the abstractions used in our system, but the
function boundaries did not correspond to abstraction boundaries.
Not being afraid to hack on the kernel and the runtime
libraries, we set about building a better profiler [4]. Our ground
rules were to change only what we needed and to make sure we
preserved the efficiency of the tool.
In fact, except for fixing a few bugs, the program counter
histogram part of the profiler worked fine. Incrementing the
20 Years of the ACM/SIGPLAN Conference on Programming Languages
Design and Implementation (1979-1999): A Selection, 2003.
Copyright 2003 ACM 1-58113-623-4 $5.00
REFERENCES
[1] S. L. Graham, R. R. Henry, and R. A. Schulman, An
Experiment in Table Driven Code Generation, SIGPLAN '82
Symposium on Compiler Construction, June, 1982.
[2] M. K. McKusick, Twenty Years of Berkeley Unix: From
AT&T-Owned to Freely Redistributable, in Open Sources:
Voices from the Open Source Revolution, O'Reilly, January,
1999.
http://www.oreilly.com/catalog/opensources/book/kirkmck.html
[3] prof, Unix Programmer's Manual, Section 1, Bell
Laboratories, Murray Hill, NJ, January 1979.
[4] S. L. Graham, P. B. Kessler, and M. K. McKusick, An
execution profiler for modular programs, Software - Practice
& Experience, 13(8), pp. 671 - 685, August 1983.
[5] R. E. Tarjan, Depth-first search and linear graph algorithms,
SIAM Journal on Computing, Volume 1, Number 2, pp.
146-160, 1972.
[6] Sun Microsystems, Inc. "Program Performance Analysis
Tools", in Forte Developer 7 Manual, Part number
816-2458-10, May 2002 Revision A.
http://docs.sun.com/source/816-2458/index.html.
[7] GNU gprof,
http://www.gnu.org/manual/gprof-2.9.1/gprof.html, 1998.