Devel::NYTProf
Perl Source Code Profiler
Tim Bunce - YAPC::NA - 2014
Devel::DProf Is Broken
$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl
$ perl -d:DProf x.pl
$ dprofpp -r
Total Elapsed Time = 0.108 Seconds
Real Time = 0.108 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
9.26 0.010 0.010 1 0.0100 0.0100 main::s76
9.26 0.010 0.010 1 0.0100 0.0100 main::s323
9.26 0.010 0.010 1 0.0100 0.0100 main::s626
9.26 0.010 0.010 1 0.0100 0.0100 main::s936
0.00 - -0.000 1 - - main::s77
0.00 - -0.000 1 - - main::s82
Profiling 101
The Basics
What To Measure?
Subroutines or Statements?
CPU Time or Real Time?
Subroutine vs Statement
• Subroutine Profiling
- Measures time between subroutine entry and exit
- That’s the Inclusive time. Exclusive by subtraction.
- Reasonably fast, reasonably small data files
• Problems
- Can be confused by funky control flow (goto &sub)
- No insight into where time spent within large subs
- Doesn’t measure code outside of a sub
Subroutine vs Statement
• Line/Statement profiling
- Measure time from start of one statement to the start
of the next statement, wherever that might be
- Fine grained detail
• Problems
- Very expensive in CPU & I/O
- Assigns too much time in some cases
- Too much detail for large subs
- Hard to get overall subroutine times
CPU Time vs Real Time
• CPU Time
- Measures time the CPU spent executing your code
- Not (much) affected by other load on system
- Doesn’t include time spent waiting for i/o etc.
• Real Time
- Measures the elapsed time-of-day
- Your time is affected by other load on system
- Includes time spent waiting for i/o etc.
Devel::NYTProf
Public Service
Announcement!
The NYTProf name is an accident of history
I do not work for the New York Times
I have never worked for the New York Times
I have no affiliation with the New York Times
The New York Times last contributed in 2008
Running NYTProf
perl -d:NYTProf ...
perl -MDevel::NYTProf ...
Configure profiler via the NYTPROF env var
perldoc Devel::NYTProf for the details
To profile code that’s invoked elsewhere:
PERL5OPT=-d:NYTProf
NYTPROF=file=/tmp/nytprof.out:addpid=1:...
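For example, to profile a script that gets launched from a wrapper (the wrapper name here is just a placeholder):
$ NYTPROF=file=/tmp/nytprof.out:addpid=1 PERL5OPT=-d:NYTProf ./run_app.sh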
Reporting: KCachegrind
• KCachegrind call graph - new and cool
- contributed by C. L. Kao.
- requires KCachegrind
$ nytprofcg # generates nytprof.callgraph
$ kcachegrind # load the file via the gui
KCachegrind
Reporting: HTML
• HTML report
- page per source file, annotated with times and links
- subroutine index table with sortable columns
- interactive Treemap of subroutine times
- generates Graphviz dot file of call graph
- -m (--minimal) faster generation but less detailed
$ nytprofhtml # writes HTML report in ./nytprof/...
$ nytprofhtml --file=/tmp/nytprof.out.793 --open
Summary
Links to annotated source code
Timings for perl builtins
Link to sortable table of all subs
Inclusive vs Exclusive Time
[Diagram: sub foo calls bar() twice. Inclusive time for foo() spans the whole call, including both bar() calls; exclusive time for foo() is the remainder, excluding the time spent inside bar().]
Inclusive vs. Exclusive
• Inclusive Time is best for Top Down
- Overview of time spent “in and below this sub”
- Useful to prioritize structural optimizations
• Exclusive Time is best for Bottom Up
- Detail of time spent “in the code of this sub”
- Where the time actually gets spent
- Useful for localized (peephole) optimization
Annotated Source View
Overall time spent in and below this sub (in + below)
Color coding based on Median Average Deviation relative to the rest of this file
Timings for each location that calls this subroutine
Time between starting this perl statement and starting the next, so it includes the overhead of calls to perl subs
Timings for each subroutine called by each line
Boxes represent subroutines
Colors only used to show packages (and aren’t pretty yet)
Hover over a box to see details
Click to drill down one level in the package hierarchy
Treemap showing relative proportions of exclusive time
Calls between packages
Calls to/from/within a package
Let’s take a look...
DEMO
Optimizing
Hints & Tips
Do your own testing
With your own perl binary
On your own hardware
Beware My Examples!
Take care comparing code fragments!
Edge-effects at loop and scope boundaries.
Statement time includes time getting to the next
perl statement, wherever that may be.
Beware 2!
Consider effect of CPU-level data and code caching
Tends to make second case look faster!
Swap the order to double-check alternatives
Beware Your Examples!
Phase 0
Before you start
DON’T
DO IT!
“The First Rule of Program Optimization:
Don't do it.
The Second Rule of Program Optimization
(for experts only!): Don't do it yet.”
- Michael A. Jackson
Why not?
“More computing sins are committed in the
name of efficiency (without necessarily
achieving it) than for any other single
reason - including blind stupidity.”
- W.A. Wulf
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.”
- Donald Knuth
“We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil.
Yet we should not pass up our
opportunities in that critical 3%.”
- Donald Knuth
How?
“Throw hardware at it!”
Hardware == Cheap
Programmers == Expensive (& error prone)
Hardware upgrades are usually much less
risky than software optimizations.
“Bottlenecks occur in surprising places, so
don't try to second guess and put in a
speed hack until you have proven that's
where the bottleneck is.”
- Rob Pike
“Measure twice, cut once.”
- Old Carpenter’s Maxim
Phase 1
Low Hanging Fruit
Low Hanging Fruit
1. Profile code running representative workload.
2. Look at Exclusive Time of subroutines.
3. Do they look reasonable?
4. Examine worst offenders.
5. Fix only simple local problems.
6. Profile again.
7. Fast enough? Then STOP!
8. Rinse and repeat once or twice, then move on.
“Simple Local Fixes”
Changes unlikely to introduce bugs
Move invariant
expressions
out of loops
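A minimal sketch (sub and variable names are made up):
# Before: the config lookup runs on every iteration
for my $row (@rows) {
    process($row, $config->{locale}{date_format});
}
# After: hoist the invariant expression out of the loop
my $date_format = $config->{locale}{date_format};
for my $row (@rows) {
    process($row, $date_format);
}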
Avoid->repeated->chains
->of->accessors(...);
Avoid->repeated->chains
->of->accessors(...);
Use a temporary variable
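For instance (hypothetical accessors):
# Each call re-runs the whole accessor chain
$self->order->customer->address->postcode;
$self->order->customer->address->country;
# Cache the tail of the chain in a temporary variable
my $address = $self->order->customer->address;
$address->postcode;
$address->country;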
Use faster accessors
Class::Accessor
-> Class::Accessor::Fast
--> Class::Accessor::Faster
---> Class::Accessor::Fast::XS
----> Class::XSAccessor
These aren’t all compatible so consider your actual usage.
(The list above is out of date.)
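A minimal Class::XSAccessor sketch (field names are illustrative):
package My::Thing;
use Class::XSAccessor
    constructor => 'new',
    accessors   => { name => 'name', size => 'size' };
# Usage: my $t = My::Thing->new(name => 'box'); print $t->name;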
Avoid calling subs that
don’t do anything!
my $unused_variable = $self->get_foo;
my $is_logging = $log->info(...);
while (...) {
$log->info(...) if $is_logging;
...
}
Exit subs and loops early
Delay initializations
return if not ...a cheap test...;
return if not ...a more expensive test...;
my $foo = ...initializations...;
...body of subroutine...
Fix silly code
- return exists $nav_type{$country}{$key}
- ? $nav_type{$country}{$key}
- : undef;
+ return $nav_type{$country}{$key};
Beware pathological
regular expressions
Devel::NYTProf shows regular expression opcodes.
Consider using no feature 'unicode_strings';
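A classic shape to watch for (illustrative; nested quantifiers can backtrack catastrophically on near-miss input):
# The trailing '!' prevents a match, so the engine explores a huge
# number of ways to split the run of a's between the two quantifiers
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" =~ /^(a+)+$/;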
Avoid unpacking args
in very hot subs
sub foo { shift->delegate(@_) }
sub bar {
return shift->{bar} unless @_;
return $_[0]->{bar} = $_[1];
}
Avoid unnecessary
(capturing parens)
in regex
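For example (illustrative pattern):
# Capturing group does extra work to populate $1 on every match
/^(foo|bar)baz/;
# Non-capturing group just groups the alternation
/^(?:foo|bar)baz/;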
Retest.
Fast enough?
STOP!
Put the profiler down and walk away
Phase 2
Deeper Changes
Profile with a
known workload
E.g., 1000 identical requests
Check subroutine
call counts
Reasonable
for the workload?
Check Inclusive Times
(especially top-level subs)
Reasonable percentage
for the workload?
Add caching
if appropriate
to reduce calls
Remember cache invalidation!
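A minimal caching sketch, assuming the result depends only on the argument (names are made up; the Memoize module is another option):
my %cache;
sub lookup_rate {
    my ($currency) = @_;
    # clear %cache whenever the underlying data changes!
    return $cache{$currency} //= _fetch_rate($currency);
}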
Walk up call chain
to find good spots
for caching
Remember cache invalidation!
Creating many objects
that don’t get used?
Try a lightweight proxy
e.g. DateTime::Tiny, DateTimeX::Lite, DateTime::LazyInit
Reconfigure your Perl
can yield useful gains with little effort
thread support costs ~2..30%
debugging support costs ~15%
Also consider: usemymalloc, use64bitint, use64bitall,
uselongdouble, optimize, disable taint mode.
Consider using a different compiler.
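For example, a build without threads and with compiler optimization might be configured roughly like this (check Configure's options for your platform):
$ sh Configure -des -Uusethreads -Dusemymalloc -Doptimize='-O2'
$ make && make test && make install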
Upgrade your Perl
Newer versions often faster at some things
(though occasionally slower at others)
Sometimes have specific micro-optimizations
Many memory usage and performance
improvements from 5.8 thru 5.20
Retest.
Fast enough?
STOP!
Put the profiler down and walk away.
Phase 3
Structural Changes
Push loops down
- $object->walk($_) for @dogs;
+ $object->walk_these(@dogs);
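The callee might look something like this (hypothetical method):
sub walk_these {
    my ($self, @dogs) = @_;
    # one method call from the caller instead of one per dog
    $self->walk($_) for @dogs;
}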
Use faster modules
sort -> Sort::Key
Storable -> Sereal
LWP -> HTTP::Tiny -> HTTP::Lite -> *::Curl -> Hijk
These aren’t all compatible or full-featured or ‘better’
Consider your actual needs
See http://neilb.org/reviews/
Change the data
structure
hashes <-> arrays
Change the algorithm
What’s the “Big O”?
O(n²) or O(log n) or ...
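For example, replacing a repeated linear search with a hash lookup turns O(n) per query into O(1) (illustrative):
# O(n) per lookup: scans the whole list every time
my ($hit) = grep { $_ eq $wanted } @names;
# O(1) per lookup after building the index once
my %is_name = map { $_ => 1 } @names;
my $found = exists $is_name{$wanted};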
Rewrite hot-spots
in XS / C
Consider Inline::C
but beware of deployment issues
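A minimal Inline::C sketch (toy function, not from the talk):
use Inline C => <<'END_C';
long sum_to(long n) {
    long i, total = 0;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
END_C
print sum_to(1_000_000), "\n";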
Small changes add up!
“I achieved my fast times by
multitudes of 1% reductions”
- Bill Raymond
See also “Top 10 Perl
Performance Tips”
• A presentation by Perrin Harkins
• Covers higher-level issues, including
- Good DBI usage
- Fastest modules for serialization, caching,
templating, HTTP requests etc.
• http://docs.google.com/present/view?id=dhjsvwmm_26dk9btn3g
Questions?
Tim.Bunce@pobox.com
http://blog.timbunce.org
@timbunce on twitter
