Video: https://youtu.be/eO94l0aGLCA?t=3m37s. Talk by Brendan Gregg for ACM Applicative 2016.
"System Methodology - Holistic Performance Analysis on Modern Systems
Traditional systems performance engineering makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. For modern systems, we can choose the metrics, and can choose ones we need to support new holistic performance analysis methodologies. These methodologies provide faster, more accurate, and more complete analysis, and can provide a starting point for unfamiliar systems.
Methodologies are especially helpful for modern applications and their workloads, which can pose extremely complex problems with no obvious starting point. There are also continuous deployment environments such as the Netflix cloud, where these problems must be solved in shorter time frames. Fortunately, with advances in system observability and tracers, we have virtually endless custom metrics to aid performance analysis. The problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
System methodologies provide a starting point for analysis, as well as guidance for quickly moving through the metrics to root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. The focus is on single systems (any operating system), including single cloud instances, and quickly locating performance issues or exonerating the system. Many methodologies will be discussed, along with recommendations for their implementation, which may be as documented checklists of tools, or custom dashboards of supporting metrics. In general, you will learn to think differently about your systems, and how to ask better questions."
1. System Methodology: Holistic Performance Analysis on Modern Systems
Brendan Gregg, Senior Performance Architect
June 2016
ACM Applicative 2016
6. History
• System Performance Analysis up to the '90s:
– Closed source UNIXes and applications
– Vendor-created metrics and performance tools
– Users interpret given metrics
• Problems
– Vendors may not provide the best metrics
– Often had to infer, rather than measure
– Given metrics, what do we do with them?
# ps alx
F S UID PID PPID CPU PRI NICE ADDR SZ WCHAN TTY TIME CMD
3 S 0 0 0 0 0 20 2253 2 4412 ? 186:14 swapper
1 S 0 1 0 0 30 20 2423 8 46520 ? 0:00 /etc/init
1 S 0 16 1 0 30 20 2273 11 46554 co 0:00 -sh
[…]
7. Today
1. Open source
– Operating systems: Linux, BSDs, illumos, etc.
– Applications: source online (Github)
2. Custom metrics
– Can patch the open source, or,
– Use dynamic tracing (open source helps)
3. Methodologies
– Start with the questions, then make metrics to answer them
– Methodologies can pose the questions
The biggest problem with dynamic tracing has been knowing what to do with it.
Methodologies guide its usage.
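As a hedged example of creating a custom metric with dynamic tracing on Linux (the probe target tcp_sendmsg is only an illustration; requires root and a kernel with kprobes):

# add a dynamic kernel probe on tcp_sendmsg()
perf probe --add tcp_sendmsg

# count calls system-wide for 10 seconds, then remove the probe
perf stat -e probe:tcp_sendmsg -a -- sleep 10
perf probe --del tcp_sendmsg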
10. Street Light Anti-Method
1. Pick observability tools that are
– Familiar
– Found on the Internet
– Found at random
2. Run tools
3. Look for obvious issues
12. Blame Someone Else Anti-Method
1. Find a system or environment component you are not
responsible for
2. Hypothesize that the issue is with that component
3. Redirect the issue to the responsible team
4. When proven wrong, go to 1
13. Traffic Light Anti-Method
1. Turn all metrics into traffic lights
2. Open dashboard
3. Everything green? No worries, mate.
• Type I errors: red instead of green
– team wastes time
• Type II errors: green instead of red
– performance issues undiagnosed
– team wastes more time looking elsewhere
Traffic lights are suitable for objective metrics (eg, errors),
not subjective metrics (eg, IOPS, latency).
15. Performance Methodologies
System Methodologies:
– Problem statement method
– Functional diagram method
– Workload analysis
– Workload characterization
– Resource analysis
– USE method
– Thread State Analysis
– On-CPU analysis
– CPU flame graph analysis
– Off-CPU analysis
– Latency correlations
– Checklists
– Static performance tuning
– Tools-based methods
…
• For system engineers:
– ways to analyze unfamiliar
systems and applications
• For app developers:
– guidance for metric and
dashboard design
Collect your own toolbox of methodologies
16. Problem Statement Method
1. What makes you think there is a performance problem?
2. Has this system ever performed well?
3. What has changed recently?
– software? hardware? load?
4. Can the problem be described in terms of latency?
– or run time; not IOPS or throughput
5. Does the problem affect other people or applications?
6. What is the environment?
– software, hardware, instance types?
versions? config?
17. Functional Diagram Method
1. Draw the functional diagram
2. Trace all components in the data path
3. For each component, check performance
Breaks up a bigger problem into
smaller, relevant parts
Eg, imagine throughput between the UCSB 360 and the UTAH PDP10 was slow…
[Figure: ARPA Network diagram, 1969]
18. Workload Analysis
• Begin with application metrics & context
• A drill-down methodology
• Pros:
– Proportional, accurate metrics
– App context
• Cons:
– App specific
– Difficult to dig from app to resource
[Diagram: workload analysis drills down from the application through system libraries and system calls to the kernel and hardware]
19. Workload Characterization
• Check the workload: who, why, what, how
– not resulting performance
• Eg, for CPUs:
1. Who: which PIDs, programs, users
2. Why: code paths, context
3. What: CPU instructions, cycles
4. How: changing over time
[Diagram: workload applied to its target]
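A hedged sketch of answering these four questions for CPUs on Linux with standard tools (the flags shown are common usage, not prescribed by the talk):

# Who: which PIDs, programs, and users are consuming CPU
top    # or: pidstat 1

# Why: sample on-CPU code paths (stack traces) for 30 seconds
perf record -F 99 -a -g -- sleep 30
perf report

# What: cycles, instructions, and cache activity system-wide
perf stat -a -d -- sleep 10

# How: watch per-CPU usage change over time
mpstat -P ALL 1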
21. Resource Analysis
• Typical approach for system performance analysis:
begin with system tools & metrics
• Pros:
– Generic
– Aids resource
perf tuning
• Cons:
– Uneven coverage
– False positives
[Diagram: the same stack (application, system libraries, system calls, kernel, hardware), analyzed from the resource level upward]
22. The USE Method
• For every resource, check:
1. Utilization: busy time
2. Saturation: queue length or time
3. Errors: easy to interpret (objective)
Starts with the questions, then finds the tools
Eg, for hardware, check every resource, including busses.
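A hedged sketch of USE checks for a few Linux resources using standard tools (one possible mapping; column names vary by tool version):

# CPU: utilization per CPU; saturation as run-queue length (vmstat "r" column)
mpstat -P ALL 1
vmstat 1

# Memory: utilization; saturation as swapping (vmstat "si"/"so" columns)
free -m
vmstat 1

# Disk: utilization (%util); saturation (await, queue size)
iostat -xz 1

# Network: errors and drops per interface
ip -s link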
26. USE Method: Software
• USE method can also work for software resources
– kernel or app internals, cloud environments
– small scale (eg, locks) to large scale (apps). Eg:
• Mutex locks:
– utilization → lock hold time
– saturation → lock contention
– errors → any errors
• Entire application:
– utilization → percentage of worker threads busy
– saturation → length of queued work
– errors → request errors
[Chart: resource utilization (%)]
27. RED Method
• For every service, check that:
1. Request rate
2. Error rate
3. Duration (distribution)
are within SLO/A
Another exercise in posing questions
from functional diagrams
By Tom Wilkie: http://www.slideshare.net/weaveworks/monitoring-microservices
[Diagram: example microservices: load balancer, web proxy, web server, user database, payments server, asset server, metrics database]
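A hedged sketch of a first-pass RED check for one service from its access log (the log path, the HTTP status in field 9, and the request time as the last field are all assumptions; adjust to your log format):

# request count, error count, and error rate (status code assumed in field 9)
awk '{ total++; if ($9 >= 500) errors++ }
     END { printf "requests=%d errors=%d error_rate=%.2f%%\n", total, errors, 100*errors/total }' \
    /var/log/myservice/access.log

# rough duration percentiles (request time assumed to be the last field)
awk '{ print $NF }' /var/log/myservice/access.log | sort -n |
    awk '{ a[NR]=$1 } END { print "p50:", a[int(NR*0.50)]; print "p99:", a[int(NR*0.99)] }'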
28. Thread State Analysis
• Identify & quantify time in states
• Narrows further analysis to state
• Thread states are applicable to all apps
[Figure: thread state transition diagram]
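A hedged first pass at thread states on Linux using ps(1) scheduler state codes (R = runnable, S = sleeping, D = uninterruptible sleep, etc.); accurately attributing the time spent in each state needs tracing:

# count threads system-wide by scheduler state
ps -eLo state= | sort | uniq -c | sort -rn

# the same for one process of interest (replace $PID)
ps -Lo state= -p "$PID" | sort | uniq -c | sort -rn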
32. On-CPU Analysis
1. Split into user/kernel states
– /proc, vmstat(1)
2. Check CPU balance
– mpstat(1), CPU utilization heat map
3. Profile software
– User & kernel stack sampling (as a CPU flame graph)
4. Profile cycles, caches, busses
– PMCs, CPI flame graph
[Figure: CPU utilization heat map]
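A hedged sketch of steps 1-3 on Linux (standard tools; 99 Hertz sampling and a 30-second window are common choices, not requirements):

# 1. user vs kernel CPU time ("us" and "sy" columns)
vmstat 1

# 2. per-CPU balance (one hot CPU can bottleneck a multithreaded app)
mpstat -P ALL 1

# 3. profile on-CPU stacks, system-wide, at 99 Hertz for 30 seconds
perf record -F 99 -a -g -- sleep 30
perf report -n --stdio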
33. CPU Flame Graph Analysis
1. Take a CPU profile
2. Render it as a flame graph
3. Understand all software that is in >1% of samples
Discovers issues by their CPU usage
- Directly: CPU consumers
- Indirectly: initialization of I/O, locks, times, ...
Narrows target of study to only running code
- See: "The Flame Graph", CACM, June 2016
[Figure: CPU flame graph]
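A hedged sketch of rendering the perf profile above as a flame graph with Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph; the checkout path is an assumption):

# fold the stack samples and render an interactive SVG
perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > cpu-flamegraph.svg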
35. CPI Flame Graph
• Profile cycle stack traces and instructions or stalls separately
• Generate CPU flame graph (cycles) and color using other profile
• eg, FreeBSD: pmcstat
[Figure legend: red == instructions, blue == stalls]
36. Off-CPU Analysis
Analyze off-CPU time via the blocking code path: the off-CPU flame graph
Often need wakeup code paths as well…
37. Off-CPU Time Flame Graph
[Figure: off-CPU time flame graph (x-axis: off-CPU time, y-axis: stack depth); labeled paths include file read from disk, directory read from disk, pipe write, path read from disk, and fstat from disk]
Trace blocking events with kernel stacks & time blocked (eg, using Linux BPF)
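A hedged sketch using the bcc offcputime tool (https://github.com/iovisor/bcc; needs a BPF-capable kernel) together with the FlameGraph scripts, following the approach in the eBPF off-CPU blog post listed in the references; folded output is in microseconds:

# record blocked time with stacks, folded output, for 30 seconds
offcputime -f 30 > out.offcpu.folded

# render as an off-CPU time flame graph
./FlameGraph/flamegraph.pl --color=io --countname=us --title="Off-CPU Time Flame Graph" \
    out.offcpu.folded > offcpu.svg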
38. Wakeup Time Flame Graph
… can also associate wake-up stacks with off-CPU stacks
(eg, Linux 4.6: samples/bpf/offwaketime*)
Who did the wakeup: [Figure: wakeup time flame graph]
39. Chain Graphs
Associate more than one waker: the full chain of wakeups
With enough stacks, all paths lead to metal
An approach for analyzing all off-CPU issues
40. Latency Correlations
1. Measure latency histograms at different stack layers
2. Compare histograms to find latency origin
Even better, use latency heat maps
• Match outliers based on both latency and time
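A hedged sketch of comparing latency histograms at two stack layers with bcc tools (biolatency for block-device I/O, ext4dist for ext4 operations; assumes bcc is installed and the filesystem is ext4):

# block device layer: I/O latency histogram, one 10-second interval
biolatency 10 1

# file system layer: ext4 read/write/open/fsync latency histograms, same interval
ext4dist 10 1

# if the file system layer shows latency the block layer does not,
# the origin is above the block layer (eg, locking or queueing in the FS)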
41. Checklists: eg, Linux Perf Analysis in 60s
1. uptime (load averages)
2. dmesg | tail (kernel errors)
3. vmstat 1 (overall stats by time)
4. mpstat -P ALL 1 (CPU balance)
5. pidstat 1 (process usage)
6. iostat -xz 1 (disk I/O)
7. free -m (memory usage)
8. sar -n DEV 1 (network I/O)
9. sar -n TCP,ETCP 1 (TCP stats)
10. top (check overview)
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
47. What you can do
1. Know what's now possible on modern systems
– Dynamic tracing: efficiently instrument any software
– CPU facilities: PMCs, MSRs (model specific registers)
– Visualizations: flame graphs, latency heat maps, …
2. Ask questions first: use methodologies to ask them
3. Then find/build the metrics
4. Build or buy dashboards to support methodologies
53. USE Method: To Do
Showing what is and is not commonly measured
[Diagram: hardware functional diagram with U, S, and E indicators for each resource]
54. CPU Workload Characterization: To Do
Who: top, htop
Why: perf record -g, flame graphs
What: perf stat -a -d
How: monitoring
Showing what is and is not commonly measured
55. Summary
• It is the crystal ball age of performance observability
• What matters is the questions you want answered
• Methodologies are a great way to pose questions
56. References & Resources
• USE Method
– http://queue.acm.org/detail.cfm?id=2413037
– http://www.brendangregg.com/usemethod.html
• TSA Method
– http://www.brendangregg.com/tsamethod.html
• Off-CPU Analysis
– http://www.brendangregg.com/offcpuanalysis.html
– http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
– http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
• Static Performance Tuning, Richard Elling, Sun blueprint, May 2000
• RED Method: http://www.slideshare.net/weaveworks/monitoring-microservices
• Other system methodologies
– Systems Performance: Enterprise and the Cloud, Prentice Hall 2013
– http://www.brendangregg.com/methodology.html
– The Art of Computer Systems Performance Analysis, Jain, R., 1991
• Flame Graphs
– http://queue.acm.org/detail.cfm?id=2927301
– http://www.brendangregg.com/flamegraphs.html
– http://techblog.netflix.com/2015/07/java-in-flames.html
• Latency Heat Maps
– http://queue.acm.org/detail.cfm?id=1809426
– http://www.brendangregg.com/HeatMaps/latency.html
• ARPA Network: http://www.computerhistory.org/internethistory/1960s
• RSTS/E System User's Guide, 1985, page 4-5
• DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD, Prentice Hall 2011
• Apollo: http://www.hq.nasa.gov/office/pao/History/alsj/a11 http://www.hq.nasa.gov/alsj/alsj-LMdocs.html