Talk for MacIT 2014. This talk is about systems performance on OS X, and introduces the USE Method to check for common performance bottlenecks and errors. This methodology can be used by beginners and experts alike, and begins by constructing a checklist of the questions we’d like to ask of the system, before reaching for tools to answer them. The focus is resources: CPUs, GPUs, memory capacity, network interfaces, storage devices, controllers, interconnects, as well as some software resources such as mutex locks. These areas are investigated by a wide variety of tools, including vm_stat, iostat, netstat, top, latency, the DTrace scripts in /usr/bin (which were written by Brendan), custom DTrace scripts, Instruments, and more. This is a tour of the tools needed to solve our performance needs, rather than understanding tools just because they exist. This talk will make you aware of many areas of OS X that you can investigate, which will be especially useful for the time when you need to get to the bottom of a performance issue.
Analyzing OS X Systems Performance with the USE Method
1. Analyzing OS X Systems Performance with the USE Method
Brendan Gregg, Senior Performance Architect, Netflix
March, 2014
2. Find the Bottleneck
[Diagram: OS X functional block diagram.
Software (Darwin Operating System): Applications; System Libraries; System Call Interface; XNU Kernel (OSFMK, BSD): VFS, HFS+/..., Block Devices, Sockets, TCP/UDP, IP, Ethernet, Scheduler, Virtual Memory; I/O Kit; Device Drivers.
Hardware: CPU, FSB, Northbridge, Memory Bus, DRAM, GPU, DMI, Southbridge, Device Interconnect (PCIe/USB), I/O Controller, Network Controller, Interface Transports, Disks, Ports, Other Devices.]
3. This Talk
• Summarizes casual to serious performance analysis of OS X
• From the systems perspective, not the application
• Many application issues can be found easily this way
• Covering not just current tools, but suggestions for future work
• May change how you think about performance!
4. whoami
• Senior Performance Architect at Netflix
• Primary author of the DTrace book
• Wrote many DTrace scripts included with OS X.
Eg: dtruss, iosnoop, iotop, opensnoop, execsnoop,
procsystime, bitesize.d, seeksize.d, setuids.d, etc...
• These were ported and enhanced by Apple engineering (thanks!)
• Created the USE method and USE method checklist for OS X
5. Agenda
• The Tools Method
• The USE Method
• Future work
7. The Tools Method
• A tool-based performance analysis approach, commonly followed
today. For reference, I've called it the "Tools Method".
• 1. List available performance tools
• 2. For each tool, list its useful metrics
• 3. For each metric, list possible interpretations
• Simple, useful, but analysis is limited to what the tools provide easily
8. Tool Examples
• Activity Monitor
• atMonitor, Temperature Monitor Lite
• Command Line
• DTrace
• Instruments
9. Activity Monitor
• High level process and system summaries. A GUI version of top(1)
• Table shows processes by %CPU, memory
• CPU load over time
• Quit, info, and system diagnosis buttons
10. Activity Monitor
Network
• Quick way to see current
and recent network
throughput
• Like the CPU summary,
shows aggregate device
stats, and not per-device
11. Activity Monitor
CPU Usage
• Per-CPU utilization from previous 0.5 - 5 seconds (tunable)
• Handy to leave running. Look for single hot CPUs/threads
12. Activity Monitor
Floating CPU Window
• Earlier OS X also had a compact version (gone in Mavericks)
• Was nice, but what I really want is a compact visualization for both
per-CPU and historical data
13. Activity Monitor
CPU/Disk Suggestion
• Could show both per-device and history using a utilization heat map:
• http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/
14. Activity Monitor
Sample Process
• The cog button ("System diagnostics options") has a "Sample process" option for profiling CPU code paths
• Explains %CPU usage
• However, the output is usually very long and time-consuming to read (see scroll bar)
15. Activity Monitor
Flame Graphs ?
• Suggestion: include a Flame Graph view
• Visualizes entire profile output in one screen
• http://github.com/brendangregg/FlameGraph
16. atMonitor
• 3rd party app. Version 2.7b crashes for me if "Top Window" is visible.
• Shows many useful metrics: per-CPU, RAM, GPU, per-disk, and per-network interface utilization percentages with histories.
• Currently the easiest way to see GPU, disk, and network utilization.
• Utilization is easy to interpret. I/O per second is not.
17. Temperature Monitor Lite
• Another 3rd party application
• Easy way to infer GPU utilization
• Normal:
• Video:
18. Command Line
• Accessed via the Terminal application
• Numerous performance tools available, from UNIX/BSD/OSX
• Eg, the uptime(1) command shows recent and historic CPU load:
$ uptime
14:36 up 43 days, 2:39, 30 users, load averages: 0.72 1.02 1.29
• These numbers are the 1, 5, and 15 minute load averages. The time values are
really constants in an exponentially decaying moving sum.
• Interpret: if average > number of CPUs, then CPUs are overloaded
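The interpretation rule above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original talk: the function name `cpu_overloaded` and the 4-CPU machine are assumptions made for the example, while `os.getloadavg()` and `os.cpu_count()` are standard-library calls.

```python
import os

def cpu_overloaded(load_averages, cpu_count):
    """Return, for each load average, whether it exceeds the CPU count."""
    return [avg > cpu_count for avg in load_averages]

# The uptime(1) sample from this slide, on a hypothetical 4-CPU machine.
sample = (0.72, 1.02, 1.29)
print(cpu_overloaded(sample, 4))

# Live values for the current machine (getloadavg is Unix-only).
if hasattr(os, "getloadavg"):
    print(cpu_overloaded(os.getloadavg(), os.cpu_count() or 1))
```

A load average above the CPU count only suggests saturation; the per-CPU and per-process tools in the checklist later in the talk are still needed to confirm it.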
25. Observability So Far...
[Diagram: the functional block diagram (applications, system libraries, XNU kernel subsystems, I/O Kit, device drivers, and hardware) annotated with the tools covered so far: iostat, netstat, Activity Monitor, atMonitor, vm_stat, top, Temperature Monitor, tcpdump.]
26. DTrace
• Programmable, real-time, dynamic and static tracing
• Write your own one-liners and scripts, or use other people's, including those in /usr/bin
• There is a great book about it...
27. DTrace: Scripts
$ man -k dtrace
bitesize.d(1m) - analyse disk I/O size by process. Uses DTrace
cpuwalk.d(1m) - Measure which CPUs a process runs on. Uses DTrace
creatbyproc.d(1m) - snoop creat()s by process name. Uses DTrace
dappprof(1m) - profile user and lib function usage. Uses DTrace
dapptrace(1m) - trace user and library function usage. Uses DTrace
diskhits(1m) - disk access by file offset. Uses DTrace
dispqlen.d(1m) - dispatcher queue length by CPU. Uses DTrace
dtrace(1) - generic front-end to the DTrace facility
dtruss(1m) - process syscall details. Uses DTrace
errinfo(1m) - print errno for syscall fails. Uses DTrace
execsnoop(1m) - snoop new process execution. Uses DTrace
[...]
• Over 40 DTrace scripts are shipped with OS X (most of which I originally
wrote). The listing above shows some of them.
28. DTrace: iosnoop
$ iosnoop
UID PID D BLOCK SIZE COMM PATHNAME
503 176 R 148471184 8192 SystemUIServer ??/vm/swapfile10
503 176 R 835310312 4096 SystemUIServer ??/vm/swapfile4
503 92489 W 746204600 61440 Google Chrome ??/Chrome/.com.google.Chrome.hw1Inp
503 92489 W 746204720 23472 Google Chrome ??/Default/.com.google.Chrome.76k4tG
0 19 W 425711304 4096 syslogd ??/DiagnosticMessages/2014.02.14.asl
0 19 W 57246896 4096 syslogd ??/DiagnosticMessages/StoreData
0 19 W 425710304 4096 syslogd ??/DiagnosticMessages/2014.02.14.asl
503 52617 W 214894232 4096 firefox ??/iw4rbel9.default/_CACHE_CLEAN_
0 19 W 57246896 4096 syslogd ??/DiagnosticMessages/StoreData
0 19 W 425710304 4096 syslogd ??/DiagnosticMessages/2014.02.14.asl
[...]
• iosnoop(1m): trace block device I/O
• Identify processes and files causing disk I/O
29. DTrace: hfsslower.d
$ ~/dtbook_scripts/Chap5/hfsslower.d 10
TIME PROCESS D KB ms FILE
2014 Feb 14 17:35:59 Terminal R 5751 16 data.data
2014 Feb 14 17:35:59 Terminal R 6166 17 data.data
2014 Feb 14 17:35:59 Terminal W 11921 15 data.data
[...]
• hfsslower.d: trace HFS calls slower than a threshold (here, 10 ms)
• Traces all application I/O to the file system, not just disk I/O
• Script is on http://www.dtracebook.com
30. DTrace: execsnoop
$ execsnoop -v
STRTIME UID PID PPID ARGS
2014 Feb 14 19:40:55 503 94835 551 man
2014 Feb 14 19:40:55 503 94835 551 man
2014 Feb 14 19:40:55 503 94841 94837 groff
2014 Feb 14 19:40:55 503 94839 94837 tbl
2014 Feb 14 19:40:55 503 94840 94838 cat
2014 Feb 14 19:40:56 503 94845 94841 grotty
2014 Feb 14 19:40:56 503 94844 94841 troff
2014 Feb 14 19:40:56 503 94843 94842 less
2014 Feb 14 19:40:58 503 94846 92489 Google Chrome He
2014 Feb 14 19:41:03 503 94847 92489 Google Chrome He
[...]
• execsnoop(1m): trace process execution
• Shows what programs are launched
32. DTrace: sotop
$ sotop
PROCESS PID READS WRITES READ_KB WRITE_KB CPU
kernel_task 0 0 0 0 0 475
firefox 52617 205 14 84 22 118
Terminal 165 0 0 0 0 35
WindowServer 89 0 0 0 0 34
SIDPLAY 51232 0 0 0 0 31
Google Chrome H 92513 6 12 0 1 14
Google Chrome H 94477 2 1 0 0 13
clear 94909 0 0 0 0 13
Google Chrome 92489 16 5 0 0 12
sh 94909 0 0 0 0 12
[...]
• sotop: summarize socket I/O by process, top-style
• Also from the DTrace book.
33. Instruments
• Advanced analysis GUI
• Includes many "Instruments", which profile applications in different ways
• Data sources include DTrace, CPU counters
35. Instruments
Low Level CPU Counters
• Performance monitor counter (PMC) and performance monitor interrupts can be instrumented
• Hard work, but can be used to understand bus and interconnect activity
36. Observability So Far...
[Diagram: the functional block diagram annotated with the tools covered so far: iostat, netstat, Activity Monitor, atMonitor, vm_stat, top, Temperature Monitor, tcpdump, dtruss, hfsslower, iosnoop, execsnoop, sotop, dtrace, Instruments.]
37. Tools Method in Practice
• Tools Method provides reasonable coverage
• Some observability gaps, some uneven coverage
• Can improve coverage by adding more tools: ps, ping, traceroute,
latency, df, sysctl, plockstat, opensnoop, dispqlen.d, runocc.d, nfsstat,
iopending, soconnect_mac.d, httpdstat.d, sc_usage, fs_usage, ...
• I could keep covering tools for the rest of this talk...
38. [Diagram: the functional block diagram annotated with a larger set of tools: iostat, netstat, Activity Monitor, atMonitor, vm_stat, top, ps, Temperature Monitor, opensnoop, hfsslower.d, df, nfsstat, iosnoop, iopending, execsnoop, sotop, tcpdump, dapptrace, ping, traceroute, plockstat, dtruss, sc_usage, errinfo, kill.d, dispqlen.d, runocc.d, latency, soconnect_mac.d, soaccept_mac.d, httpdstat.d, fs_usage, priclass.d, pridist.d, bitesize.d, seeksize.d, maclife.d, macvfssnoop.d, dtrace, Instruments.]
• Most DTrace scripts are in /usr/bin
• Some are from my DTrace book and are available online
• Custom Instruments using CPU counters/interrupts can be added for bus observability
39. The Focus on Tools
• Useful; however, learning tools and metrics becomes laborious.
• Still limited by what the tools provide, or provide easily.
• You can try to approach this in a different way...
42. The USE Method
• For every resource, check:
• 1. Utilization
• 2. Saturation
• 3. Errors
43. The USE Method
• For every resource, check:
• 1. Utilization: time resource was busy, or degree used
• 2. Saturation: degree of queued extra work
• 3. Errors: any errors
45. Hardware Resources
• CPUs
• Main Memory
• Network Interfaces
• Storage Devices
• Controllers, Interconnects
• Find the functional diagram and examine every item in the data path...
46. Hardware Functional Diagram
[Diagram: CPU, FSB, Northbridge, Memory Bus, DRAM, GPU, DMI, Southbridge, Device Interconnect (PCIe/USB), I/O Controller, Network Controller, Interface Transports, Disks, Ports, Other Devices.]
• For each component, check:
• 1. Utilization
• 2. Saturation
• 3. Errors
47. USE Method Checklists
• Build a checklist for all combinations, identifying tools/metrics to use
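Building the checklist is mechanical enough to sketch in code. The following is an illustrative Python sketch, not something shown in the talk: the row layout and the `build_checklist` helper are assumptions, the resource names come from the Hardware Resources slide, and the one filled-in entry is taken from the OS X checklist.

```python
from itertools import product

# Resource names from the "Hardware Resources" slide.
RESOURCES = ["CPUs", "Main Memory", "Network Interfaces",
             "Storage Devices", "Controllers", "Interconnects"]
METRIC_TYPES = ["Utilization", "Saturation", "Errors"]

def build_checklist(resources, metric_types=METRIC_TYPES):
    """Return one row per (resource, type) pair; the tool starts unknown."""
    return [{"resource": r, "type": t, "tools": None}
            for r, t in product(resources, metric_types)]

checklist = build_checklist(RESOURCES)

# Fill in what is already known; unknown rows are the observability gaps.
for row in checklist:
    if row["resource"] == "CPUs" and row["type"] == "Saturation":
        row["tools"] = 'uptime, "load averages" > CPU count'

for row in checklist:
    print(f'{row["resource"]:20} {row["type"]:12} {row["tools"] or "?"}')
```

Starting from the questions and leaving the tool column empty makes the gaps visible: any row still showing "?" is a metric that no known tool answers yet.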
49. OS X Checklist
Resource | Type | Metric
CPU | Utilization | system-wide: iostat 1, "us" + "sy"; per-cpu: DTrace [1]; Activity Monitor → CPU Usage or Floating CPU Window; per-process: top -o cpu, "%CPU"; Activity Monitor → Activity Monitor, "%CPU"; ...
CPU | Saturation | system-wide: uptime, "load averages" > CPU count; latency, "SCHEDULER" and "INTERRUPTS"; per-cpu: dispqlen.d (DTT), non-zero "value"; runocc.d (DTT), non-zero "%runocc"; per-process: Instruments → Thread States, "On run queue"; DTrace [2]
CPU | Errors | dmesg; /var/log/system.log; Instruments → Counters, for PMC and whatever error counters are supported (eg, thermal throttling)
52. OS X Checklist, cont.
Resource | Type | Metric
Memory Capacity | Utilization | system-wide: vm_stat 1, main memory free = "free" + "inactive", in units of pages; Activity Monitor → Activity Monitor → System Memory, "Free" for main memory; per-process: top -o rsize, "RSIZE" is resident main memory size, "VSIZE" is virtual memory size; ps -alx, "RSS" is resident set size, "SZ" is virtual memory size; ps aux similar (legacy format)
Memory Capacity | Saturation | system-wide: vm_stat 1, "pageout"; per-process: anonpgpid.d (DTT), DTrace vminfo:::anonpgin [3] (frequent anonpgin == pain); Instruments → Memory Monitor, high rate of "Page Ins" and "Page Outs"; sysctl vm.memory_pressure [4]
Memory Capacity | Errors | System Information → Hardware → Memory, "Status" for physical failures; DTrace failed malloc()s
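As a worked example of the memory utilization row, the sketch below computes free main memory as "free" + "inactive" pages multiplied by the page size. It is an illustration under stated assumptions: it parses the one-shot `vm_stat` report format (no interval argument), the sample numbers are made up, and the helper names `parse_vm_stat` and `free_bytes` are hypothetical.

```python
import re

# Made-up sample in the style of a one-shot `vm_stat` report.
SAMPLE_VM_STAT = """\
Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                          100000.
Pages active:                        500000.
Pages inactive:                      200000.
Pageouts:                              1234.
"""

def parse_vm_stat(text):
    """Return (page_size_bytes, {counter_name: value}) from vm_stat text."""
    match = re.search(r"page size of (\d+) bytes", text)
    # Falling back to 4096 when the header is missing is an assumption.
    page_size = int(match.group(1)) if match else 4096
    counters = {}
    for line in text.splitlines():
        m = re.match(r'^"?([^:"]+)"?:\s+(\d+)\.?\s*$', line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return page_size, counters

def free_bytes(text):
    """Main memory free = ("free" + "inactive" pages) * page size."""
    page_size, counters = parse_vm_stat(text)
    return (counters["Pages free"] + counters["Pages inactive"]) * page_size

print(free_bytes(SAMPLE_VM_STAT) / 2**20, "MiB free")
```

On a real system the text could come from `subprocess.check_output(["vm_stat"], text=True)`; the "Pageouts" counter parsed here is the one the saturation row refers to.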
54. OS X Checklist, cont.
• Full list: http://www.brendangregg.com/USEmethod/use-macosx.html
• Includes references from earlier tables
55. Software Resources
• Can be studied using USE metrics as well, if possible
• OS X Checklist includes some example software resources:
• Processes, file descriptors, kernel mutexes, user-level mutexes
56. Mutex Lock
• Can you think of what these could mean for a mutex lock?:
• Utilization
• Saturation
• Errors
57. Mutex Lock
• Can you think of what these could mean for a mutex lock?:
• Utilization: held time per second
• Saturation: measure of contention time or waiters
• Errors: EDEADLK, EINVAL
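To make the utilization and saturation definitions concrete, here is a hedged Python sketch of a user-level lock that records them. It is not how plockstat or DTrace measure locks: the `InstrumentedLock` class and its injectable `clock` parameter are inventions for this example, and the errors dimension (EDEADLK, EINVAL) is not covered because Python's `threading.Lock` does not surface those codes.

```python
import threading
import time

class InstrumentedLock:
    """Wrap threading.Lock, accumulating hold time and wait time."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._clock = clock
        self._acquired_at = None
        self.held_seconds = 0.0   # utilization: total time the lock was held
        self.wait_seconds = 0.0   # saturation: total time spent waiting

    def __enter__(self):
        start = self._clock()
        self._lock.acquire()
        now = self._clock()
        # Counters are only updated while holding the lock.
        self.wait_seconds += now - start
        self._acquired_at = now
        return self

    def __exit__(self, exc_type, exc, tb):
        self.held_seconds += self._clock() - self._acquired_at
        self._lock.release()
        return False

    def utilization(self, interval_seconds):
        """Held time per second over the observation interval."""
        return self.held_seconds / interval_seconds

lock = InstrumentedLock()
begin = time.monotonic()
with lock:
    time.sleep(0.01)      # simulated critical section
elapsed = time.monotonic() - begin
print(f"utilization={lock.utilization(elapsed):.2f} wait={lock.wait_seconds:.6f}s")
```

With several threads contending, `wait_seconds` grows while utilization approaches 1.0, which is the pattern the USE method looks for.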
60. USE Method Tools
• Tools can be developed to fetch USE metrics more easily
• Especially for busses and interconnects
• Would love to see USE metrics in Activity Monitor
61. USE Method New Uses
• Can be applied to new areas, developing new metrics
• May not always work, but worth trying
• Find a functional diagram of your system, application, or environment,
and look for U.S.E. metrics for each component
62. USE Metrics for all of:
[Diagram: the full functional block diagram.
Software (Darwin Operating System): Applications; System Libraries; System Call Interface; XNU Kernel (OSFMK, BSD): VFS, HFS+/..., Block Devices, Sockets, TCP/UDP, IP, Ethernet, Scheduler, Virtual Memory; I/O Kit; Device Drivers.
Hardware: CPU, FSB, Northbridge, Memory Bus, DRAM, GPU, DMI, Southbridge, Device Interconnect (PCIe/USB), I/O Controller, Network Controller, Interface Transports, Disks, Ports, Other Devices.]
63. Stranger Example: TCP
$ netstat -s
tcp:
80444499 packets sent
28706719 data packets (3613656050 bytes)
76599 data packets (65712152 bytes) retransmitted
68 resends initiated by MTU discovery
41687640 ack-only packets (248964 delayed)
0 URG only packets
0 window probe packets
9286129 window update packets
707685 control packets
0 data packets sent after flow control
177149270 packets received
16296459 acks (for 3602941580 bytes)
556237 duplicate acks
0 acks for unsent data
154775303 packets (1214952475 bytes) received in-sequence
200501 completely duplicate packets (151553377 bytes)
1884 old duplicate packets
79 packets with some dup. data (17270 bytes duped)
6102493 out-of-order packets (4236017281 bytes)
67 packets (0 bytes) of data after window
0 window probes
14180 window update packets
72825 packets received after close
85 bad resets
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
378961 connection requests
613 connection accepts
37 bad connection attempts
0 listen queue overflows
332688 connections established (including accepts)
381180 connections closed (including 13038 drops)
14527 connections updated cached RTT on close
14527 connections updated cached RTT variance on close
5495 connections updated cached ssthresh on close
1721 embryonic connections dropped
16204052 segments updated rtt (of 8674926 attempts)
374184 retransmit timeouts
4465 connections dropped by rexmit timeout
0 connections dropped after retransmitting FIN
91 persist timeouts
0 connections dropped by persist timeout
12784 keepalive timeouts
262 keepalive probes sent
1214 connections dropped by keepalive
1312411 correct ACK header predictions
152849516 correct data packet header predictions
17244 SACK recovery episodes
21329 segment rexmits in SACK recovery episodes
25852298 byte rexmits in SACK recovery episodes
180630 SACK options (SACK blocks) received
5682514 SACK options (SACK blocks) sent
0 SACK scoreboard overflow
[...]
• "netstat -s" output has over 50 metrics for TCP
• Do you understand them all?
• Could USE metrics provide a high level summary, treating TCP as a software resource? (might be a stretch)
64. USE Method: TCP
• Metrics for TCP as a software resource:
• Utilization
• Saturation
• Errors
65. USE Method: TCP
• Metrics for TCP as a software resource:
• Utilization: time data was buffered per second
• Saturation: listen queue overflows
• Errors: bad connection attempts, bad resets, bad checksums, ...
• I think I'd classify retransmits and duplicates as errors.
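The saturation and error counters named above can be pulled out of "netstat -s" text with a small parser. The Python sketch below is illustrative only: the sample is a subset of the output shown earlier in the talk, the phrase lists and the `tcp_use_summary` helper are assumptions of this example, and utilization (time data was buffered per second) is omitted because "netstat -s" does not report it.

```python
import re

# Subset of the "netstat -s" output shown on the earlier slide.
SAMPLE = """\
tcp:
        80444499 packets sent
                76599 data packets (65712152 bytes) retransmitted
        177149270 packets received
                200501 completely duplicate packets (151553377 bytes)
                85 bad resets
                0 discarded for bad checksums
        37 bad connection attempts
        0 listen queue overflows
"""

SATURATION = ["listen queue overflows"]
# Retransmits and duplicates are counted as errors, as suggested above.
ERRORS = ["bad connection attempts", "bad resets",
          "discarded for bad checksums",
          "retransmitted", "completely duplicate packets"]

def first_count(text, phrase):
    """Return the leading integer of the first line containing phrase."""
    for line in text.splitlines():
        if phrase in line:
            m = re.match(r"\s*(\d+)", line)
            if m:
                return int(m.group(1))
    return None

def tcp_use_summary(text):
    """Group selected netstat -s counters into saturation and errors."""
    return {
        "saturation": {p: first_count(text, p) for p in SATURATION},
        "errors": {p: first_count(text, p) for p in ERRORS},
    }

for kind, counters in tcp_use_summary(SAMPLE).items():
    for name, value in counters.items():
        print(f"{kind:10} {name:32} {value}")
```

These counters are cumulative since boot, so in practice two samples would be taken and subtracted to get a rate.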
66. Other Methodologies
• Other methodologies include:
• Drill Down Analysis Method
• Workload Characterization
• Thread State Analysis (TSA) Method
• These too can pose questions that tools then answer