Profilers find performance bottlenecks in your app but provide confusing information. Let's give you insights into how your profiler and your app are really interacting. What profiling APIs are available, how they work, and what their implementation on the JVM (OpenJDK) side looks like:
Stack sampling profilers: stop motion view of your app
GetCallTrace(JVisualVM case study): The official stack sampling API
Safepoints and safepoint sampling bias
AsyncGetCallTrace(Honest Profiler Case Study): The unofficial API
JVM Profilers vs System Profilers: No API needed?
1 of 64
Download to read offline
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
More Related Content
Jvm profiling under the hood
1. JVM Profiling
Under da Hood
Richard Warburton - @RichardWarburto
Nitsan Wakart - @nitsanw
8. CPU Profiling Limitations
● Finds CPU bound bottlenecks
● Many problems not CPU Bound
○ Networking
○ Database or External Service
○ I/O
○ Garbage Collection
○ Insufficient Parallelism
○ Blocking & Queuing Effects
15. Stack Trace Sampling
● JVMTI interface: GetCallTrace
○ Trigger a global safepoint(not on Zing)
○ Collect stack trace
● Large impact on application
● Samples only at safepoints
16. Example
private static void outer()
{
for (int i = 0; i < OUTER; i++)
{
hotMethod(i);
}
}
// https://github.com/RichardWarburton/profiling-samples
17. Example (2)
private static void hotMethod(final int i)
{
for (int k = 0; k < N; k++)
{
final int[] array = SafePointBias. array;
final int index = i % SIZE;
for (int j = index; j < SIZE; j++)
{
array[index] += array[j];
}
}
}
20. Whats a safepoint?
● Java threads poll global flag
○ At ‘uncounted’ loops back edge
○ At method exit/enter
● A safepoint poll can be delayed by:
○ Large methods
○ Long running ‘counted’ loops
○ BONUS: Page faults/thread suspension
30. AsyncGetCallTrace
● Used by Oracle Solaris Studio
● Adapted to open source prototype by
Google’s Jeremy Manson
● Unsupported, Undocumented …
Underestimated
31. SIGPROF - Interrupt Handlers
● OS Managed timing based interrupt
● Interrupts the thread and directly calls an
event handler
● Used by profilers we’ll be talking about
33. “You are in a maze of twisty little stack frames,
all alike”
34. AsyncGetCallTrace under the hood
● A Java thread is ‘possessed’
● You have the PC/FP/SP
● What is the call trace?
○ jmethodId - Java Method Identifier
○ bci - Byte Code Index -> used to find line number
35. Where Am I?
● Given a PC what is the current method?
● Is this a Java method?
○ Each method ‘lives’ in a range of addresses
● If not, what do we do?
36. Java Method? Which line?
● Given a PC, what is the current line?
○ Not all instructions map directly to a source line
● Given super-scalar CPUs what does PC
mean?
● What are the limits of PC accuracy?
37. “> I think Andi mentioned this to me last year --
> that instruction profiling was no longer reliable.
It never was.”
http://permalink.gmane.org/gmane.linux.kernel.perf.user/1948
Exchange between Brenden Gregg and Andi Kleen
38. Skid
● PC indicated will be >= to PC at sample time
● Known limitation of instruction profiling
● Leads to harder ‘blame analysis’
39. Limits of line number accuracy:
Line number (derived from BCI) is the closest
attributable BCI to the PC (-XX:+DebugNonSafepoint)
The PC itself is within some skid distance from
actual sampled instruction
40. ● Divided into frames
○ frame { sender*, stack*, pc }
● A single linked list:
root(null, s0, pc1) <- call1 (root, s1, pc2) <- call2(call1, s2, pc2)
● Convert to: (jmethodId,lineno)
The Stack
41. A typical stack
● JVM Thread runner infra:
○ JavaThread::run to JavaCalls::call_helper
● Interleaved Java frames:
○ Interpreted
○ Compiled
○ Java to Native and back
● Top frame may be Java or Native
42. Native frames
● Ignored, but need to navigate through
● Use a dedicated FP register to find sender
● But only if compiled to do so…
● Use a last remembered Java frame instead
See: http://duartes.org/gustavo/blog/post/journey-to-the-stack/
43. Java Compiled Frames
● C1/C2 produce native code
● No FP register: use set frame size
● Challenge: methods can move (GC)
● Challenge: methods can get recompiled
44. Java Interpreter frames
● Separately managed by the runtime
● Make an effort to look like normal frames
● Challenge: may be interrupted half-way
through construction...
45. Virtual Frames
● C1/C2 inline code (intrinsics/other methods)
● No data on stack
● Must use JVM debug info
46. AsyncGetCallTrace Limitations
● Only profiles running threads
● Accuracy of line info limited by reality
● Only reports Java frames/threads
● Must lookup debug info during call
47. Compilers: Friend or Fiend?
void safe_reset(void *start, size_t size) {
char *base = reinterpret_cast<char *>(start);
char *end = base + size;
for (char *p = base; p < end; p++) {
*p = 0;
}
}
48. Compilers: Friend or Fiend?
safe_reset(void*, unsigned long):
lea rdx, [rdi+rsi]
cmp rdi, rdx
jae .L3
sub rdx, rdi
xor esi, esi
jmp memset
.L3:
rep ret
49. Concurrency Bug
● Even simple concurrency bugs are hard to
spot
● Unspotted race condition in the ring buffer
● Spotted thanks to open source & Rajiv
Signal