Memory Tuning in Java
2007 WebSphere Technical Conference
Bangalore, India
August 6 - 9, 2007
Rajeev Palanki
IBM Java Technology Center
rpalanki@in.ibm.com
• High level overview of the Java Virtual Machine (JVM) and its key components
• Understanding of the different garbage collection schemes in the Sovereign and J9 virtual machines and their impact on JVM runtime performance
• Knowledge about using verbosegc output effectively to improve application response times
• Familiarity with the debugging and profiling tools available for the JVM and their effective usage
• Q&A
[Diagram: Java SDK architecture – class libraries (Security, ORB, XML, BigDecimal), the JIT, and the virtual machine (GC, RAS) layered over native code and the underlying platform]
• IBM has completely rewritten and redesigned its VM for Java 5.0; you may have heard this referred to as J9.
– GC algorithm
Critical to understand how it works so that tuning is done more intelligently.
• Key questions:
• Concurrent mark
Most of the marking phase is done outside of 'Stop the World', while the 'mutator' threads are still active, giving a significant improvement in pause times.
• Parallel mark and sweep
The mark and sweep workload is distributed across the available processors, resulting in a significant improvement in pause times.
• Incremental compaction
The expense of compaction is distributed across GCs, leading to a reduction in the occasional long pause time.
• Java 5 technologies
[Diagram: native heap (thread stacks, buffers, Motif structures) alongside the Java heap; -Xms defines the active area of the heap, and the freelist chains free-storage chunks, each carrying a size and a next pointer terminated by null]
– Cache allocation (for object allocations < 512 bytes) does not require the heap lock. Each thread has local storage in the heap (TLH – Thread Local Heap) where these objects are allocated.
– Heap lock allocation occurs when the allocation request is more than 512 bytes and requires the heap lock (a conceptual sketch of both paths follows the pseudocode below).
Got it:
  Initialize object
Get out:
  HEAP_UNLOCK
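A purely conceptual sketch of the two allocation paths described above; the class, field, and threshold names here are invented for illustration and the real JVM implements this in native code, not Java.

import java.util.concurrent.locks.ReentrantLock;

public class AllocatorSketch {
    static final int TLH_THRESHOLD = 512;          // requests below this use the thread-local path
    static final ReentrantLock heapLock = new ReentrantLock();

    // Invented stand-in for a Thread Local Heap: a private chunk with bump-pointer allocation.
    static class ThreadLocalHeap {
        int remaining = 16 * 1024;
        boolean allocate(int size) {
            if (size > remaining) return false;    // TLH exhausted; caller falls back to the heap lock
            remaining -= size;                     // no lock needed: the chunk is thread-private
            return true;
        }
    }

    static final ThreadLocal<ThreadLocalHeap> tlh = new ThreadLocal<ThreadLocalHeap>() {
        protected ThreadLocalHeap initialValue() { return new ThreadLocalHeap(); }
    };

    static void allocate(int size) {
        if (size < TLH_THRESHOLD && tlh.get().allocate(size)) {
            System.out.println("cache allocation from TLH: " + size + " bytes");
            return;
        }
        heapLock.lock();                           // HEAP_LOCK: large request or TLH exhausted
        try {
            // search the free list, "Got it", initialize the object
            System.out.println("heap-lock allocation: " + size + " bytes");
        } finally {
            heapLock.unlock();                     // "Get out": HEAP_UNLOCK
        }
    }

    public static void main(String[] args) {
        allocate(64);     // small request: fast path
        allocate(4096);   // large request: heap-lock path
    }
}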
Users can specify the desired percentage of the Large Object Area using the -Xloratio<n> option, where n determines the fraction of the heap designated for the LOA.
(For example: -Xloratio0.3 reserves 30% of the active heap for the Large Object Area.)
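A minimal sketch of the kind of allocation the LOA is meant to satisfy; the class name, launch line, and allocation size are illustrative assumptions.

// Illustrative only. Launched, for example, as:
//   java -Xloratio0.3 LargeAllocDemo
// The 8 MB array is well above typical small-object sizes and is the kind of
// request the Large Object Area is intended to serve.
public class LargeAllocDemo {
    public static void main(String[] args) {
        byte[] big = new byte[8 * 1024 * 1024];
        System.out.println("Allocated " + big.length + " bytes");
    }
}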
• The subpool algorithm uses multiple free lists rather than the single free list used by the default allocation scheme.
• While allocating objects on the heap, free chunks are chosen using a "best fit" method, as opposed to the "first fit" method used by the other algorithms.
• It is enabled by the -Xgcpolicy:subpool option.
• Garbage collection is Stop the World (all other application threads are suspended during GC)
– Heap lock
– Thread queue lock
– JIT lock
• Mark phase
– The process of identifying all objects reachable from the root set.
– All "live" objects are marked by setting a mark bit in the mark bit vector.
• Sweep phase
– Identifies all the objects that have been allocated but are no longer referenced.
• Compaction (optional)
– Once garbage has been removed, we consider compacting the resulting set of objects to remove the spaces between them. (A toy mark-and-sweep sketch follows the diagram below.)
[Diagram: heap showing an unreachable object identified during GC]
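As referenced above, a toy, purely illustrative mark-and-sweep pass over an invented object graph; Node, the heap list, and the choice of root are assumptions for this sketch, not the JVM's actual implementation.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MarkSweepSketch {
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<Node>();
        boolean marked;                       // stands in for the mark bit in the mark bit vector
        Node(String name) { this.name = name; }
    }

    // Mark phase: trace everything reachable from the root set.
    static void mark(Node node) {
        if (node == null || node.marked) return;
        node.marked = true;
        for (Node ref : node.refs) mark(ref);
    }

    // Sweep phase: anything allocated but not marked is garbage.
    static void sweep(List<Node> heap) {
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            if (!it.next().marked) it.remove();
        }
        for (Node n : heap) n.marked = false; // reset mark bits for the next cycle
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.refs.add(b);                        // A -> B stays live; C becomes unreachable
        List<Node> heap = new ArrayList<Node>();
        heap.add(a); heap.add(b); heap.add(c);
        mark(a);                              // A is the only root in this toy example
        sweep(heap);
        for (Node n : heap) System.out.println("survived: " + n.name);
    }
}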
• GC Helper threads
On a multiprocessor system with N CPUs, a JVM supporting parallel mode starts N-1
garbage collection helper threads at the time of initialization.
These threads remain idle at all times when the application code is running; they are
called into play only when garbage collection is active.
For a particular phase, work is divided between the thread driving the garbage
collection and the helper threads, making a total of N threads running in parallel on
an N-CPU machine.
The only way to disable parallel mode is to use the -Xgcthreads option to change the number of garbage collection helper threads that are started.
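A quick way to see the processor count the JVM observes, which drives the default N-1 helper-thread count described above; the -Xgcthreads value in the comment is illustrative only.

// On an N-CPU machine the collector starts N-1 helper threads by default;
// an explicit count can be requested with, for example, -Xgcthreads2.
public class CpuCount {
    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs seen by the JVM: " + n
                + " (default helper threads: " + (n - 1) + ")");
    }
}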
• Parallel Mark
The basic idea is to augment object marking through the addition of helper threads
and a facility for sharing work between them.
Concurrent mark aims to complete the marking just before the heap is full.
In the concurrent phase, the Garbage Collector scans the roots by asking each
thread to scan its own stack. These roots are then used to trace live objects
concurrently.
• Incremental compaction removes the dark matter from the heap and reduces pause times
significantly
• The fundamental idea behind incremental compaction is to split the heap up into sections and
compact each section just as during a full compaction.
• Incremental compaction was introduced in JDK 1.4.0; it is enabled by default and triggered under particular conditions (called 'reasons').
Verbosegc redirection
-Xverbosegclog:<path to file>/<filename> (for example, -Xverbosegclog:/tmp/gc.log)
• The first two lines are put out just before the beginning of the STW phase of GC.
• The rest of the messages are printed after the STW phase ends and the threads are woken up. No messages are printed during GC.
• Heap shrinkage messages are printed before the STW messages, but shrinkage happens AFTER the STW phase!
• What were the total and free heap sizes before GC?
Actions:
• Use verbosegc to estimate the ideal size of the heap, and then tune it using -Xmx and -Xms (a quick sizing check follows below).
• Setting -Xms:
It should be big enough to avoid allocation failures (AFs) from the time the application starts to the time it becomes 'ready'. (It should not be any bigger!)
• Setting -Xmx:
Under normal load, the free heap space after each GC should be greater than minf (default is 30%).
There should not be any OutOfMemory errors.
Under the heaviest load, if the free heap space after each GC is greater than maxf (default is 70%), the heap size is too big.
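A quick way to confirm the heap sizes the JVM actually picked, assuming a launch such as java -Xms256m -Xmx512m HeapSizes; the class name and option values are illustrative.

public class HeapSizes {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap   (reflects -Xmx): " + rt.maxMemory() / mb + " MB");
        System.out.println("total heap (current size) : " + rt.totalMemory() / mb + " MB");
        System.out.println("free heap  (within total) : " + rt.freeMemory() / mb + " MB");
    }
}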
GC is too frequent
<AF[25]: Allocation Failure. need 65552 bytes, 1 ms since last AF>
<AF[25]: managing allocation failure, action=2 (319456/10484224)>
<GC(25): GC cycle started Sat Mar 20 15:32:50 2004
<GC(25): freed 3968 bytes, 3% free (323424/10484224), in 11 ms>
<GC(25): mark: 5 ms, sweep: 0 ms, compact: 6 ms>
<GC(25): refs: soft 0 (age >= 32), weak 0, final 0, phantom 0>
<GC(25): moved 214 objects, 9352 bytes, reason=1>
<AF[25]: managing allocation failure, action=3 (323424/10484224)>
<AF[25]: managing allocation failure, action=4 (323424/10484224)>
<AF[25]: managing allocation failure, action=6 (323424/10484224)>
<AF[25]: warning! free memory getting short(1). (323424/10484224)>
<AF[25]: completed in 13 ms>
GC is too long
<AF[29]: Allocation Failure. need 2321688 bytes, 88925 ms since last AF>
<AF[29]: managing allocation failure, action=1 (3235443800/20968372736)
(3145728/3145728)>
<GC(29): GC cycle started Mon Nov 4 14:46:20 2002
<GC(29): freed 8838057824 bytes, 57% free (12076647352/20971518464), in
4749 ms>
<GC(29): mark: 4240 ms, sweep: 509 ms, compact: 0 ms>
<GC(29): refs: soft 0 (age >= 32), weak 0, final 1, phantom 0>
<AF[29]: completed in 4763 ms>
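A rough sketch of pulling pause times and occupancy out of verbosegc lines like the samples above; the log file name is an assumption, and real logs vary by JVM level, so the graphical tools described later (GCCollector, EVTK) are the more robust option.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseGcSummary {
    // Matches lines of the form:
    //   <GC(25): freed 3968 bytes, 3% free (323424/10484224), in 11 ms>
    private static final Pattern GC_LINE = Pattern.compile(
            "freed (\\d+) bytes, (\\d+)% free \\((\\d+)/(\\d+)\\), in (\\d+) ms");

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("gc.log")); // file name is illustrative
        long totalPause = 0;
        int collections = 0;
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = GC_LINE.matcher(line);
            if (m.find()) {
                collections++;
                totalPause += Long.parseLong(m.group(5));
                System.out.println("GC #" + collections + ": " + m.group(2) + "% free ("
                        + m.group(3) + "/" + m.group(4) + " bytes), pause " + m.group(5) + " ms");
            }
        }
        in.close();
        if (collections > 0) {
            System.out.println("Average pause: " + (totalPause / collections)
                    + " ms over " + collections + " collections");
        }
    }
}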
Too big a heap = too much GC pause time (irrespective of the amount of physical memory on the system).
Heap size > physical memory size = paging/swapping = bad for your application.
It is desirable to have -Xms much less than -Xmx if you are encountering fragmentation issues.
This forces class allocations, thread objects, and persistent objects to be allocated at the bottom of the heap.
It may be good for a few applications that require constantly high heap storage.
• Caused by too many objects on the heap, especially deeply nested objects.
If pause times are usually acceptable with the exception of a few "abnormally high" spikes, we are likely to infer that the deviation was the result of some system-level activity (heavy paging, for example) outside of the Java process.
Consideration: how many clock ticks did our process actually spend executing instructions, as opposed to time spent waiting for I/O or waiting for a CPU to become available for the process to run on?
– No pinned/dosed objects
– Stack maps are used to provide a level of indirection between references and the heap
– The 5.0 VM never pins arrays; it always makes a copy for the JNI code
• optthruput (default)
– Mark-sweep algorithm
– Fastest for many workloads
• optavgpause
– Concurrent collection and concurrent sweep
– Small mean pause
– Throughput impact
• gencon
– The generational hypothesis
– Fastest for transactional workloads
– Combines low pause times and high throughput
• subpool
– Mark-sweep based, but with multiple free lists
– Avoids allocation contention on many-processor machines
– Available on all PPC and S390 platforms
Policy considerations:
– optthruput: "I want my application to run to completion as quickly as possible."
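The policy is chosen at launch time with -Xgcpolicy, as in the sketch below; the application class name is illustrative.

// Example launch lines (pick one policy per run):
//   java -Xgcpolicy:optthruput  MyApp
//   java -Xgcpolicy:optavgpause MyApp
//   java -Xgcpolicy:gencon      MyApp
//   java -Xgcpolicy:subpool     MyApp
public class MyApp {
    public static void main(String[] args) {
        System.out.println("Running with the GC policy chosen on the command line");
    }
}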
[Diagram: gencon JVM heap – the nursery/young generation is divided into an allocate space and a survivor space]
• http://www.alphaworks.ibm.com/tech/gcdiag
Starting the tool, selecting the verbosegc input file, and choosing the appropriate JVM parser:
1. Extract GCCollector.zip
2. Place jfreeChart-0.9.21.jar and jcommon-0.9.6.jar in the lib directory of the GCCollector folder
3. Execute GCCollector.bat (which will spawn a GUI)
4. Select the verbosegc file for analysis
5. Select the appropriate JVM parser
In the Tools section of ISA, select the Java product plug-in to display the available tools. Click on the name of a tool to start that tool.
The tool parses and plots verbose GC output and garbage collection traces (-Xtgc
output)
• The EVTK has built-in support for over forty different types of graphs
These are configured in the VGC Data menu
Options vary depending on the current dataset and on the parsers and post-processors that are enabled
• Some of the VGC graph types are:
• Used total heap
• Pause times (mark-sweep-compact collections)
• Pause times (totals, including exclusive access)
• Compact times
• Weak references cleared
• Soft references cleared
• Free tenured heap (after collection)
• Tenured heap size
• Tenure age
• Free LOA (after collection)
• Free SOA (after collection)
• Total LOA
• Total SOA
• Note: Different graph types and a different menu are available for TGC
output
[Charts: EVTK plots of heap occupancy and pause times]
EVTK - Comparison & Advice
Compare runs…
Performance advisor…
Java 5 Shared Classes
• Available on all platforms.
• Target: Server environments where multiple JVMs exist on the same box.
[Chart: startup time in seconds for Eclipse 3.2.2, Tomcat 5.5, and WAS 6.1 – default versus -Xshareclasses on Java 5.0 and Java 6.0; lower is better]
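A minimal sketch of enabling class sharing at launch; the cache name and class name are illustrative assumptions.

// Example launch line (cache name is illustrative):
//   java -Xshareclasses:name=demoCache HelloShared
// Subsequent JVMs started with the same cache name can reuse the classes
// already loaded into the shared cache, reducing startup time and footprint.
public class HelloShared {
    public static void main(String[] args) {
        System.out.println("Classes for this run may be loaded from the shared cache");
    }
}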
What does real-time mean?
• Utilization
The percentage of time dedicated to the application in a given window of time.
(For example, 70% utilization over a 10 ms window means the application gets 7 ms of that window.)
[Diagram: application time within the overall time window]
• Profiler Analyzer provides a powerful set of graphical and text-based views that
allow users to narrow down performance problems to a particular process,
thread, module, symbol, offset, instruction or source line.
– Supports time-based system profiles
• http://www.alphaworks.ibm.com/tech/vpa
• High level overview of the Java Virtual Machine (JVM) and its key components.
• Familiarity with debugging and profiling tools available for the JVM and their effective usage.
• www.ibm.com/developerworks/java/library/j-ibmjava2
• developers.sun.com/learning/javaoneonline/2007/pdf/TS-2023.pdf
• http://www.ibm.com/developerworks/java/library/j-ibmjava4/
• http://www-128.ibm.com/developerworks/java/library/j-rtj1/index.html
• https://www-950.ibm.com/events/IBMImpact/Impact2007/3977.pdf
Thank You (English)
Gracias (Spanish)
Merci (French)
Obrigado (Brazilian Portuguese)
Danke (German)
Teşekkürler (Turkish)
[The slide also shows thanks in Russian, Arabic, Japanese, Tamil, and Korean]