This document summarizes a presentation on optimizing cache performance in Java applications. It covers key elements of cache performance such as hit ratio, expiration policies, and load spikes; metrics such as insert/read latencies and object sizes; and issues that hurt performance, including garbage collection pauses, per-object overhead, serialization costs, and thread contention. It surveys several cache implementations and recommends measuring performance to identify optimization opportunities.
How to Stop Worrying and Start Caching in Java
2. How to Stop Worrying and Start Caching in Java
SriSatish Ambati, Azul Systems
sris@azulsystems.com
Manik Surtani, RedHat Inc
msurtani@redhat.com
3. The Trail
Examples
Elements of Cache Performance
Theory
Metrics
200GB Cache Design
Whodunit
Overheads in Java – Objects, GC, Locks, Communication, Sockets, Queues (SEDA), Serialization
Measure
4. Wer die Wahl hat, hat die Qual!
He who has the choice has the agony!
5. Some example caches
Homegrown caches – surprisingly, they work well
(do it yourself! It’s a giant hash – see the sketch after this list)
Infinispan, Coherence, Gemstone, GigaSpaces,
EhCache, etc
NoSQL stores (Apache Cassandra)
Non-Java alternatives: Memcached & clones.
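The "giant hash" point can be made concrete with a minimal sketch: a thread-safe map plus hit/miss counters. The class and method names here (SimpleCache, hitRatio) are illustrative, not taken from any of the products listed above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// A minimal "giant hash" cache: a thread-safe map plus hit/miss counters.
public final class SimpleCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    public V get(K key) {
        V value = map.get(key);
        if (value != null) hits.increment(); else misses.increment();
        return value;
    }

    public void put(K key, V value) { map.put(key, value); }

    public double hitRatio() {
        long h = hits.sum(), m = misses.sum();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```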
7. Elements of Cache Performance
Hot or Not: The 80/20 rule.
A small set of objects is very popular!
Hit or Miss: Hit Ratio
How effective is your cache?
Expiration policies: LRU, LFU, FIFO, LIRS, ... (see the sketch below)
Long-lived objects, better locality
Spikes happen
Cascading events: Node load, Node(s) dead.
Cache Thrash: Full Table scan.
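For the expiration policies above, a common single-JVM illustration of LRU eviction uses LinkedHashMap in access order. This is only a sketch of the idea, not how any of the distributed caches implement eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache: LinkedHashMap in access order evicts the eldest entry
// once maxEntries is exceeded. Single-threaded illustration only.
public final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);          // accessOrder = true -> LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;      // evict the least-recently-used entry
    }
}
```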
8. Elements of Cache Performance: Metrics
Inserts: Puts/sec, Latencies
Reads: Gets/sec, Latencies, Indexing
Updates: mods/sec, latencies (Locate, Modify & Notify)
Replication, Consistency, Persistence
Size of Objects
Number of Objects
Size of Cache
# of cache server nodes (read-only, read-write)
# of clients
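As a rough way to collect the puts/sec and gets/sec figures above, a sketch against an in-process ConcurrentHashMap; numbers from a loop like this are only indicative, and real measurements need a proper benchmark harness and should report latency percentiles too.

```java
import java.util.concurrent.ConcurrentHashMap;

// Crude puts/sec and gets/sec measurement against an in-process map.
public final class CacheThroughput {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();
        int n = 1_000_000;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) cache.put(i, "value-" + i);
        double putSecs = (System.nanoTime() - t0) / 1e9;

        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) cache.get(i);
        double getSecs = (System.nanoTime() - t1) / 1e9;

        System.out.printf("puts/sec: %.0f, gets/sec: %.0f%n", n / putSecs, n / getSecs);
    }
}
```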
9. Partitioning & Distributed Caches
Near Cache/L1 Cache
Bring data close to the Logic that is using it.
Birds of a feather flock together – related data lives closer
Read-only nodes, Read-Write nodes
Management nodes
Communication Costs
Balancing (buckets)
Serialization (more later)
10. I/O considerations
Asynchronous
Sockets
Queues & Threads serving the sockets
Bandwidth
Persistence – file, DB (CacheLoaders)
Write Behind
Data Access Patterns of Doom, e.g.:
“Death by a million cuts” – batch your reads (see the sketch below)
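“Death by a million cuts” means one remote round trip per get; the fix is a bulk read. A sketch, assuming a client that exposes a getAll-style call – the interface and method names here are hypothetical, though most of the products above offer some bulk-read API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative client interface; getAll is the hypothetical bulk-read call.
interface CacheClient<K, V> {
    V get(K key);
    Map<K, V> getAll(List<K> keys);
}

final class BatchedReads {
    // Anti-pattern: "death by a million cuts" -- one remote round trip per key.
    static <K, V> Map<K, V> fetchOneByOne(CacheClient<K, V> cache, List<K> keys) {
        Map<K, V> out = new HashMap<>();
        for (K key : keys) out.put(key, cache.get(key));
        return out;
    }

    // Batched alternative: one round trip for the whole key set.
    static <K, V> Map<K, V> fetchBatched(CacheClient<K, V> cache, List<K> keys) {
        return cache.getAll(keys);
    }
}
```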
11. Buckets – Partitions, Hashing function
Birthdays, Hashmaps & Prime Numbers
Collisions, Chaining
Unbalanced HashMap – behaves like a list, O(n) retrieval (demonstrated below)
Partition-aware Hashmaps
Non-blocking HashMaps (see: Locks)
Performance degrades at about 80% table density
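To see why an unbalanced HashMap behaves like a list, a small sketch with a deliberately constant hashCode: every key lands in the same bucket, so inserts and lookups degrade toward O(n). (Java 8+ eventually treeifies large bins, which softens but does not remove the penalty.)

```java
import java.util.HashMap;
import java.util.Map;

// A key with a constant hashCode: every entry collides into the same bucket.
final class BadKey {
    final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

final class CollisionDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        int n = 100_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) map.put(new BadKey(i), i);
        System.out.printf("inserts with constant hashCode: %.1f ms%n",
                (System.nanoTime() - t0) / 1e6);
        // With a well-distributed hashCode the same inserts are dramatically faster.
    }
}
```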
13. How many nodes to get a 200G cache?
Who needs a 200G cache?
Disk is the new Tape!
200 nodes @ 1 GB heap each, or
2 nodes @ 100 GB heap each (plus overhead)
14. SIDE ONE
Join together with the band
I don’t even know myself
SIDE TWO
Let’s see action
Relay
Don’t happen that way at all
The seeker
15. Java Limits: Objects are not cheap!
How many bytes for an 8-char String? (assume 32-bit)
How many objects in an idle Tomcat instance?
String object: JVM overhead 16 bytes + bookkeeping fields 12 bytes + pointer to char[] 4 bytes = 32 bytes
char[]: JVM overhead 16 bytes + data 16 bytes = 32 bytes
A. 64 bytes, 31% overhead
(Size of a String varies with JVM.)
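The exact numbers vary by JVM; one way to check them on your own JVM is java.lang.instrument. A minimal sketch, assuming the class is packaged as an agent with a Premain-Class manifest entry and started with -javaagent. It reports shallow sizes only, so the String and its character array are measured separately, as on the slide.

```java
import java.lang.instrument.Instrumentation;

// Premain agent that prints the shallow size of a String and of an 8-char array.
// Shallow sizes only: the String object and the char[] must be added by hand.
public final class SizeAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        String s = "12345678";              // the 8-char example from the slide
        char[] chars = s.toCharArray();     // stand-in for the internal array
        System.out.println("String shallow size:  " + inst.getObjectSize(s));
        System.out.println("char[8] shallow size: " + inst.getObjectSize(chars));
    }
}
```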
16. Picking the right collection: Mozart or Bach?
100 elements of:
TreeMap<Double, Double> – 82% overhead, 88 bytes constant cost per element
[pro: enables updates while maintaining order]
double[], double[] – 2% overhead, amortized
[con: load-then-use]
Sparse collections, empty collections, wrong collections.
TreeMap fixed overhead: 48 bytes; TreeMap$Entry per-entry overhead: 40 bytes
Double: JVM overhead 16 bytes + data 8 bytes
*From one 32-bit JVM. Varies with JVM architecture.
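A rough per-element accounting using the slide's own 32-bit figures shows where the percentages come from: each TreeMap entry costs 40 bytes plus two 24-byte boxed Doubles (the 88 bytes per element), while two double[] arrays pay only two 16-byte headers on top of the raw data. A sketch holding the same 100 pairs both ways:

```java
import java.util.TreeMap;

// Same 100 (key, value) pairs held two ways. Using the slide's 32-bit figures:
//   TreeMap:     48 + 100 * (40 /*Entry*/ + 2 * 24 /*boxed Double*/) = 8,848 bytes (~82% overhead)
//   double[] x2: 2 * (16 /*array header*/ + 100 * 8)                 = 1,632 bytes (~2% overhead)
// The arrays keep ordering only if loaded in order ("load-then-use").
final class CollectionFootprint {
    public static void main(String[] args) {
        TreeMap<Double, Double> map = new TreeMap<>();
        double[] keys = new double[100];
        double[] values = new double[100];
        for (int i = 0; i < 100; i++) {
            map.put((double) i, i * 2.0);
            keys[i] = i;
            values[i] = i * 2.0;
        }
        System.out.println("entries: " + map.size() + ", array length: " + keys.length);
    }
}
```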
18. Java Limits: Garbage Collection
GC defines cache configuration
Pause times: if stop_the_world_pause > time_to_live ⇒ node is declared dead
Allocation rate: write/insertion speed
Live objects (residency): if residency > 50%, GC overheads dominate
Increasing heap size only increases pause times
64-bit is not going to rescue us either:
increases object header, alignment & pointer overhead;
40-50% increase in heap sizes for the same workloads
Overheads – cycles spent in GC (vs. real work); space
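To quantify “cycles spent in GC vs. real work”, the standard GarbageCollectorMXBean counters are a simple starting point; this is only a sketch, and GC logs and pause-time histograms give far more detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Print cumulative collection counts and time per collector; sampling this
// periodically shows how much wall-clock time the cache node spends in GC.
final class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```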
19. Fragmentation, Generations
Fragmentation – compact often, uniform-sized objects
[Finding seats for a gang of four is easier in an empty theater!]
Face your fears, face them often!
Generational hypothesis:
long-lived objects promote often ⇒ inter-generational pointers, more old-gen collections
Entropy: how many flags does it take to tune your GC?
⇒ Avoid OOM; configure node death on OOM
⇒ Shameless plug: Azul’s Pauseless GC (now a software edition), Cooperative-Memory (swap space for your JVM under spike: no more OOM!)
20. Locks: Why Amdahl’s law trumps Moore’s!
Schemes: optimistic, pessimistic
Consistency: eventual vs. ACID
Contention, waits
java.util.concurrent, critical sections: use lock striping (see the sketch below)
MVCC, lock-free, wait-free data structures (NBHM)
Transactions are expensive ⇒ reduce JTA abuse, set the right isolation levels
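A minimal sketch of lock striping: an array of locks, each guarding its own segment of the data, indexed by the key's hash, so threads touching different stripes do not contend. ConcurrentHashMap and the caches above use far more sophisticated variants; this shows only the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Striped locking: one lock and one map segment per stripe, chosen by the
// key's hash, so writers to different stripes proceed in parallel.
final class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    @SuppressWarnings("unchecked")
    private final Map<K, V>[] buckets = new Map[STRIPES];

    StripedMap() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
            buckets[i] = new HashMap<>();
        }
    }

    private int stripe(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    V put(K key, V value) {
        int s = stripe(key);
        locks[s].lock();
        try { return buckets[s].put(key, value); } finally { locks[s].unlock(); }
    }

    V get(K key) {
        int s = stripe(key);
        locks[s].lock();
        try { return buckets[s].get(key); } finally { locks[s].unlock(); }
    }
}
```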
21. Inter-node communication
• TCP for mgmt & data
– Infinispan
• TCP for mgmt, UDP for data
– Coherence, Infinispan
• UDP for mgmt, TCP for data
– Cassandra, Infinispan
• Instrumentation
– EHCache/Terracotta
• Bandwidth & Latency considerations
⇒ Ensure proper network configuration in the kernel
⇒ Run datagram tests (see the sketch below)
⇒ Limit the number of management nodes & nodes
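“Run datagram tests” can start as simply as timing a UDP send/receive on loopback before blaming the cache; a sketch only – real tests should use the cluster's actual interfaces, ports and payload sizes.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Minimal UDP sanity check: send a datagram to a local socket and time it.
// Replace loopback with the real cluster interfaces and payload sizes.
final class DatagramTest {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket receiver = new DatagramSocket(0);
             DatagramSocket sender = new DatagramSocket()) {
            byte[] payload = new byte[1024];
            InetAddress local = InetAddress.getLoopbackAddress();

            long t0 = System.nanoTime();
            sender.send(new DatagramPacket(payload, payload.length,
                    local, receiver.getLocalPort()));
            DatagramPacket in = new DatagramPacket(new byte[1024], 1024);
            receiver.receive(in);    // blocks until the packet arrives
            System.out.printf("loopback datagram: %.1f µs%n",
                    (System.nanoTime() - t0) / 1e3);
        }
    }
}
```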
22. Sockets, Queues, Threads, Stages
How many sockets?
GemStone (VMware): multi-socket implementation; Infinispan
alt: increase ports, nodes, clients
How many threads per socket? Mux
Asynchronous I/O & events (Apache MINA, JGroups)
The curious case of a single-threaded queue manager
Reduce context switching
SEDA
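A sketch of the SEDA idea: each stage is a queue plus a small thread pool, so socket threads hand work off instead of doing it inline. Real implementations (Apache MINA, JGroups) are far more involved; the stage sizes and names here are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Two SEDA-style stages: an I/O stage hands parsed requests to a processing
// stage through an executor queue, decoupling socket threads from work threads.
final class Stages {
    private final ExecutorService ioStage = Executors.newFixedThreadPool(2);
    private final ExecutorService workStage = Executors.newFixedThreadPool(8);

    void onBytesReceived(byte[] frame) {
        ioStage.submit(() -> {
            String request = new String(frame);       // "parse" on the I/O stage
            workStage.submit(() -> handle(request));  // hand off to the work stage
        });
    }

    private void handle(String request) {
        // cache lookup / business logic runs here, off the socket threads
        System.out.println("handled: " + request);
    }
}
```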
25. Count what is countable, measure what is measurable, and
what is not measurable, make measurable
-Galileo
26. Latency:
Where have all the millis gone?
Measure the 90th percentile; look for consistency (see the percentile sketch below).
⇒ JMX is great! JMX is also very slow.
A reduced number of nodes means fewer MBeans!
Monitor (network, memory, CPU): Ganglia
Know thyself: Application Footprint, Trend data.
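A sketch of “measure the 90th percentile”: record per-operation latencies and report percentiles rather than averages. The cache call inside the loop is a placeholder, and production systems usually use a histogram instead of sorting raw samples.

```java
import java.util.Arrays;

// Record latencies in nanoseconds, then report the 90th percentile
// (nearest-rank method). Sorting raw samples is fine for a test.
final class Percentiles {
    static long percentile(long[] samples, double pct) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        long[] latencies = new long[10_000];
        for (int i = 0; i < latencies.length; i++) {
            long t0 = System.nanoTime();
            // ... cache.get(key) would go here ...
            latencies[i] = System.nanoTime() - t0;
        }
        System.out.println("p90 (ns): " + percentile(latencies, 90.0));
    }
}
```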
27. Q&A
References:
Making Sense of Large Heaps, Nick Mitchell, IBM
Oracle Coherence 3.5, Aleksandar Seovic
Large Pages in Java http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html
Patterns of Doom http://3.latest.googtst23.appspot.com/
Infinispan Demos http://community.jboss.org/wiki/5minutetutorialonInfinispan
RTView, Tom Lubinski, http://www.sl.com/pdfs/SL-BACSIG-100429-final.pdf
Google Protocol Buffers, http://code.google.com/p/protobuf/
Azul’s Pauseless GC http://www.azulsystems.com/technology/zing-virtual-machine
Cliff Click’s Non-Blocking Hash Map http://sourceforge.net/projects/high-scale-lib/
JVM Serialization Benchmarks: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
Description of Graph
Shows the average number of cache misses expected when inserting into a hash table with various collision resolution mechanisms; on modern machines, this is a good estimate of actual clock time required. This seems to confirm the common heuristic that performance begins to degrade at about 80% table density.
It is based on a simulated model of a hash table where the hash function chooses indexes for each insertion uniformly at random. The parameters of the model were:
You may be curious what happens in the case where no cache exists. In other words, how does the number of probes (number of reads, number of comparisons) rise as the table fills? The curve is similar in shape to the one above, but shifted left: it requires an average of 24 probes for an 80% full table, and you have to go down to a 50% full table for only 3 probes to be required on average. This suggests that in the absence of a cache, ideally your hash table should be about twice as large for probing as for chaining.
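The model described above is easy to reproduce for one collision-resolution scheme: insert keys at uniformly random slots with linear probing and track the average number of probes as the table fills. A rough sketch only – the original graph compared several schemes, and this covers linear probing alone.

```java
import java.util.Random;

// Simulate linear probing with a uniformly random hash: report the cumulative
// average probes per insertion as the table density grows.
final class ProbeSimulation {
    public static void main(String[] args) {
        int size = 1 << 16;
        boolean[] occupied = new boolean[size];
        Random rnd = new Random(42);
        long probes = 0;

        for (int inserted = 1; inserted <= (int) (0.9 * size); inserted++) {
            int slot = rnd.nextInt(size);  // uniformly random "hash"
            probes++;
            while (occupied[slot]) {       // linear probing on collision
                slot = (slot + 1) % size;
                probes++;
            }
            occupied[slot] = true;

            if (inserted % (size / 10) == 0) {
                System.out.printf("density %.0f%%: %.2f probes/insert (cumulative)%n",
                        100.0 * inserted / size, (double) probes / inserted);
            }
        }
    }
}
```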