Java Collections
The Force Awakens
Darth @RaoulUK
Darth @RichardWarburto
#javaforceawakens
Evolution can be interesting ...
Java 1.2 Java 10?
Collection API Improvements
Persistent & Immutable Collections
Performance Improvements
Collection bugs
1. Element access (off-by-one errors, ArrayIndexOutOfBoundsException)
2. Concurrent modification
3. Check-then-Act
Scenario 1
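// Uses: import java.util.*; import static java.util.Arrays.asList;
List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));
for (String jedi: jedis) {
    if (Character.isLowerCase(jedi.charAt(0))) {
        // Bug: removing from the list inside a for-each loop makes the
        // iterator throw ConcurrentModificationException on its next step
        jedis.remove(jedi);
    }
}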
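Two common ways to avoid the exception are to let the collection perform the removal itself with Collection.removeIf (Java 8+), or to remove through the iterator rather than the list. A minimal sketch; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class RemoveLowercaseJedis {

    // Java 8+: removeIf iterates and removes safely in a single call
    static List<String> withRemoveIf(List<String> input) {
        List<String> jedis = new ArrayList<>(input);
        jedis.removeIf(jedi -> Character.isLowerCase(jedi.charAt(0)));
        return jedis;
    }

    // Pre-Java 8 alternative: remove via the iterator, not via the list
    static List<String> withIterator(List<String> input) {
        List<String> jedis = new ArrayList<>(input);
        Iterator<String> it = jedis.iterator();
        while (it.hasNext()) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return jedis;
    }
}
```

Both methods work on a copy, so the caller's list is left untouched.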
Scenario 2
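// MOVIE: a String key constant, not shown on the slide
Map<String, BigDecimal> movieViews = new HashMap<>();
BigDecimal views = movieViews.get(MOVIE);   // check
if (views != null) {
    // act: if other threads share this map, the entry may have changed since the check
    movieViews.put(MOVIE, views.add(BigDecimal.ONE));
}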
Check (movieViews.get, views != null), then Act (movieViews.put)
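The check and the act can be made a single atomic step by letting the map do both. A minimal sketch, assuming the map may be shared between threads, so ConcurrentHashMap is swapped in for the slide's HashMap; MovieViewCounter and its methods are hypothetical names:

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MovieViewCounter {
    private final ConcurrentMap<String, BigDecimal> movieViews = new ConcurrentHashMap<>();

    // Start counting views for a movie (no-op if it is already registered)
    void register(String movie) {
        movieViews.putIfAbsent(movie, BigDecimal.ZERO);
    }

    // computeIfPresent performs check-then-act atomically on ConcurrentHashMap:
    // like the slide's code, it only increments movies that are already present
    void increment(String movie) {
        movieViews.computeIfPresent(movie, (key, views) -> views.add(BigDecimal.ONE));
    }

    BigDecimal views(String movie) {
        return movieViews.get(movie);
    }
}
```

If missing movies should start counting from one instead, movieViews.merge(movie, BigDecimal.ONE, BigDecimal::add) does that atomically too.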
Reducing scope for bugs
● ~280 bugs in 28 projects including Cassandra, Lucene
● ~80% of the check-then-act bugs discovered are put-if-absent
● Library designers can help by updating APIs as new idioms emerge
● Different data structures can provide alternatives by restricting reads &
updates to reduce scope for bugs
CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
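Since most of these bugs are put-if-absent, here is a sketch of that idiom done atomically with ConcurrentMap.computeIfAbsent (Java 8+). The registry class and its key/value choice are hypothetical, chosen only to illustrate the pattern:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class JediRegistry {
    private final ConcurrentMap<String, Set<String>> padawansByMaster = new ConcurrentHashMap<>();

    // Racy check-then-act version (do not use): two threads can both see null
    // and each install its own set, so one thread's additions are lost:
    //   Set<String> s = padawansByMaster.get(master);
    //   if (s == null) { s = ConcurrentHashMap.newKeySet(); padawansByMaster.put(master, s); }

    // Atomic put-if-absent: check and insert happen as one step,
    // so every caller gets the same set for a given master
    Set<String> padawansOf(String master) {
        return padawansByMaster.computeIfAbsent(master, m -> ConcurrentHashMap.newKeySet());
    }
}
```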
Java 9 API updates
Collection factory methods
● Non-goal to provide persistent immutable collections
● http://openjdk.java.net/jeps/269
Live Demo using jShell
http://iteratrlearning.com/java9/2016/11/09/java9-collection-factory-methods
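For readers without the live demo, a short sketch of the JEP 269 factory methods (requires Java 9 or later). The returned collections are unmodifiable and reject null elements; FactoryMethods and its helpers are illustrative names:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class FactoryMethods {
    static List<String> jedis() {
        return List.of("Luke", "Yoda", "Obi-Wan");
    }

    static Set<String> sides() {
        return Set.of("Light", "Dark");   // duplicate elements would throw IllegalArgumentException
    }

    static Map<String, Integer> episodes() {
        return Map.of("A New Hope", 4, "The Force Awakens", 7);
    }

    // Any mutator throws UnsupportedOperationException
    static boolean isUnmodifiable(List<String> list) {
        try {
            list.add("Darth Vader");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Null elements are rejected with NullPointerException
    static boolean rejectsNull() {
        try {
            List.of("Luke", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```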
Collection API Improvements
Persistent & Immutable Collections
Performance Improvements
Categorising Collections
- Mutable: Unsynchronized, Concurrent
- Immutable: Non-Persistent, Persistent
- Unmodifiable View
(The diagram also marks which of these categories are available in the core library.)
Mutable
● Popular friends include ArrayList, HashMap, TreeSet
● Memory-efficient modification operations
● State can be accidentally modified
● Can be thread-safe, but requires careful design
Unmodifiable
List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");
List<String> cantChangeMe = Collections.unmodifiableList(jedis);
// java.lang.UnsupportedOperationException
//cantChangeMe.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker]
jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
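Because the unmodifiable wrapper is only a view, a common workaround is to copy first and then wrap. A minimal sketch (ViewVersusCopy is a hypothetical name); on Java 10+ List.copyOf also returns an unmodifiable copy:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ViewVersusCopy {
    // Unmodifiable view: later changes to the backing list remain visible
    static List<String> view(List<String> source) {
        return Collections.unmodifiableList(source);
    }

    // Defensive copy, then wrap: later changes to source are not visible
    static List<String> snapshot(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```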
Immutable & Non-persistent
● No updates
● Flexibility to convert the source into a more efficient representation
● No locking needed in a concurrent context
● Satisfies covariant subtyping requirements
● Can be copied with modifications to create a new version (can be
expensive)
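To make the last point concrete, here is a sketch of "copy with modifications" for a non-persistent immutable list: every new version copies all existing elements, so each update costs O(n) time and space. CopyOnUpdate.append is a hypothetical helper, not a JDK or library API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyOnUpdate {
    // Returns a new unmodifiable version with element appended.
    // The original is untouched, but all n existing elements are copied.
    static <T> List<T> append(List<T> original, T element) {
        List<T> copy = new ArrayList<>(original.size() + 1);
        copy.addAll(original);
        copy.add(element);
        return Collections.unmodifiableList(copy);
    }
}
```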
Immutable vs. Mutable hierarchy (Eclipse Collections, formerly GS Collections)
- ListIterable is the common parent of ImmutableList and MutableList
- MutableList also extends java.util.List and provides + ImmutableList<T> toImmutable()
- ImmutableList provides + MutableList<T> toList()
https://projects.eclipse.org/projects/technology.collections/
Immutable and Persistent
● Changing the source produces a new version of the collection
● The resulting collection shares structure with the source to avoid full copying on updates
LISP anyone?
Persistent List (aka Cons)
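// ConsList<T> is the list interface; its definition is not shown on the slide
public final class Cons<T> implements ConsList<T> {
    private final T head;             // first element
    private final ConsList<T> tail;   // rest of the list, shared and never copied

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Prepending is O(1): the new node simply points at this unchanged list
    @Override
    public ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }
}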
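A self-contained version of the cons list, showing that two "new" lists built from the same base share it instead of copying it. The ConsList interface (with a size() method) and the Nil empty-list class are assumptions, since the slide does not define them:

```java
// Hypothetical ConsList interface: not defined on the slide
interface ConsList<T> {
    ConsList<T> add(T e);   // returns a new list; never mutates this one
    int size();
}

// Empty list
final class Nil<T> implements ConsList<T> {
    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public int size() { return 0; }
}

final class Cons<T> implements ConsList<T> {
    final T head;
    final ConsList<T> tail;

    Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // O(1) prepend: the existing list becomes the shared tail
    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }

    @Override public int size() { return 1 + tail.size(); }
}
```

Adding "A" and "X" to the same base list yields two lists whose tails are the very same object, which is the structural sharing described above.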
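Updating Persistent List
Before: A -> B -> C -> X -> Y -> Z
After: A -> B -> D -> X -> Y -> Z, where A, B and D are new nodes and X -> Y -> Z is shared with the original
Blue nodes indicate new copies; purple nodes indicate the nodes we wish to update.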
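Replacing an element in a singly linked persistent list means copying every node in front of it, while the nodes after it are reused. A minimal sketch with a hypothetical PNode class (null marks the end of the list):

```java
final class PNode<T> {
    final T value;
    final PNode<T> next;   // null marks the end of the list

    PNode(T value, PNode<T> next) {
        this.value = value;
        this.next = next;
    }

    // Returns a new list with the element at index replaced:
    // nodes before index are copied, the node at index is replaced,
    // and everything after it is shared with the original list.
    static <T> PNode<T> set(PNode<T> list, int index, T value) {
        if (list == null) throw new IndexOutOfBoundsException("index: " + index);
        if (index == 0) return new PNode<>(value, list.next);
        return new PNode<>(list.value, set(list.next, index - 1, value));
    }

    // Builds a list from the given values, first value at the head
    @SafeVarargs
    static <T> PNode<T> of(T... values) {
        PNode<T> list = null;
        for (int i = values.length - 1; i >= 0; i--) list = new PNode<>(values[i], list);
        return list;
    }
}
```

With the example above, PNode.set(PNode.of("A", "B", "C", "X", "Y", "Z"), 2, "D") creates three new nodes and reuses the X, Y, Z nodes.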
Concatenating Two Persistent Lists
- Poor locality due to pointer chasing
- Copying of nodes
Before: A -> B -> C and X -> Y -> Z
After: copies of A -> B -> C, followed by the shared X -> Y -> Z
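Concatenation follows the same pattern: every node of the first list is copied so that its last node can point at the second list, which is shared unchanged. A sketch with a hypothetical Link class; the recursion depth grows with the first list's length, so very long lists would need an iterative version:

```java
final class Link<T> {
    final T value;
    final Link<T> next;   // null marks the end of the list

    Link(T value, Link<T> next) {
        this.value = value;
        this.next = next;
    }

    // O(length of first): copies first's nodes, shares second as-is
    static <T> Link<T> concat(Link<T> first, Link<T> second) {
        if (first == null) return second;
        return new Link<>(first.value, concat(first.next, second));
    }

    @SafeVarargs
    static <T> Link<T> of(T... values) {
        Link<T> list = null;
        for (int i = values.length - 1; i >= 0; i--) list = new Link<>(values[i], list);
        return list;
    }
}
```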
Persistent List
● Structural sharing: no need to copy full structure
● Poor locality due to pointer chasing
● Copying becomes more expensive with larger lists
● Poor random access, and therefore poor data decomposition (e.g. splitting the list for parallel processing)
Updating Persistent Binary Tree (diagram: the tree before and after an update)
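The diagrams are not recoverable here, but the usual technique for updating a persistent tree is path copying: only the nodes on the path from the root to the change are copied, and every other subtree is shared. A minimal sketch for an unbalanced binary search tree of ints (PTree is a hypothetical name; real libraries use balanced trees):

```java
final class PTree {
    final int key;
    final PTree left, right;

    PTree(int key, PTree left, PTree right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // Returns a new tree containing key; the original tree is untouched.
    // Only nodes on the search path are copied (O(depth)); all other subtrees are shared.
    static PTree insert(PTree t, int key) {
        if (t == null) return new PTree(key, null, null);
        if (key < t.key) return new PTree(t.key, insert(t.left, key), t.right);
        if (key > t.key) return new PTree(t.key, t.left, insert(t.right, key));
        return t;   // already present: share the whole tree
    }

    static boolean contains(PTree t, int key) {
        while (t != null) {
            if (key == t.key) return true;
            t = key < t.key ? t.left : t.right;
        }
        return false;
    }
}
```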
Persistent Array
How do we get the immutability benefits with the performance of mutable variants?
Trie
1. Branch factor not limited to binary
2. Leaf nodes contain actual values
3. Picking the right branch is done by using parts of the key as a lookup
(Diagram: an example trie from the root down to leaf nodes holding values.)
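For an integer index, "using parts of the key as a lookup" can mean slicing the index into fixed-width bit groups, one per level. A sketch assuming 32-way branching (5 bits per level), as used by bitmapped vector tries; TrieIndex and path are hypothetical names:

```java
final class TrieIndex {
    static final int BITS = 5;                 // 2^5 = 32-way branching
    static final int MASK = (1 << BITS) - 1;   // 0b11111

    // Splits index into per-level branch numbers, root level first
    static int[] path(int index, int levels) {
        int[] branches = new int[levels];
        for (int level = 0; level < levels; level++) {
            int shift = BITS * (levels - 1 - level);
            branches[level] = (index >>> shift) & MASK;
        }
        return branches;
    }
}
```

For example, index 1000 in a two-level trie is 31 * 32 + 8, so the lookup takes branch 31 at the root and then slot 8.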
Persistent Array (Bitmapped Vector Trie)
(Diagram: a Level 1 (root) node and Level 2 nodes, each with 32 slots numbered 0 to 31, leading down to leaf nodes.)
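A deliberately simplified sketch of a bitmapped vector trie: fixed depth two, 32 x 32 = 1024 slots, no growth. Reads index by bit slices; writes use path copying, so one update copies the root array and one leaf while the other 31 leaves stay shared. Production implementations such as Clojure's PersistentVector add variable depth and a tail buffer; TwoLevelVector is a hypothetical name:

```java
// Hypothetical two-level bitmapped vector trie with 1024 fixed slots
final class TwoLevelVector {
    private static final int BITS = 5;
    private static final int WIDTH = 1 << BITS;    // 32-way branching
    private static final int MASK = WIDTH - 1;
    private static final int CAPACITY = WIDTH * WIDTH;

    private final Object[][] root;   // level 1 (root): 32 pointers to leaf arrays

    private TwoLevelVector(Object[][] root) { this.root = root; }

    static TwoLevelVector empty() {
        Object[] emptyLeaf = new Object[WIDTH];   // one leaf shared by every slot
        Object[][] root = new Object[WIDTH][];
        for (int i = 0; i < WIDTH; i++) root[i] = emptyLeaf;
        return new TwoLevelVector(root);
    }

    Object get(int index) {
        checkIndex(index);
        // High 5 bits pick the leaf, low 5 bits pick the slot inside it
        return root[(index >>> BITS) & MASK][index & MASK];
    }

    // Path copying: copy the root and the affected leaf, share the rest
    TwoLevelVector set(int index, Object value) {
        checkIndex(index);
        int leaf = (index >>> BITS) & MASK;
        Object[] newLeaf = root[leaf].clone();
        newLeaf[index & MASK] = value;
        Object[][] newRoot = root.clone();
        newRoot[leaf] = newLeaf;
        return new TwoLevelVector(newRoot);
    }

    boolean sharesLeafWith(TwoLevelVector other, int leaf) {
        return root[leaf] == other.root[leaf];
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= CAPACITY) throw new IndexOutOfBoundsException("index: " + index);
    }
}
```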
Trade-offs
● Large branching factor facilitates iteration but hinders updates
● Small branching factor facilitates updates but hinders traversal
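A rough back-of-the-envelope model of this trade-off, assuming a full trie and path-copying updates: lookups follow one pointer per level, and an update copies one node of "branching" slots per level. BranchingTradeOff is a hypothetical helper:

```java
final class BranchingTradeOff {
    // Smallest depth d with branching^d >= n, i.e. ceil(log_branching(n))
    static int depth(long n, int branching) {
        int d = 0;
        long capacity = 1;
        while (capacity < n) {
            capacity *= branching;
            d++;
        }
        return d;
    }

    // Approximate slots copied by one update: one node per level
    static long slotsCopiedPerUpdate(long n, int branching) {
        return (long) depth(n, branching) * branching;
    }
}
```

For 1,000,000 elements, 32-way branching gives depth 4 (4 pointer hops per lookup) but copies about 4 x 32 = 128 slots per update; binary branching gives depth 20 (20 hops) while copying only about 20 x 2 = 40 slots.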
Java Persistent Collections
- Not available as part of the Java core library
- Existing projects include
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
- Now also in Javaslang: http://javaslang.io
Memory usage survey (10,000,000 elements, heap < 32GB)
int[]                        40MB
Integer[]                   160MB
ArrayList<Integer>          215MB
PersistentVector<Integer>   214MB (Clojure-DS)
Vector<Integer>             206MB (Dexx, port of Scala-DS)
Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/
Takeaways
● Immutable collections reduce the scope for bugs
● Always a compromise between programming safety and performance
● Performance of persistent data structures is improving
Collection API Improvements
Persistent & Immutable Collections
Performance Improvements
O(N)
O(1)
O(HYPERSPACE)
Primitive specialised collections
● Collections often hold boxed representations of primitive values
● Java 8 introduced IntStream, LongStream, DoubleStream and
primitive specialised functional interfaces
● Other libraries, e.g. Agrona, Koloboke and Eclipse Collections, provide primitive specialised collections today.
● Project Valhalla is investigating primitive specialised generics
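What "primitive specialised" means in practice: the collection stores raw ints in an int[] instead of references to boxed Integer objects. A minimal growable sketch; IntList is a hypothetical class, not the API of Agrona, Koloboke or Eclipse Collections:

```java
import java.util.Arrays;

// Hypothetical int-specialised list: elements live directly in an int[],
// so no Integer object is allocated per element (unlike ArrayList<Integer>)
final class IntList {
    private int[] elements = new int[8];
    private int size;

    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);   // grow by doubling
        }
        elements[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index: " + index);
        return elements[index];
    }

    int size() {
        return size;
    }

    // Summing never unboxes, because nothing was boxed in the first place
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) total += elements[i];
        return total;
    }
}
```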
Java 8 Lazy Collection Initialization
Many allocated HashMaps and ArrayLists are never written to, e.g. with the Null Object pattern
Java 8 adds lazy initialization for the default initialization case
Typically a 1-2% reduction in memory consumption
http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
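The idea behind lazy initialization can be sketched as: every new instance points at one shared empty array, and a real backing array is only allocated on the first write. This mirrors the approach of Java 8's ArrayList default constructor, but LazyList below is a hypothetical illustration, not JDK code:

```java
import java.util.Arrays;

// Hypothetical lazily initialised list: unwritten instances allocate no backing array
final class LazyList<T> {
    private static final Object[] EMPTY = {};   // shared by every unwritten instance
    private static final int DEFAULT_CAPACITY = 10;

    private Object[] elements = EMPTY;
    private int size;

    void add(T value) {
        if (elements == EMPTY) {
            elements = new Object[DEFAULT_CAPACITY];   // first write: allocate now
        } else if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index: " + index);
        return (T) elements[index];
    }

    int size() {
        return size;
    }

    boolean hasAllocated() {
        return elements != EMPTY;
    }
}
```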
HashMaps Basics
Two different keys can have the same hash (a collision):
Han Solo: hash = 72309
Chewbacca: hash = 72309
Chaining vs Probing HashMaps
- Chaining: a separate data structure for collision lookups
- Probing: store inline and have a probing sequence
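A minimal sketch of probing (here linear probing): entries are stored inline in the table, and a collision moves on to the next slot. It is fixed-capacity with no resizing or deletion, to keep the probe sequence easy to follow; LinearProbingMap is a hypothetical class. The keys "Aa" and "BB" are used because their String.hashCode() values are both 2112, so they genuinely collide:

```java
// Hypothetical fixed-capacity String -> int map using linear probing
final class LinearProbingMap {
    private final String[] keys;
    private final int[] values;

    LinearProbingMap(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    private int slotFor(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    void put(String key, int value) {
        int slot = slotFor(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == null || keys[slot].equals(key)) {
                keys[slot] = key;          // stored inline in the table itself
                values[slot] = value;
                return;
            }
            slot = (slot + 1) % keys.length;   // collision: try the next slot
        }
        throw new IllegalStateException("table full");
    }

    Integer get(String key) {
        int slot = slotFor(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == null) return null;   // empty slot ends the probe sequence
            if (keys[slot].equals(key)) return values[slot];
            slot = (slot + 1) % keys.length;
        }
        return null;
    }
}
```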
Aliases: Palpatine vs Darth Sidious
- Chaining, aka Closed Addressing, aka Open Hashing
- Probing, aka Open Addressing, aka Closed Hashing
Chaining vs Probing HashMaps: chaining can be linked-list based or tree based
java.util.HashMap
Chaining-based HashMap
Historically kept a linked list of entries per bucket to handle collisions
Problem: with high collision rates, HashMap lookup approaches O(N)
java.util.HashMap in Java 8
Starts by using a linked list to store colliding entries in a bucket
Switches a bucket to a tree when it holds more than 8 entries
Tree-based nodes use about twice the memory
Makes the heavy-collision lookup case O(log(N)) rather than O(N)
The O(log(N)) bound relies on keys being Comparable
https://github.com/RichardWarburton/map-visualiser
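To reproduce the heavy-collision case, many keys with one shared hash code can be generated from "Aa" and "BB" (both hash to 2112): any string made of k such two-character blocks has the same hash as all 2^k others. This sketch only checks correctness and does not observe HashMap's internal buckets. Before Java 8 all these keys would sit in one linked bucket; on Java 8+ that bucket is expected to become a tree because String implements Comparable. CollidingKeys is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CollidingKeys {
    // Every string of `blocks` two-character pieces drawn from "Aa"/"BB"
    // has the same String.hashCode(), giving 2^blocks colliding keys
    static List<String> generate(int blocks) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String prefix : keys) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            keys = next;
        }
        return keys;
    }

    // All keys land in the same bucket; lookups must still return the right value
    static Map<String, Integer> index(List<String> keys) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) map.put(keys.get(i), i);
        return map;
    }
}
```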
So which HashMap is best?
Example "Jar-Jar" benchmark: call get() on a single value in a map of size 1
It has no model of the different factors that affect performance!
Tree Optimization - 60% Collisions
Tree Optimization - 10% Collisions
Probing vs Chaining
Probing maps usually have lower memory consumption
Small maps: probing never has long clusters and can be up to 91% faster
Large maps with high collision rates: probing scales poorly and can be significantly slower
Takeaways
There’s no clear-cut “winner”.
JDK implementations try to minimise the worst case.
Linear probing requires a good hashCode() distribution; hash maps often “precondition” their hashes.
IdentityHashMap has low memory consumption and is fast, so use it when reference-equality (==) key semantics are what you want!
Third-party libraries offer probing hash maps, e.g. Koloboke and Eclipse Collections.
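One example of hash "preconditioning": Java 8's HashMap spreads each hashCode() by XOR-ing its high 16 bits into the low 16 bits before masking it to a power-of-two table index. Without this, keys that differ only in their high bits would all land in the same bucket of a small table. The helper names below are illustrative:

```java
final class HashSpreading {
    // Same spreading step as java.util.HashMap.hash() in Java 8+
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table: keep only the low bits
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }
}
```

With a 16-slot table, the hashes 0x10000 and 0x20000 both map to bucket 0 unspread, but to buckets 1 and 2 after spreading.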
Conclusions
Any Questions?
www.iteratrlearning.com
● Modern Development with Java 8
● Reactive and Asynchronous Java
● Java Software Development Bootcamp
#javaforceawakens
Further reading
Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
https://infoscience.epfl.ch/record/64410/files/techlists.pdf
Smaller Footprint for Java Collections
http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf
Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
http://michael.steindorfer.name/publications/oopsla15.pdf
RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
Further reading
Doug Lea’s Analysis of the HashMap implementation tradeoffs
http://www.mail-archive.com/core-libs-dev@openjdk.java.net/msg02147.html
Java Specialists HashMap article
http://www.javaspecialists.eu/archive/Issue235.html
Sample and Benchmark Code
https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens
