Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
267 views

Parallel Functional Programming in Java 8

This document provides a 3-sentence summary of a presentation on parallel functional programming in Java 8: The presentation introduces functional programming concepts in Java 8 including lambda expressions, method references, and functional interfaces that allow functions as arguments; it describes the use of streams for bulk parallel data processing; and it provides examples of parallel array operations and higher-order functions using immutable data structures.

Uploaded by

Jacob Co
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
267 views

Parallel Functional Programming in Java 8

This document provides a 3-sentence summary of a presentation on parallel functional programming in Java 8: The presentation introduces functional programming concepts in Java 8 including lambda expressions, method references, and functional interfaces that allow functions as arguments; it describes the use of streams for bulk parallel data processing; and it provides examples of parallel array operations and higher-order functions using immutable data structures.

Uploaded by

Jacob Co
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 55

Parallel Functional

Programming in Java 8

Peter Sestoft
IT University of Copenhagen

Chalmers Tekniska Högskola


Monday 2018-04-16

IT University of Copenhagen 1
The speaker
• MSc 1988 computer science and mathematics and
PhD 1991, DIKU, Copenhagen University
• KU, DTU, KVL and ITU; and Glasgow U, AT&T Bell
Labs, Microsoft Research UK, Harvard University
• Programming languages, software development, ...
• Open source software
– Moscow ML implementation, 1994…
– C5 Generic Collection Library, with Niels Kokholm, 2006…
– Funcalc spreadsheet implementation, 2014

1993 2002, 2005, 2016 2004 & 2012 2007 2012, 2017 2014
Plan
• Java 8 functional programming
– Package java.util.function
– Lambda expressions, method reference expressions
– Functional interfaces, targeted function type
• Java 8 streams for bulk data
– Package java.util.stream
• High-level parallel programming
– Streams: primes, queens, van der Corput, …
– Array parallel prefix operations
• Class java.util.Arrays static methods
• A multicore performance mystery

IT University of Copenhagen 3
Materials
• Java Precisely 3rd edition, MIT Press 2016
– 11.13: Lambda expressions
– 11.14: Method reference expressions
– 23: Functional interfaces
– 24: Streams for bulk data
– 25: Class Optional<T>

• Book examples are called Example154.java etc


– Get them from the book homepage
http://www.itu.dk/people/sestoft/javaprecisely/

IT University of Copenhagen 4
New in Java 8
• Lambda expressions
(String s) -> s.length
• Method reference expressions
String::length
• Functional interfaces
Function<String,Integer>
• Streams for bulk data
Stream<Integer> is = ss.map(String::length)
• Parallel streams
is = ss.parallel().map(String::length)
• Parallel array operations
Arrays.parallelSetAll(arr, i -> sin(i/PI/100.0))
Arrays.parallelPrefix(arr, (x, y) -> x+y)

IT University of Copenhagen 5
Functional programming in Java
• Immutable data instead of objects with state
• Recursion instead of loops
• Higher-order functions that either
– take functions as argument
– return functions as result
Immutable

Example154.java
class FunList<T> { list of T
final Node<T> first;
protected static class Node<U> {
public final U item;
public final Node<U> next;
public Node(U item, Node<U> next) { ... }
}
...
}

IT University of Copenhagen 6
Immutable data
• FunList<T>, linked lists of nodes
class FunList<T> {
final Node<T> first;
protected static class Node<U> {

Example154.java
public final U item;
public final Node<U> next;
public Node(U item, Node<U> next) { ... }
}

List of Integer
list1 9 13 0

Head Tail

7
Existing data do not change
FunList<Integer> empty = new FunList<>(null),
list1 = cons(9, cons(13, cons(0, empty))),

Example154.java
list2 = cons(7, list1),
list3 = cons(8, list1),
list4 = list1.insert(1, 12),
list5 = list2.removeAt(3);

list1 9 13 0
list2 7
list3
8
list4 9 12
list5 7 9 13
8
Recursion in insert
public FunList<T> insert(int i, T item) {

Example154.java
return new FunList<T>(insert(i, item, this.first));
}

static <T> Node<T> insert(int i, T item, Node<T> xs) {


return i == 0 ? new Node<T>(item, xs)
: new Node<T>(xs.item, insert(i-1, item, xs.next));
}

• “If i is zero, put item in a new node, and let


its tail be the old list xs”
• “Otherwise, put the first element of xs in a
new node, and let its tail be the result of
inserting item in position i-1 of the tail of xs”

IT University of Copenhagen 9
Immutable data: Bad and good
• Immutability leads to more allocation
– Takes time and space
– But modern garbage collectors are fast
• Immutable data can be safely shared
– May actually reduce amount of allocation
• Immutable data are automatically threadsafe
– No (other) thread can mess with it
– And also due to visibility effects of final modifier

Subtle point

IT University of Copenhagen 10
Lambda expressions 1

Example64.java
• One argument lambda expressions:
Function<String,Integer>
fsi1 = s -> Integer.parseInt(s);

... fsi1.apply("004711") ... Function that takes a string s


and parses it as an integer
Calling the function
Same, written
Function<String,Integer> in other ways
fsi2 = s -> { return Integer.parseInt(s); },
fsi3 = (String s) -> Integer.parseInt(s);

• Two-argument lambda expressions:


BiFunction<String,Integer,String>
fsis1 = (s, i) -> s.substring(i, Math.min(i+3, s.length()));

IT University of Copenhagen 11
Lambda expressions 2
• Zero-argument lambda expression:

Example64.java
Supplier<String>
now = () -> new java.util.Date().toString();

• One-argument result-less lambda (“void”):


Consumer<String>
show1 = s -> System.out.println(">>>" + s + "<<<”);

Consumer<String>
show2 = s -> { System.out.println(">>>" + s + "<<<"); };

IT University of Copenhagen 12
Method reference expressions
BiFunction<String,Integer,Character> charat

Example67.java
= String::charAt;
Same as (s,i) -> s.charAt(i)
System.out.println(charat.apply("ABCDEF", 1));

Function<String,Integer> parseint = Integer::parseInt;


Same as fsi1, fs2 and fs3

Function<Integer,Character> hex1
= "0123456789ABCDEF"::charAt;

Conversion to hex digit

Class and array constructors


Function<Integer,C> makeC = C::new;
Function<Integer,Double[]> make1DArray = Double[]::new;
13
Targeted function type (TFT)
• A lambda expression or method reference
expression does not have a type in itself
• Therefore must have a targeted function type
• Lambda or method reference must appear as
– Assignment right hand side:
• Function<String,Integer> f = Integer::parseInt;
– Argument to call: TFT
• stringList.map(Integer::parseInt)
– In a cast: map’s argument type is TFT
• (Function<String,Integer>)Integer::parseInt
– Argument to return statement: TFT
• return Integer::parseInt;
Enclosing method’s
return type is TFT 14
Functions as arguments: map
public <U> FunList<U> map(Function<T,U> f) {

Example154.java
return new FunList<U>(map(f, first));
}
static <T,U> Node<U> map(Function<T,U> f, Node<T> xs) {
return xs == null ? null
: new Node<U>(f.apply(xs.item), map(f, xs.next));
}

• Function map encodes general behavior


– Transform each list element to make a new list
– Argument f expresses the specific transformation
• Same effect as OO “template method pattern”

IT University of Copenhagen 15
Calling map
7 9 13

FunList<Double> list8 = list5.map(i -> 2.5 * i);


17.5 22.5 32.5

FunList<Boolean> list9 = list5.map(i -> i < 10);


true true false

IT University of Copenhagen 16
Functions as arguments: reduce
static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
return xs == null ? x0
: reduce(op.apply(x0, xs.item), op, xs.next);
}

• list.reduce(x0, op)
= x0vx1v...vxn
if we write op.apply(x,y) as xvy

Example154.java
• Example: list.reduce(0, (x,y) -> x+y)
= 0+x1+...+xn

IT University of Copenhagen 17
Calling reduce
17.5 22.5 32.5

Example154.java
double sum = list8.reduce(0.0, (res, item) -> res + item);

72.5

double product = list8.reduce(1.0, (res, item) -> res * item);

12796.875

boolean allBig
= list8.reduce(true, (res, item) -> res && item > 10);

true

IT University of Copenhagen 18
Tail recursion and loops
static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
return xs == null ? x0
: reduce(op.apply(x0, xs.item), op, xs.next);
}

Tail call

• A call that is the func’s last action is a tail call


• A tail-recursive func can be replaced by a loop
static <T,U> U reduce(U x0, BiFunction<U,T,U> op, Node<T> xs) {
while (xs != null) {
x0 = op.apply(x0, xs.item);

Example154.java
xs = xs.next;
} Loop version
return x0; of reduce
}

– The Java compiler does not do that automatically


19
Java 8 functional interfaces
• A functional interface has exactly one abstract
method
Type of functions
from T to R
interface Function<T,R> {
R apply(T x);
} C#: Func<T,R>
F#: T -> R

Type of functions
interface Consumer<T> { from T to void
void accept(T x);
} C#: Action<T>

F#: T -> unit

IT University of Copenhagen 20
(Too) many functional interfaces
interface IntFunction<R> {
R apply(int x);
}

Use instead of
Function<Integer,R>
to avoid (un)boxing

Primitive-type

Java Precisely page 125


specialized
interfaces

21
Primitive-type specialized interfaces
for int, double, and long
interface Function<T,R> {
R apply(T x);
}
Why
interface IntFunction<R> { both?
R apply(int x);
} What difference?
Function<Integer,String> f1 = i -> "#" + i;
IntFunction<String> f2 = i -> "#" + i;

• Calling f1.apply(i) will box i as Integer


– Allocating object in heap, takes time and memory
• Calling f2.apply(i) avoids boxing, is faster
• Purely a matter of performance
IT University of Copenhagen 22
Functions that return functions
• Conversion of n to English numeral, cases
n < 20 : one, two, ..., nineteen
Same pattern
n < 100: twenty-three, ...
n>=100: two hundred forty-three, ...
n>=1000: three thousand two hundred forty-three...
n >= 1 million: ... million …
n >= 1 billion: ... billion …
private static String less100(long n) {

Example158.java
return n<20 ? ones[(int)n] Convert n < 100
: tens[(int)n/10-2] + after("-", ones[(int)n%10]);
}
static LongFunction<String> less(long limit, String unit,
LongFunction<String> conv) {
return n -> n<limit ? conv.apply(n)
: conv.apply(n/limit) + " " + unit
+ after(" ", conv.apply(n%limit));
} 23
Functions that return functions
• Using the general higher-order function
static final LongFunction<String>

Example158.java
less1K = less( 100, "hundred", Example158::less100),
less1M = less( 1_000, "thousand", less1K),
less1B = less( 1_000_000, "million", less1M),
less1G = less(1_000_000_000, "billion", less1B);

• Converting to English numerals:


public static String toEnglish(long n) {
return n==0 ? "zero" : n<0 ? "minus " + less1G.apply(-n)
: less1G.apply(n);
}

toEnglish(2147483647)

two billion one hundred forty-seven million


four hundred eighty-three thousand six hundred forty-seven

IT University of Copenhagen 24
Streams for bulk data
• Stream<T> is a finite or infinite sequence of T
– Possibly lazily generated
– Possibly parallel
• Stream methods
– map, flatMap, reduce, filter, ...
– These take functions as arguments
– Can be combined into pipelines
– Java optimizes (and parallelizes) the pipelines well
• Similar to
– Java Iterators, but very different implementation
– The extension methods underlying .NET Linq

IT University of Copenhagen 25
Some stream operations
• Stream<Integer> s = Stream.of(2, 3, 5)
• s.filter(p) = the x where p.test(x) holds
s.filter(x -> x%2==0) gives 2
• s.map(f) = results of f.apply(x) for x in s
s.map(x -> 3*x) gives 6, 9, 15
• s.flatMap(f) = a flattening of the streams
created by f.apply(x) for x in s
s.flatMap(x -> Stream.of(x,x+1)) gives 2,3,3,4,5,6
• s.findAny() = some element of s, if any, or else
the absent Option<T> value
s.findAny() gives 2 or 3 or 5
• s.reduce(x0, op) = x0vs0v...vsn if we write
op.apply(x,y) as xvy
s.reduce(1, (x,y)->x*y) gives 1*2*3*5 = 30
26
Similar functions are everywhere
• Java stream map is called
– map in Haskell, Scala, F#, Clojure
– Select in C#
• Java stream flatMap is called
– concatMap in Haskell
– flatMap in Scala
– collect in F#
– SelectMany in C#
– mapcat in Clojure
• Java reduce is a special (assoc. op.) case of
– foldl in Haskell
– foldLeft in Scala
– fold in F#
– Aggregate in C#
– reduce in Clojure

IT University of Copenhagen 27
Counting primes on Java 8 streams
• Our old standard Java for loop:
int count = 0;
for (int i=0; i<range; i++) Classical efficient
imperative loop
if (isPrime(i))
count++;
• Sequential Java 8 stream:
IntStream.range(0, range)
Pure functional
.filter(i -> isPrime(i))
programming ...
.count()

• Parallel Java 8 stream:


IntStream.range(0, range) ... and thus
.parallel() parallelizable and
.filter(i -> isPrime(i)) thread-safe
.count()
28
Performance results (!!)
• Counting the primes in 0 ...99,999
Method Intel i7 (ms) AMD Opteron (ms)
Sequential for-loop 9.9 40.5
Sequential stream 9.9 40.8
Parallel stream 2.8 1.7
Best thread-parallel 3.0 4.9
Best task-parallel 2.6 1.9

• Functional streams give the simplest solution


• Nearly as fast as tasks and threads, or faster:
– Intel i7 (4 cores) speed-up: 3.6 x
– AMD Opteron (32 cores) speed-up: 24.2 x
– ARM Cortex-A7 (RP 2B) (4 cores) speed-up: 3.5 x
• The future is parallel – and functional J
IT University of Copenhagen 29
Side-effect freedom
• From the java.util.stream package docs:

This means
”catastrophic”

• Java compiler (type system) cannot enforce


side-effect freedom
• Java runtime cannot detect it

IT University of Copenhagen 30
Creating streams 1
• Explicitly or from array, collection or map:
IntStream is = IntStream.of(2, 3, 5, 7, 11, 13);

Example164.java
String[] a = { "Hoover", "Roosevelt", ...};
Stream<String> presidents = Arrays.stream(a);

Collection<String> coll = ...;


Stream<String> countries = coll.stream();

Map<String,Integer> phoneNumbers = ...;


Stream<Map.Entry<String,Integer>> phones
= phoneNumbers.entrySet().stream();

• Finite, ordered, sequential, lazily generated

31
Creating streams 2
• Useful special-case streams:
• IntStream.range(0, 10_000)
• random.ints(5_000)

Example164.java
• bufferedReader.lines()
• bitset.stream()
• Functional iterators for infinite streams
• Imperative generators for infinite streams
• StreamBuilder<T>: eager, only finite streams

IT University of Copenhagen 32
Creating streams 3: generators
• Generating 0, 1, 2, 3, ... Functional

Example165.java
IntStream nats1 = IntStream.iterate(0, x -> x+1);

Most efficient (!!), Object


and parallelizable imperative
IntStream nats2 = IntStream.generate(new IntSupplier() {
private int next = 0;
public int getAsInt() { return next++; }
});

Imperative, using final


array for mutable state
final int[] next = { 0 };
IntStream nats3 = IntStream.generate(() -> next[0]++);

IT University of Copenhagen 33
Creating streams 4: StreamBuilder
• Convert own linked IntList to an IntStream
class IntList {
public final int item;
public final IntList next;
...
public static IntStream stream(IntList xs) {
IntStream.Builder sb = IntStream.builder();
while (xs != null) {

Example182.java
sb.accept(xs.item);
xs = xs.next;
}
return sb.build();
}
}

• Eager: no stream element output until end


• Finite: does not work on cyclic or infinite lists
IT University of Copenhagen 34
Streams for backtracking
• Generate all n-permutations of 0, 1, ..., n-1
– Eg [2,1,0], [1,2,0], [2,0,1], [0,2,1], [0,1,2], [1,0,2]

Set of numbers An incomplete


not yet used permutation

public static Stream<IntList> perms(BitSet todo, IntList tail) {


if (todo.isEmpty())
return Stream.of(tail);

Example175.java
else
return todo.stream().boxed()
.flatMap(r -> perms(minus(todo, r), new IntList(r, tail)));
}

public static Stream<IntList> perms(int n) {


BitSet todo = new BitSet(n); todo.flip(0, n);
return perms(todo, null);
}
{ 0, ..., n-1 } Empty
permutation [ ] 35
A closer look at generation for n=3
({0,1,2}, [])
({1,2}, [0])
({2}, [1,0])
({}, [2,1,0]) Output to stream
({1}, [2,0])
({}, [1,2,0])
Output to stream
({0,2}, [1])
({2}, [0,1])
({}, [2,0,1]) Output to stream
({0}, [2,1])
({}, [0,2,1]) Output to stream
({0,1}, [2])
...
36
A permutation is a rook (tårn)
placement on a chessboard

n n n
n n n
n n n
[2, 1, 0] [1, 2, 0] [2, 0, 1]

n n n
n n n
n n n
[0, 2, 1] [0, 1, 2] [1, 0, 2]

IT University of Copenhagen 37
Solutions to the n-queens problem
• For queens, just take diagonals into account:
– consider only r that are safe for the partial solution
public static Stream<IntList> queens(BitSet todo, IntList tail) {
if (todo.isEmpty())

Example176.java
return Stream.of(tail); Diagonal
else check
return todo.stream()
.filter(r -> safe(r, tail)).boxed()
.flatMap(r -> queens(minus(todo, r), new IntList(r, tail)));
}
public static boolean safe(int mid, IntList tail) {
return safe(mid+1, mid-1, tail);
}
.parallel()
public static boolean safe(int d1, int d2, IntList tail) {
return tail==null || d1!=tail.item && d2!=tail.item && safe(d1+1, d2-1, tail.next);
}

• Simple, and parallelizable for free, 3.5 x faster


• Solve or generate sudokus: much the same
38
Versatility of streams
• Many uses of a stream of solutions
– Print the number of solutions
System.out.println(queens(8).count());

– Print all solutions


queens(8).forEach(System.out::println);
– Print an arbitrary solution (if there is one)

Example174.java
System.out.println(queens(8).findAny());
– Print the 20 first solutions
queens(8).limit(20).forEach(System.out::println);

• Much harder in an imperative version


• Separation of concerns (Dijkstra): production
of solutions versus consumption of solutions
IT University of Copenhagen 39
Streams for quasi-infinite sequences
• van der Corput numbers
– 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16, ...
– Dense and uniform in interval [0, 1]
– For simulation and finance, Black-Scholes options
• Trick: v d Corput numbers as base-2 fractions
0.1, 0.01, 0.11, 0.001, 0.101, 0.011, 0.111 ...
are bit-reversals of 1, 2, 3, 4, 5, 6, 7, ... in binary
public static DoubleStream vanDerCorput() {

Example183.java
return IntStream.range(1, 31).asDoubleStream()
.flatMap(b -> bitReversedRange((int)b));
}

private static DoubleStream bitReversedRange(int b) {


final long bp = Math.round(Math.pow(2, b));
return LongStream.range(bp/2, bp)
.mapToDouble(i -> (double)(bitReverse((int)i) >>> (32-b)) / bp);
} IT University of Copenhagen 40
Collectors: aggregation of streams
• To format an IntList as string “[2, 3, 5, 7]”
– Convert the list to an IntStream
– Convert each element to get Stream<String>
– Use a predefined Collector to build final result

Example182.java
public String toString() {
return stream(this).mapToObj(String::valueOf)
.collect(Collectors.joining(",", "[", "]"));
}

public static String toString(IntList xs) {


StringBuilder sb = new StringBuilder();
sb.append("[");
The alternative ”direct”
boolean first = true; solution requires care
while (xs != null) {
if (!first) and cleverness
sb.append(", ");
first = false;
sb.append(xs.item);
xs = xs.next;
}
return sb.append("]").toString();
}
41
Java 8 stream properties
• Some stream dimensions
– Finite vs infinite
– Lazily generated (by iterate, generate, ...)
vs eagerly generated (stream builders)
– Ordered (map, filter, limit ... preserve element
order) vs unordered
– Sequential (all elements processed on one thread)
vs parallel
• Java streams
– can be lazily generated, like Haskell lists
– but are use-once, unlike Haskell lists
• reduces risk of space leaks
• limits expressiveness, harder to compute average …
IT University of Copenhagen 42
How are Java streams implemented?
• Spliterators
interface Spliterator<T> {
long estimateSize();
void forEachRemaining(Consumer<T> action);
boolean tryAdvance(Consumer<T> action);
void Spliterator<T> trySplit();
}
– Many method calls (well inlined/fused by the JIT)
• Parallelization
– Divide stream into chunks using trySplit
– Process each chunk in a task (Haskell “spark”)
– Run on thread pool using work-stealing queues
– ... thus similar to Haskell parBuffer/parListChunk

IT University of Copenhagen 43
Parallel (functional) array operations
• Simulating random motion on a line
– Take n random steps of length at most [-1, +1]:

Example25.java
double[] a = new Random().doubles(n, -1.0, +1.0)
.toArray();

– Compute the positions at end of each step:


a[0], a[0]+a[1], a[0]+a[1]+a[2], ...
Arrays.parallelPrefix(a, (x,y) -> x+y); NB: Updates
array a

– Find the maximal absolute distance from start:


double maxDist = Arrays.stream(a).map(Math::abs)
.max().getAsDouble();

• A lot done, fast, without loops or assignments


– Just arrays and streams and functions
IT University of Copenhagen 44
Array and streams and parallel ...
• Associative array aggregation
Arrays.parallelPrefix(a, (x,y) -> x+y);

• Such operations can be parallelized well


– So-called prefix scans (Blelloch 1990)

• Streams and arrays complement each other


• Streams: lazy, possibly infinite,
non-materialized, use-once, parallel pipelines
• Array: eager, always finite, materialized,
use-many-times, parallel prefix scans

IT University of Copenhagen 45
Some problems with Java streams
• Streams are use-once & have other restrictions
– Probably to permit easy parallelization
• Hard to create lazy finite streams
– Probably to allow high-performance implementation
• Difficult to control resource consumption
• A single side-effect may mess all up completely
• Sometimes .parallel() hurts performance a lot
– See exercise
– And strange behavior, in parallel + limit in Sudoku generator
• Laziness in Java is subtle, easily goes wrong:
static Stream<String> getPageAsStream(String url) throws IOException {

Example216.java
try (BufferedReader in
= new BufferedReader(new InputStreamReader(
new URL(url).openStream()))) {
return in.lines();
} Closes the reader too early, so any
use of the Stream<String> causes
}
IOException: Stream closed Useless
46
2P
A multicore performance mystery
• K-means clustering 2P: Assign – Update –
Assign – Update … till convergence
while (!converged) { Pseudocode

TestKMeansSolution.java
let taskCount parallel tasks do { Assign
final int from = ..., to = ...;
for (int pi=from; pi<to; pi++)
myCluster[pi] = closest(points[pi], clusters);
}
let taskCount parallel tasks do { Update
final int from = ..., to = ...;
for (int pi=from; pi<to; pi++)
myCluster[pi].addToMean(points[pi]);
}
...
} Imperative
• Assign: writes a point to myCluster[pi]
• Update: calls addToMean on myCluster[pi]
47
2Q
A multicore performance mystery
• ”Improved” version 2Q:
– call addToMean directly on point
– instead of first writing it to myCluster array
while (!converged) {
let taskCount parallel tasks do {
final int from = ..., to = ...;
for (int pi=from; pi<to; pi++)
closest(points[pi], clusters).addToMean(points[pi]);
}
...
}

IT University of Copenhagen 48
Performance of k-means clustering
• Sequential: as you would expect, 5% speedup
• Parallel: surprisingly bad!
2P 2Q 2Q/2P
Sequential 4.240 4.019 0.95 Bad
4-core parallel 1.310 2.234 1.70
24-core parallel 0.852 6.587 7.70 Very
bad
Time in seconds for 200,000 points, 81 clusters, 1/8/48 tasks, 108 iterations

• Q: WHY is the “improved” code slower?


• A: Cache invalidation and false sharing

IT University of Copenhagen 49
The Point and Cluster classes
class Point {
public final double x, y;
}

static class Cluster extends ClusterBase {


private volatile Point mean;
private double sumx, sumy;
private int count;
public synchronized void addToMean(Point p) {
sumx += p.x;
sumy += p.y;
count++;
}
...
}

Cluster object
mean sumx sumy count layout (maybe)

50
KMeans 2P
• Assignment step
– Reads each Cluster’s mean field 200,000 times
– Writes only myCluster array segments, separately
– Takes no locks at all
• Update step
– Calls addToMean 200,000 times
– Writes the 81 clusters’ sumx, sumy, count fields
200,000 times in total
– Takes Cluster object locks 200,000 times

IT University of Copenhagen 51
KMeans 2Q
• Unified loop
– Reads each Cluster’s mean field 200,000 times
– Calls addToMean 200,000 times and writes the
sumx, sumy, count fields 200,000 times in total
– Takes Cluster object locks 200,000 times
• Problem in 2Q:
– mean reads are mixed with sumx, sumy, ... writes
– The writes invalidate the cached mean field
– The 200,000 mean field reads become slower
– False sharing: mean and sumx on same cache line
– (A problem on Intel i7, not on 20 x slower ARM A7)

• See http://www.itu.dk/people/sestoft/papers/cpucache-20170319.pdf

IT University of Copenhagen 52
3P
Parallel streams to the rescue, 3P
while (!converged) {
• fff
final Cluster[] clustersLocal = clusters;
Map<Cluster, List<Point>> groups = Assign
Arrays.stream(points).parallel()
.collect(Collectors.groupingBy(p -> closest(p,clustersLocal)));
clusters = groups.entrySet().stream().parallel()
.map(kv -> new Cluster(kv.getKey().getMean(), kv.getValue()))
.toArray(Cluster[]::new);
Cluster[] newClusters = Update
Arrays.stream(clusters).parallel()
.map(Cluster::computeMean).toArray(Cluster[]::new);
converged = Arrays.equals(clusters, newClusters);
clusters = newClusters;
} Functional

2P 2Q 3P
Sequential 4.240 4.019 5.353
4-core parallel i7 1.310 2.234 1.350
24-core parallel Xeon 0.852 6.587 0.553
Time in seconds for 200,000 points, 81 clusters, 1/8/48 tasks, 108 iterations

53
Exercise: Streams & floating-point
sum
• Compute series sum:
for N=999,999,999
• For-loop, forwards summation
double sum = 0.0;

TestStreamSums.java
for (int i=1; i<N; i++)
sum += 1.0/i;
Different
• For-loop, backwards summation results!
results?
double sum = 0.0;
for (int i=1; i<N; i++)
sum += 1.0/(N-i);

• Could make a DoubleStream, and use .sum()


• Or parallel DoubleStream and .sum() Different
results?
54
This week
• Reading
– Java Precisely 3rd ed. 11.13, 11.14, 23, 24, 25
– Optional:
• http://www.itu.dk/people/sestoft/papers/benchmarking.pdf
• http://www.itu.dk/people/sestoft/papers/cpucache-20170319.pdf

• Exercises
– Extend immutable list class with functional
programming; use parallel array operations; use
streams of words and streams of numbers
– Alternatively: Make a faster and more scalable k-
means clustering implementation, if possible, in
any language

IT University of Copenhagen 57

You might also like