Part 4 - Easy Data Parallelism

This document discusses data parallelism in Java streams. It begins by explaining why parallelism is important due to multicore CPUs. It then defines data parallelism as distributing data over different processes to be processed simultaneously. The document provides examples of using parallel streams in Java to parallelize processing of data. It notes some pitfalls to avoid, such as interfering with data sources, misusing reduce, holding locks, or using mutable shared state. Overall it presents best practices for effectively parallelizing stream processing of data in Java.


Easy Data Parallelism

Richard Warburton
Raoul-Gabriel Urma
Overview
● Why is parallelism important?

● What is data parallelism?

● Parallelising your Streams

● Performance and Internals


Why is Parallelism important?
source: http://www.gotw.ca/images/CPU.png
Multicore
What is Data Parallelism?
Concurrency is not Parallelism!
● Concurrency
○ At least two threads are making progress
○ May not run at the same time
○ Eg: chrome and eclipse both running

● Parallelism
○ At least two threads are executing simultaneously
○ A specific case of concurrency
○ Eg: servlet container dealing with two users at
once on a multicore machine
Parallelism
● Task
○ Distribute execution over different processes
○ Threads and Executors in Java
○ Eg: each thread services a user in JEE App

● Data
○ Distribute data over different processes
○ Support built on top of Streams
○ Eg: process a payroll and give each core 100
employee’s salary
What are good data parallel problems?
● Big Batch Jobs

○ Transaction Processing

○ Analytics/Reporting

● Web crawlers / parsers

● Maths

○ Monte Carlo Simulations

○ Linear Algebra
What’s a good data parallel problem from your workplace?
Parallelising your Streams
Data Parallelism
● Useful
○ a lot of data
○ want to process in a similar way

● API aims to be explicit, but unobtrusive


○ .parallelStream()
○ .parallel()

● Can flip between sequential and parallel

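The flip is literal: `parallel()` and `sequential()` set a flag on the whole pipeline, and the last call before the terminal operation wins. A minimal sketch (the class name and data are ours):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlipDemo {
    // The last parallel()/sequential() call before the terminal
    // operation wins: it sets the mode for the WHOLE pipeline.
    public static List<Integer> doubled(List<Integer> values) {
        return values.stream()
                .parallel()    // request parallel execution...
                .map(i -> i * 2)
                .sequential()  // ...but this call wins: runs sequentially
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled(Arrays.asList(1, 2, 3))); // [2, 4, 6]
    }
}
```

Either way the result is the same; only the execution mode changes.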

Data Parallelism

// Replace stream() with parallelStream()


Set<String> origins = musicians
.parallelStream()
.filter(artist -> artist.getName().startsWith("The"))
.map(artist -> artist.getNationality())
.collect(toSet());
Not all serial code works in parallel.
DON’T interfere with data sources

// add the double of each value into the same list.

List<Integer> numbers = getNumbers();

numbers.parallelStream()
    .forEach(i -> numbers.add(i * 2)); // mutates the source while streaming it
Interfering with data sources: fixed

// keep each value, and add its double, into a new list.

List<Integer> numbers = getNumbers();

numbers = numbers.parallelStream()
    .flatMap(i -> Stream.of(i, i * 2))
    .collect(toList());
DON’T misuse reduce

int totalCost(List<Purchase> items) {
    // BUG: DELIVERY_FEE is not an identity value; in parallel it
    // is folded into every chunk, so it gets added more than once
    return items.parallelStream()
        .reduce(DELIVERY_FEE,
                (tally, item) -> tally + item.cost());
}
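Why this is a bug: in a parallel reduce the seed value is handed to every chunk of work, so anything other than a true identity is counted once per chunk, not once overall. A sketch with a made-up DELIVERY_FEE of 10 (all names here are ours, not the deck's Purchase example):

```java
import java.util.stream.IntStream;

public class ReduceSeedDemo {
    static final int DELIVERY_FEE = 10;

    // WRONG: the seed is combined into every parallel chunk, so the
    // fee can be added many times; the result depends on the split.
    public static int buggyTotal(int... costs) {
        return IntStream.of(costs).parallel()
                .reduce(DELIVERY_FEE, (tally, cost) -> tally + cost);
    }

    // RIGHT: reduce with the true identity (0), add the fee once.
    public static int correctTotal(int... costs) {
        return DELIVERY_FEE
                + IntStream.of(costs).parallel().reduce(0, Integer::sum);
    }
}
```

For costs 1..4 the correct total is 10 + 10 = 20; the buggy version is at least 20 and grows with every extra chunk the framework creates.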
Associativity

“you can regroup the operations and things still work”

(4 + 2) + 1 = 4 + (2 + 1) = 7
(4 * 2) * 1 = 4 * (2 * 1) = 8
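A parallel reduce regroups operands freely, so a non-associative operator like subtraction gives split-dependent answers. A small checker (class and method names are ours):

```java
import java.util.function.IntBinaryOperator;

public class AssocDemo {
    // Checks (a op b) op c == a op (b op c) for one triple of values.
    public static boolean isAssociative(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(op.applyAsInt(a, b), c)
            == op.applyAsInt(a, op.applyAsInt(b, c));
    }

    public static void main(String[] args) {
        System.out.println(isAssociative((x, y) -> x + y, 4, 2, 1)); // addition: true
        System.out.println(isAssociative((x, y) -> x - y, 4, 2, 1)); // subtraction: false, (4-2)-1 = 1 but 4-(2-1) = 3
    }
}
```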
Identity

“the do nothing value”

0 + 5 = 5
1 * 5 = 5
How to fix reduce

int totalCost(List<Purchase> items) {


return DELIVERY_FEE
+ items.parallelStream()
.reduce(0,
(tally, item) -> tally + item.cost());
}
How to fix reduce (2)

int totalCost(List<Purchase> items) {


return DELIVERY_FEE
+ items.parallelStream()
.mapToInt(Purchase::cost)
.sum();
}
DON’T hold locks

List<Integer> values = getValues();


CountDownLatch latch = new CountDownLatch(values.size());

values.parallelStream()
    .forEach(i -> {
        try {
            doSomething(i);
            // Potential deadlock
            latch.countDown();
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
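The latch is not only risky, it is redundant: a terminal operation like `forEach` does not return until every element has been processed, so there is nothing to wait for. A sketch of the lock-free version (names ours; a counter stands in for `doSomething`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class NoLatchDemo {
    // forEach on a parallel stream only returns once every element
    // has been processed, so no CountDownLatch is needed at all.
    public static int processAll(List<Integer> values) {
        AtomicInteger seen = new AtomicInteger();
        values.parallelStream().forEach(i -> {
            // doSomething(i) would go here
            seen.incrementAndGet();
        });
        return seen.get(); // all elements are done by the time we get here
    }
}
```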
No mutable state!

public static long sideEffectParallelSum(long n) {
    Accumulator accumulator = new Accumulator();
    LongStream.rangeClosed(1, n).parallel()
        .forEach(accumulator::add); // data race: total += value is not atomic
    return accumulator.total;
}

public static class Accumulator {
    private long total = 0;

    public void add(long value) {
        total += value;
    }
}
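The race-free fix is to let the stream keep per-thread partial results and combine them, instead of every thread hammering one shared field:

```java
import java.util.stream.LongStream;

public class SafeSum {
    // Race-free version of sideEffectParallelSum: sum() combines
    // per-chunk partial sums, so no shared mutable state is needed.
    public static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }
}
```

For n = 100 this reliably returns 5050; the side-effecting version may or may not, depending on how the threads interleave.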
Parallel Code Summary
● Very easy to make your code parallel,

but …

● Sometimes you can get away with things


sequentially that you can’t in parallel
○ sources
○ reduce
○ locks
○ unprotected mutable data
Performance and Internals
Under the hood

● Work distributed using Fork/Join framework

● Distributed by data

● New abstraction: Spliterator


Parallel Integer Sums

int sum =
values.parallelStream()
.mapToInt(i -> i)
.sum();
Spliterator
public interface Spliterator<T> {
    /** Carve off a portion of the data
        into a separate Spliterator */
    Spliterator<T> trySplit();

    /** Iterate the data described by this Spliterator */
    void forEachRemaining(Consumer<? super T> action);

    /** The size of the data described
        by this Spliterator, if known */
    long getExactSizeIfKnown();
}
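trySplit in action: on an ArrayList the size is known exactly, so one split carves the data cleanly in half (class name is ours):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // Splits a list of n elements once and returns the two halves' sizes.
    public static long[] splitSizes(int n) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < n; i++) data.add(i);

        Spliterator<Integer> right = data.spliterator();
        Spliterator<Integer> left = right.trySplit(); // carves off the first half
        return new long[] {
            left.getExactSizeIfKnown(),  // 5 for n = 10
            right.getExactSizeIfKnown()  // 5 for n = 10
        };
    }
}
```

The framework applies trySplit recursively until the chunks are small enough to hand to worker threads.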
Always a tradeoff ...
● Parallelism eats more CPU time
○ Thread communication
○ Distributing & Decomposing work
○ Potentially increased memory pressure
○ Competing for the CPU with other processes

● It can reduce wall time


○ Time from beginning to end of the processes’
execution
○ Ideally only need to wait for 1/N of the execution
time
Decomposition Performance
● Data Size

● Source Data Structure

● Packing

● Number of Cores

● Cost per Element


Data Structures
● Good
○ ArrayList / IntStream.range / Stream.of
○ Random access + easy to balance
● Meh
○ HashSet / TreeSet
○ Usually good balance
● Bad
○ LinkedList / BufferedReader.lines() / Stream.iterate()
○ Unknown length
○ Bad random access performance
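The difference is visible in the Spliterator characteristics: a list-backed source reports SIZED, while an iterate-based one cannot know its length, which makes it hard to hand each core an even share of the work. A quick check (class name ours):

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.Stream;

public class SourceShapeDemo {
    // A list-backed source knows its exact size up front...
    public static boolean listIsSized() {
        return Arrays.asList(1, 2, 3).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // ...while an iterate-based source does not, so the framework
    // cannot balance the split in advance.
    public static boolean iterateIsSized() {
        return Stream.iterate(0, i -> i + 1).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }
}
```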
Stateful Operations
● Stateless
○ no need to keep state when evaluated
○ eg: map, reduce
○ superior parallel decomposition
○ bounded amounts of data

● Stateful
○ accumulate state during evaluation
○ eg: sorted
○ unbounded caching of data
Benchmarking and Testing
● Don’t assume parallel = faster, measure it
● Use jmh:
http://openjdk.java.net/projects/code-tools/jmh/

● Best Practices
○ Warmup
○ Repeatability
○ Evade the JIT
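JMH is the right tool; the hand-rolled sketch below (all names ours) only illustrates the two practices above: run the task some warmup iterations first so the JIT compiles the hot path before timing starts, and keep a result alive so the compiler cannot elide the work.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class TinyBench {
    // Illustration only -- use JMH for real measurements.
    public static long measure(LongSupplier task, int warmups, int runs) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) sink += task.getAsLong(); // warmup: timings discarded
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += task.getAsLong();    // measured runs
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println(sink); // consume the result: evade dead-code elimination
        return elapsed / runs; // average nanoseconds per run
    }

    public static void main(String[] args) {
        long avg = measure(() -> LongStream.rangeClosed(1, 1_000).parallel().sum(), 20, 20);
        System.out.println(avg + " ns/run");
    }
}
```

JMH handles forking, repeatability, and statistical reporting as well, which this sketch deliberately does not.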
Summary
Lesson Summary

● Easy to obtain Data Parallelism

● Pick your situation well

● A lot of performance influencers

● Benchmark your parallel code


The End
Exercise
In: com.java_8_training.problems.data_parallelism

1. Look at OptimisationExample
2. Try to improve the performance of this code
3. Measure performance using the benchmark harness
4. Don’t make the code uglier!
Exercise
In: com.java_8_training.problems.data_parallelism

1. Parallelise the sum of squares method


Question1Test

2. Fix the bug in the "multiplyThrough" method


Question2Test

3. Remove the locks and keep the code safe


Question3Test
Amdahl’s Law
● Defines upper bound for parallel speedup

● Time(n) = Time(1) * (s + 1/n * (1 - s))


○ n = number of cores
○ s = proportion of code that is strictly serial

● Speedup(n) = 1 / (s + 1/n * (1 - s))

● Example
○ 1024 cores, 50% serial
○ 1 / (0.5 + 1/1024 * (1 - 0.5)) ~= 2x speedup
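The slide's formula, directly as code (class name ours), reproduces the example: with half the code strictly serial, even 1024 cores buy you barely a 2x speedup.

```java
public class Amdahl {
    // Speedup(n) = 1 / (s + (1 - s) / n)
    //   n = number of cores, s = proportion of strictly serial code
    public static double speedup(int cores, double serialFraction) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        System.out.println(speedup(1024, 0.5)); // ~1.998: the 2x ceiling from the slide
        System.out.println(speedup(4, 0.0));    // 4.0: perfect scaling only when s = 0
    }
}
```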
