Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo

1

1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only
Spark Architecture
A.Grishchenko

2

About me
Enterprise Architect @ Pivotal
 7 years in data processing
 5 years with MPP
 4 years with Hadoop
 Spark contributor
 http://0x0fff.com

3

Outline
 Spark Motivation
 Spark Pillars
 Spark Architecture
 Spark Shuffle
 Spark DataFrame

4

Outline
 Spark Motivation
 Spark Pillars
 Spark Architecture
 Spark Shuffle
 Spark DataFrame

5

Spark Motivation
 Difficultly of programming directly in Hadoop MapReduce

6

Spark Motivation
 Difficultly of programming directly in Hadoop MapReduce
 Performance bottlenecks, or batch not fitting use cases

7

Spark Motivation
 Difficultly of programming directly in Hadoop MapReduce
 Performance bottlenecks, or batch not fitting use cases
 Better support iterative jobs typical for machine learning

8

Difficulty of Programming in MR
Word Count implementations
 Hadoop MR – 61 lines in Java
 Spark – 1 line in interactive shell
sc.textFile('...').flatMap(lambda x: x.split())
.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x+y)
.saveAsTextFile('...')
VS

9

Performance Bottlenecks
How many times the data is put to the HDD during a single
MapReduce Job?
 One
 Two
 Three
 More

10

Performance Bottlenecks
How many times the data is put to the HDD during a single
MapReduce Job?
 One
 Two
 Three
 More

11

Performance Bottlenecks
Consider Hive as main SQL tool

12

Performance Bottlenecks
Consider Hive as main SQL tool
 Typical Hive query is translated to 3-5 MR jobs

13

Performance Bottlenecks
Consider Hive as main SQL tool
 Typical Hive query is translated to 3-5 MR jobs
 Each MR would scan put data to HDD 3+ times

14

Performance Bottlenecks
Consider Hive as main SQL tool
 Typical Hive query is translated to 3-5 MR jobs
 Each MR would scan put data to HDD 3+ times
 Each put to HDD – write followed by read

15

Performance Bottlenecks
Consider Hive as main SQL tool
 Typical Hive query is translated to 3-5 MR jobs
 Each MR would scan put data to HDD 3+ times
 Each put to HDD – write followed by read
 Sums up to 18-30 scans of data during a single
Hive query

16

Performance Bottlenecks
Spark offers you
 Lazy Computations
– Optimize the job before executing

17

Performance Bottlenecks
Spark offers you
 Lazy Computations
– Optimize the job before executing
 In-memory data caching
– Scan HDD only once, then scan your RAM

18

Performance Bottlenecks
Spark offers you
 Lazy Computations
– Optimize the job before executing
 In-memory data caching
– Scan HDD only once, then scan your RAM
 Efficient pipelining
– Avoids the data hitting the HDD by all means

19

Outline
 Spark Motivation
 Spark Pillars
 Spark Architecture
 Spark Shuffle
 Spark DataFrame

20

Spark Pillars
Two main abstractions of Spark

21

Spark Pillars
Two main abstractions of Spark
 RDD – Resilient Distributed Dataset

22

Spark Pillars
Two main abstractions of Spark
 RDD – Resilient Distributed Dataset
 DAG – Direct Acyclic Graph

23

RDD
 Simple view
– RDD is collection of data items split into partitions and
stored in memory on worker nodes of the cluster

24

RDD
 Simple view
– RDD is collection of data items split into partitions and
stored in memory on worker nodes of the cluster
 Complex view
– RDD is an interface for data transformation

25

RDD
 Simple view
– RDD is collection of data items split into partitions and
stored in memory on worker nodes of the cluster
 Complex view
– RDD is an interface for data transformation
– RDD refers to the data stored either in persisted store
(HDFS, Cassandra, HBase, etc.) or in cache (memory,
memory+disks, disk only, etc.) or in another RDD

26

RDD
 Complex view (cont’d)
– Partitions are recomputed on failure or cache eviction

27

RDD
 Complex view (cont’d)
– Partitions are recomputed on failure or cache eviction
– Metadata stored for interface
▪ Partitions – set of data splits associated with this RDD

28

RDD
 Complex view (cont’d)
– Partitions are recomputed on failure or cache eviction
– Metadata stored for interface
▪ Partitions – set of data splits associated with this RDD
▪ Dependencies – list of parent RDDs involved in computation

29

RDD
 Complex view (cont’d)
– Partitions are recomputed on failure or cache eviction
– Metadata stored for interface
▪ Partitions – set of data splits associated with this RDD
▪ Dependencies – list of parent RDDs involved in computation
▪ Compute – function to compute partition of the RDD given the
parent partitions from the Dependencies

30

RDD
 Complex view (cont’d)
– Partitions are recomputed on failure or cache eviction
– Metadata stored for interface
▪ Partitions – set of data splits associated with this RDD
▪ Dependencies – list of parent RDDs involved in computation
▪ Compute – function to compute partition of the RDD given the
parent partitions from the Dependencies
▪ Preferred Locations – where is the best place to put
computations on this partition (data locality)

31

RDD
 Complex view (cont’d)
– Partitions are recomputed on failure or cache eviction
– Metadata stored for interface
▪ Partitions – set of data splits associated with this RDD
▪ Dependencies – list of parent RDDs involved in computation
▪ Compute – function to compute partition of the RDD given the
parent partitions from the Dependencies
▪ Preferred Locations – where is the best place to put
computations on this partition (data locality)
▪ Partitioner – how the data is split into partitions

32

RDD
 RDD is the main and only tool for data
manipulation in Spark
 Two classes of operations
– Transformations
– Actions

33

RDD
Lazy computations model
Transformation cause only metadata change

34

DAG
Direct Acyclic Graph – sequence of computations
performed on data

35

DAG
Direct Acyclic Graph – sequence of computations
performed on data
 Node – RDD partition

36

DAG
Direct Acyclic Graph – sequence of computations
performed on data
 Node – RDD partition
 Edge – transformation on top of data

37

DAG
Direct Acyclic Graph – sequence of computations
performed on data
 Node – RDD partition
 Edge – transformation on top of data
 Acyclic – graph cannot return to the older partition

38

DAG
Direct Acyclic Graph – sequence of computations
performed on data
 Node – RDD partition
 Edge – transformation on top of data
 Acyclic – graph cannot return to the older partition
 Direct – transformation is an action that transitions
data partition state (from A to B)

39

DAG
WordCount example

40

DAG
WordCount example
HDFSInputSplits
HDFS
RDDPartitions
RDD RDD RDD RDD RDD
sc.textFile(‘hdfs://…’) flatMap map reduceByKey foreach

41

Outline
 Spark Motivation
 Spark Pillars
 Spark Architecture
 Spark Shuffle
 Spark DataFrames

42

Spark Cluster
Driver Node
…
Worker Node Worker Node Worker Node

43

Spark Cluster
Driver Node
Driver
…
Worker Node
…
Executor Executor
Worker Node
…
Executor Executor
Worker Node
…
Executor Executor

44

Spark Cluster
Driver Node
Driver
Spark Context
…
Worker Node
…
Executor
Cache
Executor
Cache
Worker Node
…
Executor
Cache
Executor
Cache
Worker Node
…
Executor
Cache
Executor
Cache

45

Spark Cluster
Driver Node
Driver
Spark Context
…
Worker Node
…
Executor
Cache
Task
…
Task
Task
Executor
Cache
Task
…
Task
Task
Worker Node
…
Executor
Cache
Task
…
Task
Task
Executor
Cache
Task
…
Task
Task
Worker Node
…
Executor
Cache
Task
…
Task
Task
Executor
Cache
Task
…
Task
Task

46

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)

47

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)
– The place where SparkContext is created

48

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)
– The place where SparkContext is created
– Translates RDD into the execution graph

49

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)
– The place where SparkContext is created
– Translates RDD into the execution graph
– Splits graph into stages

50

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)
– The place where SparkContext is created
– Translates RDD into the execution graph
– Splits graph into stages
– Schedules tasks and controls their execution

51

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)
– The place where SparkContext is created
– Translates RDD into the execution graph
– Splits graph into stages
– Schedules tasks and controls their execution
– Stores metadata about all the RDDs and their partitions

52

Spark Cluster
 Driver
– Entry point of the Spark Shell (Scala, Python, R)
– The place where SparkContext is created
– Translates RDD into the execution graph
– Splits graph into stages
– Schedules tasks and controls their execution
– Stores metadata about all the RDDs and their partitions
– Brings up Spark WebUI with job information

53

Spark Cluster
 Executor
– Stores the data in cache in JVM heap or on HDDs

54

Spark Cluster
 Executor
– Stores the data in cache in JVM heap or on HDDs
– Reads data from external sources

55

Spark Cluster
 Executor
– Stores the data in cache in JVM heap or on HDDs
– Reads data from external sources
– Writes data to external sources

56

Spark Cluster
 Executor
– Stores the data in cache in JVM heap or on HDDs
– Reads data from external sources
– Writes data to external sources
– Performs all the data processing

57

Executor Memory

58

Spark Cluster – Detailed

59

Spark Cluster – PySpark

60

Application Decomposition
 Application
– Single instance of SparkContext that stores some data
processing logic and can schedule series of jobs,
sequentially or in parallel (SparkContext is thread-safe)

61

Application Decomposition
 Application
– Single instance of SparkContext that stores some data
processing logic and can schedule series of jobs,
sequentially or in parallel (SparkContext is thread-safe)
 Job
– Complete set of transformations on RDD that finishes
with action or data saving, triggered by the driver
application

62

Application Decomposition
 Stage
– Set of transformations that can be pipelined and
executed by a single independent worker. Usually it is
app the transformations between “read”, “shuffle”,
“action”, “save”

63

Application Decomposition
 Stage
– Set of transformations that can be pipelined and
executed by a single independent worker. Usually it is
app the transformations between “read”, “shuffle”,
“action”, “save”
 Task
– Execution of the stage on a single data partition. Basic
unit of scheduling

64

WordCount ExampleHDFSInputSplits
HDFS
RDDPartitions
RDD RDD RDD RDD RDD
sc.textFile(‘hdfs://…’) flatMap map reduceByKey foreach

65

Stage 1
WordCount ExampleHDFSInputSplits
HDFS
RDDPartitions
RDD RDD RDD RDD RDD
sc.textFile(‘hdfs://…’) flatMap map reduceByKey foreach

66

Stage 2Stage 1
WordCount ExampleHDFSInputSplits
HDFS
RDDPartitions
RDD RDD RDD RDD RDD
sc.textFile(‘hdfs://…’) flatMap map reduceByKey foreach

67

Stage 2Stage 1
WordCount ExampleHDFSInputSplits
HDFS
RDDPartitions
RDD RDD RDD RDD RDD
sc.textFile(‘hdfs://…’) flatMap map reduceByKey foreach
Task 1
Task 2
Task 3
Task 4

68

Stage 2Stage 1
WordCount ExampleHDFSInputSplits
HDFS
RDDPartitions
RDD RDD RDD RDD RDD
sc.textFile(‘hdfs://…’) flatMap map reduceByKey foreach
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6
Task 7
Task 8

69

Stage 2Stage 1
WordCount ExampleHDFSInputSplits
HDFS
pipeline
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6
Task 7
Task 8
partition shuffle pipeline

70

Persistence in Spark
Persistence Level Description
MEMORY_ONLY Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in
memory, some partitions will not be cached and will be recomputed on the fly each
time they're needed. This is the default level.
MEMORY_AND_DISK Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in
memory, store the partitions that don't fit on disk, and read them from there when
they're needed.
MEMORY_ONLY_SER Store RDD as serialized Java objects (one byte array per partition). This is
generally more space-efficient than deserialized objects, especially when using a
fast serializer, but more CPU-intensive to read.
MEMORY_AND_DISK_SER Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk
instead of recomputing them on the fly each time they're needed.
DISK_ONLY Store the RDD partitions only on disk.
MEMORY_ONLY_2,
DISK_ONLY_2, etc.
Same as the levels above, but replicate each partition on two cluster nodes.

71

Persistence in Spark
 Spark considers memory as a cache with LRU
eviction rules
 If “Disk” is involved, data is evicted to disks
rdd = sc.parallelize(xrange(1000))
rdd.cache().count()
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER).count()
rdd.unpersist()

72

Outline
 Spark Motivation
 Spark Pillars
 Spark Architecture
 Spark Shuffle
 Spark DataFrame

73

Shuffles in Spark
 Hash Shuffle – default prior to 1.2.0

74

Shuffles in Spark
 Hash Shuffle – default prior to 1.2.0
 Sort Shuffle – default now

75

Shuffles in Spark
 Hash Shuffle – default prior to 1.2.0
 Sort Shuffle – default now
 Tungsten Sort – new optimized one!

76

Hash Shuffle

77

Executor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Hash Shuffle

78

Executor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
“map” task
spark.executor.cores/
spark.task.cpus
…
Hash Shuffle

79

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
“map” task
spark.executor.cores/
spark.task.cpus
…
Output File
Output File
Output File
…
Numberof
“Reducers”
spark.local.dir
Hash Shuffle

80

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
“map” task
spark.executor.cores/
spark.task.cpus
…
Output File
Output File
Output File
…
Numberof
“Reducers”
Output File
Output File
Output File
…
Numberof
“Reducers”
…
spark.local.dir
Hash Shuffle
Numberof“map”tasks
executedbythisexecutor

81

Hash Shuffle with Consolidation

82

Executor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Hash Shuffle With Consolidation

83

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
…
Numberof
“Reducers”
Output File
Output File
Output File
spark.local.dir
Hash Shuffle With Consolidationspark.executor.cores/
spark.task.cpus

84

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
“map” task
…
…
…
Numberof
“Reducers”
spark.local.dir
Hash Shuffle With Consolidationspark.executor.cores/
spark.task.cpus
spark.executor.cores/
spark.task.cpus
…
Numberof
“Reducers”
Output File
Output File
Output File
Output File
Output File
Output File

85

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
…
…
…
Numberof
“Reducers”
spark.local.dir
Hash Shuffle With Consolidationspark.executor.cores/
spark.task.cpus
spark.executor.cores/
spark.task.cpus
…
Numberof
“Reducers”
Output File
Output File
Output File
Output File
Output File
Output File

86

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
“map” task
…
…
…
Numberof
“Reducers”
spark.local.dir
Hash Shuffle With Consolidationspark.executor.cores/
spark.task.cpus
spark.executor.cores/
spark.task.cpus
Output File
Output File
Output File
…
Numberof
“Reducers”
Output File
Output File
Output File

87

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
…
…
…
Numberof
“Reducers”
spark.local.dir
Hash Shuffle With Consolidationspark.executor.cores/
spark.task.cpus
spark.executor.cores/
spark.task.cpus
Output File
Output File
Output File
…
Numberof
“Reducers”
Output File
Output File
Output File

88

Local DirectoryExecutor JVM
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map” task
“map” task
…
…
…
Numberof
“Reducers”
spark.local.dir
Hash Shuffle With Consolidationspark.executor.cores/
spark.task.cpus
spark.executor.cores/
spark.task.cpus
…
Numberof
“Reducers”
Output File
Output File
Output File
Output File
Output File
Output File

89

Sort Shuffle

90

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
Partition
Partition

91

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
Partition
Partition
spark.executor.cores/spark.task.cpus

92

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
AppendOnlyMap
…
AppendOnlyMap
spark.executor.cores/spark.task.cpus

93

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
AppendOnlyMap
…
AppendOnlyMap
sort &
spill
spark.executor.cores/spark.task.cpus

94

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
AppendOnlyMap
…
AppendOnlyMap
sort &
spill
Local
Directory
Output File
index
spark.executor.cores/spark.task.cpus

95

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
AppendOnlyMap
…
AppendOnlyMap
sort &
spill
sort &
spill
Local
Directory
Output File
index
Output File
index
spark.executor.cores/spark.task.cpus

96

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
AppendOnlyMap
…
AppendOnlyMap
sort &
spill
sort &
spill
sort &
spill
…
Local
Directory
…
Output File
index
Output File
index
Output File
index
spark.executor.cores/spark.task.cpus

97

MinHeap
Merge
MinHeap
Merge
Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
AppendOnlyMap
…
AppendOnlyMap
sort &
spill
sort &
spill
sort &
spill
…
Local
Directory
…
“reduce”task“reduce”task
…Output File
index
Output File
index
Output File
index
spark.executor.cores/spark.task.cpus

98

Tungsten Sort Shuffle

99

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
Partition
Partition

100

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
Partition
Partition
spark.executor.cores/spark.task.cpus

101

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
Array of data pointers and
Partition IDs, long[]
spark.executor.cores/spark.task.cpus

102

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
Array of data pointers and
Partition IDs, long[]
sort &
spill
spark.executor.cores/spark.task.cpus

103

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
Local Directory
Array of data pointers and
Partition IDs, long[]
sort &
spill
spark.local.dir
Output File
partition
partition
partition
partition
index
spark.executor.cores/spark.task.cpus

104

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
Local Directory
Array of data pointers and
Partition IDs, long[]
sort &
spill
sort &
spill
…
spark.local.dir
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
spark.executor.cores/spark.task.cpus

105

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
Local Directory
Array of data pointers and
Partition IDs, long[]
sort &
spill
sort &
spill
…
spark.local.dir
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
spark.executor.cores/spark.task.cpus

106

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
…
Local Directory
Array of data pointers and
Partition IDs, long[]
Serialized Data
LinkedList<MemoryBlock>
Array of data pointers and
Partition IDs, long[]
sort &
spill
sort &
spill
…
spark.local.dir
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
spark.executor.cores/spark.task.cpus

107

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
…
Local Directory
…
Array of data pointers and
Partition IDs, long[]
Serialized Data
LinkedList<MemoryBlock>
Array of data pointers and
Partition IDs, long[]
sort &
spill
sort &
spill
…
sort &
spill
spark.local.dir
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
spark.executor.cores/spark.task.cpus

108

Executor JVMPartition
Partition
Partition
Partition
Partition
Partition
Partition
Partition
“map”
task
“map”
task
…
Tungsten Sort Shuffle
spark.storage.[safetyFraction * memoryFraction]
spark.shuffle.
[safetyFraction * memoryFraction]
Partition
Partition
Serialized Data
LinkedList<MemoryBlock>
…
Local Directory
…
Array of data pointers and
Partition IDs, long[]
Serialized Data
LinkedList<MemoryBlock>
Array of data pointers and
Partition IDs, long[]
sort &
spill
sort &
spill
…
sort &
spill
spark.local.dir
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
Output File
partition
partition
partition
partition
index
merge
spark.executor.cores/spark.task.cpus

109

Outline
 Spark Motivation
 Spark Pillars
 Spark Architecture
 Spark Shuffle
 Spark DataFrame

110

DataFrame Idea

111

DataFrame Implementation
 Interface
– DataFrame is an RDD with schema – field names, field
data types and statistics
– Unified transformation interface in all languages, all the
transformations are passed to JVM
– Can be accessed as RDD, in this case transformed to
the RDD of Row objects

112

DataFrame Implementation
 Internals
– Internally it is the same RDD
– Data is stored in row-columnar format, row chunk size
is set by spark.sql.inMemoryColumnarStorage.batchSize
– Each column in each partition stores min-max values
for partition pruning
– Allows better compression ratio than standard RDD
– Delivers faster performance for small subsets of
columns

113

113Pivotal Confidential–Internal Use Only 113Pivotal Confidential–Internal Use Only
Questions?

114

BUILT FOR THE SPEED OF BUSINESS

More Related Content

Apache Spark Architecture

Editor's Notes

  1. def printfunc (x): print 'Word "%s" occurs %d times' % (x[0], x[1]) infile = sc.textFile('hdfs://sparkdemo:8020/sparkdemo/textfiles/README.md', 4) rdd1 = infile.flatMap(lambda x: x.split()) rdd2 = rdd1.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x+y) print rdd2.toDebugString() rdd2.foreach(printfunc)