SparkSQL:
A Compiler from Queries to RDDs
Sameer Agarwal
Spark Summit | Boston | February 9th 2017
About Me
• Software Engineer at Databricks (Spark Core/SQL)
• PhD in Databases (AMPLab, UC Berkeley)
• Research on BlinkDB (Approximate Queries in Spark)
Background: What is an RDD?
• Dependencies
• Partitions
• Compute function: Partition => Iterator[T]
To the engine, the compute function is opaque computation, and the records it produces are opaque data: Spark cannot look inside either to optimize.
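As a rough sketch (simplified; not Spark's exact class hierarchy, though org.apache.spark.rdd.RDD has members very much like these), the abstraction boils down to three things:

  abstract class RDD[T] {
    // Parent RDDs this RDD was derived from (the lineage graph)
    def getDependencies: Seq[Dependency[_]]
    // The chunks of the dataset that can be computed in parallel
    def getPartitions: Array[Partition]
    // How to produce the records of one partition; opaque to the engine
    def compute(split: Partition, context: TaskContext): Iterator[T]
  }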
RDD Programming Model
Construct the execution DAG using low-level RDD operators; a small example follows.
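For instance, a word count builds a DAG one operator at a time (a hedged sketch assuming a live SparkContext `sc`; the input path is hypothetical):

  val lines  = sc.textFile("hdfs://...")        // one partition per block
  val words  = lines.flatMap(_.split(" "))      // narrow dependency
  val pairs  = words.map(word => (word, 1))     // narrow dependency
  val counts = pairs.reduceByKey(_ + _)         // wide (shuffle) dependency
  counts.collect()                              // the DAG only executes here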
SQL/Structured Programming Model
• High-level APIs (SQL, DataFrame/Dataset): Programs describe what data operations are needed without specifying how to execute these operations
• More efficient: An optimizer can automatically find the most efficient plan to execute a query
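For comparison, here is the query used throughout the following slides written against the DataFrame API (a hedged sketch; `t1` and `t2` are assumed to be existing DataFrames):

  import org.apache.spark.sql.functions.{lit, sum}

  val result = t1.join(t2, t1("id") === t2("id"))
    .where(t2("id") > 50 * 1000)
    .select(t1("id"), (lit(1) + lit(2) + t1("value")).as("v"))
    .agg(sum("v"))
  // What to compute is fully specified; how (join algorithm, scan order,
  // constant folding) is left to the optimizer.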
Spark SQL Overview
SQL AST / DataFrame / Dataset → Query Plan → Optimized Query Plan → RDDs
Catalyst drives the transformations over abstractions of users’ programs (trees), from SQL AST, DataFrame, or Dataset down to an optimized query plan; Tungsten turns that plan into RDDs.
How Catalyst Works: An Overview
SQL AST / DataFrame / Dataset → Query Plan → Optimized Query Plan → RDDs
Each of these abstractions of users’ programs is represented as a tree, and Catalyst rewrites trees with transformations.
Trees: Abstractions of Users’ Programs

SELECT sum(v)
FROM (
  SELECT
    t1.id,
    1 + 2 + t1.value AS v
  FROM t1 JOIN t2
  WHERE
    t1.id = t2.id AND
    t2.id > 50 * 1000) tmp
Expression
• An expression represents a new value, computed based on input values
• e.g. 1 + 2 + t1.value
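Such an expression is itself a small tree. A toy sketch (simplified; these are not Catalyst’s actual classes) of how 1 + 2 + t1.value might be represented:

  sealed trait Expression
  case class Literal(value: Int) extends Expression
  case class Attribute(name: String) extends Expression
  case class Add(left: Expression, right: Expression) extends Expression

  // (1 + 2) + t1.value as a tree of nodes
  val expr: Expression = Add(Add(Literal(1), Literal(2)), Attribute("t1.value"))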
Query Plan
The whole query is likewise a tree of operators:

Aggregate [sum(v)]
  Project [t1.id, 1+2+t1.value AS v]
    Filter [t1.id = t2.id AND t2.id > 50*1000]
      Join
        Scan (t1)
        Scan (t2)
Logical Plan
• A Logical Plan (such as the tree above) describes computation on datasets without defining how to conduct the computation
Physical Plan
• A Physical Plan describes computation on datasets with specific definitions on how to conduct the computation

HashAggregate [sum(v)]
  Project [t1.id, 1+2+t1.value AS v]
    Filter [t1.id = t2.id AND t2.id > 50*1000]
      Sort-Merge Join
        Parquet Scan (t1)
        JSON Scan (t2)
How Catalyst Works: An Overview
SQL AST / DataFrame / Dataset (Java/Scala) → Query Plan → Optimized Query Plan → RDDs
The rewrites between these trees are expressed as transforms.
Transform
• A function associated with every tree, used to implement a single rule

For example, constant folding rewrites 1 + 2 + t1.value:

Before: Add(Add(Literal(1), Literal(2)), Attribute(t1.value))   (1 + 2 + t1.value: evaluates 1 + 2 for every row)
After:  Add(Literal(3), Attribute(t1.value))                    (3 + t1.value: evaluates 1 + 2 once)
Transform
• A transform is defined as a Partial Function
• Partial Function: A function that is defined for a subset of its possible arguments

val expression: Expression = ...
expression.transform {
  case Add(Literal(x, IntegerType), Literal(y, IntegerType)) =>
    Literal(x + y)
}

The case pattern determines whether the partial function is defined for a given input.
Transform
Applied to 1 + 2 + t1.value, the rule walks the tree, matches the inner Add(Literal(1), Literal(2)), and rewrites it; nodes that do not match are left unchanged. See the runnable sketch below.

Before: Add(Add(Literal(1), Literal(2)), Attribute(t1.value))   (1 + 2 + t1.value)
After:  Add(Literal(3), Attribute(t1.value))                    (3 + t1.value)
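A self-contained toy sketch of the same idea, redefining the classes from the earlier sketch with a transform method (simplified: Catalyst’s Literal also carries a DataType such as IntegerType; here literals are plain Ints):

  sealed trait Expression {
    // Apply `rule` bottom-up wherever it is defined; leave other nodes as-is.
    def transform(rule: PartialFunction[Expression, Expression]): Expression = {
      val withNewChildren = this match {
        case Add(l, r) => Add(l.transform(rule), r.transform(rule))
        case leaf      => leaf
      }
      if (rule.isDefinedAt(withNewChildren)) rule(withNewChildren)
      else withNewChildren
    }
  }
  case class Literal(value: Int) extends Expression
  case class Attribute(name: String) extends Expression
  case class Add(left: Expression, right: Expression) extends Expression

  // 1 + 2 + t1.value
  val expression: Expression =
    Add(Add(Literal(1), Literal(2)), Attribute("t1.value"))

  val folded = expression.transform {
    case Add(Literal(x), Literal(y)) => Literal(x + y)  // constant folding
  }
  // folded == Add(Literal(3), Attribute("t1.value")), i.e. 3 + t1.value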
Combining Multiple Rules
Predicate Pushdown

Before:
Aggregate [sum(v)]
  Project [t1.id, 1+2+t1.value AS v]
    Filter [t1.id = t2.id AND t2.id > 50*1000]
      Join
        Scan (t1)
        Scan (t2)

After:
Aggregate [sum(v)]
  Project [t1.id, 1+2+t1.value AS v]
    Join [t1.id = t2.id]
      Scan (t1)
      Filter [t2.id > 50*1000]
        Scan (t2)
Combining Multiple Rules
Constant Folding (applied to the pushed-down plan)

After:
Aggregate [sum(v)]
  Project [t1.id, 3+t1.value AS v]
    Join [t1.id = t2.id]
      Scan (t1)
      Filter [t2.id > 50000]
        Scan (t2)
Combining Multiple Rules
Column Pruning (each scan now reads only the columns the query needs)

After:
Aggregate [sum(v)]
  Project [t1.id, 3+t1.value AS v]
    Join [t1.id = t2.id]
      Project [t1.id, t1.value]
        Scan (t1)
      Filter [t2.id > 50000]
        Project [t2.id]
          Scan (t2)
Combining Multiple Rules

Before transformations:
Aggregate [sum(v)]
  Project [t1.id, 1+2+t1.value AS v]
    Filter [t1.id = t2.id AND t2.id > 50*1000]
      Join
        Scan (t1)
        Scan (t2)

After transformations:
Aggregate [sum(v)]
  Project [t1.id, 3+t1.value AS v]
    Join [t1.id = t2.id]
      Project [t1.id, t1.value]
        Scan (t1)
      Filter [t2.id > 50000]
        Project [t2.id]
          Scan (t2)
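Plan rules follow the same partial-function pattern as expression rules. A toy sketch of predicate pushdown (deliberately crude and not Catalyst’s real rule: conditions are plain strings and the "references only t2" test is a prefix check, purely for illustration):

  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class Join(left: Plan, right: Plan, condition: String) extends Plan

  // Push a filter that only references the right side beneath the join.
  val pushDown: PartialFunction[Plan, Plan] = {
    case Filter(cond, Join(l, r, joinCond)) if cond.startsWith("t2.") =>
      Join(l, Filter(cond, r), joinCond)
  }

  val before = Filter("t2.id > 50000",
    Join(Scan("t1"), Scan("t2"), "t1.id = t2.id"))
  val after  = if (pushDown.isDefinedAt(before)) pushDown(before) else before
  // after == Join(Scan("t1"), Filter("t2.id > 50000", Scan("t2")), "t1.id = t2.id")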
Spark SQL Overview
SQL AST / DataFrame / Dataset → Query Plan → Optimized Query Plan → RDDs
Catalyst covers the tree transformations above; Tungsten takes the optimized query plan the rest of the way to RDDs.
select count(*) from store_sales where ss_item_sk = 1000

Aggregate
  Project
    Filter
      Scan
G. Graefe, Volcano: An Extensible and Parallel Query Evaluation System. In IEEE Transactions on Knowledge and Data Engineering, 1994
Volcano Iterator Model
• Standard for 30 years: almost all databases do it
• Each operator is an “iterator” that consumes records from its input operator

class Filter(
    child: Operator,
    predicate: (Row => Boolean))
  extends Operator {
  def next(): Row = {
    // Pull rows from the child until one satisfies the predicate
    // (simplified: assumes the child eventually yields a matching row)
    var current = child.next()
    while (current == null || !predicate(current)) {
      current = child.next()
    }
    current
  }
}
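To make the model concrete, here is a hedged, self-contained sketch (toy types, not Spark’s real classes) of a Scan → Filter chain driven by next(); note that every row costs one virtual next() call per operator in the chain:

  type Row = java.lang.Integer                   // toy: a row is just ss_item_sk

  abstract class Operator { def next(): Row }    // returns null when exhausted

  class Scan(data: Iterator[Int]) extends Operator {
    def next(): Row = if (data.hasNext) Int.box(data.next()) else null
  }

  class FilterOp(child: Operator, predicate: Row => Boolean) extends Operator {
    def next(): Row = {
      var current = child.next()
      while (current != null && !predicate(current)) current = child.next()
      current                                    // null once the child is exhausted
    }
  }

  // count(*) by pulling the whole chain, one row at a time
  def count(op: Operator): Long = {
    var n = 0L
    while (op.next() != null) n += 1
    n
  }

  val plan = new FilterOp(new Scan(Iterator(5, 1000, 42, 1000)), _.intValue == 1000)
  println(count(plan))                           // 2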
Downside of the Volcano Model
1. Too many virtual function calls
o at least 3 calls for each row in Aggregate
2. Extensive memory access
o “row” is a small segment in memory (or in L1/L2/L3 cache)
3. Can’t take advantage of modern CPU features
o SIMD, pipelining, prefetching, branch prediction, ILP, instruction
cache, …
Whole-stage Codegen: Spark as a “Compiler”

The same plan (Scan → Filter → Project → Aggregate) collapses into a single loop:

long count = 0;
for (ss_item_sk in store_sales) {
  if (ss_item_sk == 1000) {
    count += 1;
  }
}
Whole-stage Codegen
• Fusing operators together so the generated code looks like hand-optimized code:
- Identify chains of operators (“stages”)
- Compile each stage into a single function
- Functionality of a general-purpose execution engine; performance as if the system had been hand-built just to run your query
A sketch of what a fused stage looks like follows.
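For the count(*) plan above, a fused stage amounts to one function with no virtual calls between operators (a hand-written illustration; Spark actually generates and compiles Java source at runtime):

  def executeStage(store_sales: Array[Int]): Long = {
    var count = 0L
    var i = 0
    while (i < store_sales.length) {      // Scan
      if (store_sales(i) == 1000) {       // Filter
        count += 1                        // Aggregate
      }
      i += 1
    }
    count
  }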
T. Neumann, Efficiently Compiling Efficient Query Plans for Modern Hardware. In VLDB 2011
Putting it All Together
Operator Benchmarks: Cost/Row (ns)
(Charts of per-operator cost per row omitted; the chart callouts were:)
• 5-30x speedups
• Radix sort: 10-100x speedups
• Shuffling is still the bottleneck
• 10x speedup
TPC-DS (Scale Factor 1500, 100 cores)
(Chart omitted: query time per TPC-DS query, Spark 2.0 vs. Spark 1.6; lower is better.)
What’s Next?
Spark 2.2 and beyond
1. SPARK-16026: Cost Based Optimizer
- Leverage table/column level statistics to optimize joins and aggregates
- Statistics Collection Framework (Spark 2.1)
- Cost Based Optimizer (Spark 2.2)
2. Boosting Spark’s Performance on Many-Core Machines
- In-memory / single-node shuffle
3. Improving the quality of generated code and better integration with the in-memory column format in Spark
Thank you.