Ingesting Data at Blazing Speed with Apache ORC
Gustavo Arocena
IBM Toronto Lab
IBM Canada Lab
Toronto
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.

The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

© Copyright IBM Corporation 2017. All rights reserved.
U.S. Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM, the IBM logo, ibm.com, BigInsights, and Big SQL are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at
▪“Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
▪TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council
▪Cloudera, the Cloudera logo, Cloudera Impala are trademarks of Cloudera.
▪Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
▪Other company, product, or service names may be trademarks or service marks of others.
Agenda
What is Big SQL?
From Parquet to ORC
Reading ORC Files Fast
Scaling Up
Tuning Big SQL for ORC
Big SQL Background
What is Big SQL?

[Architecture diagram: Big SQL integrates data security, the metastore, cluster management, and administration, and runs on the Hadoop data platform.]
What’s the Big Deal?
Grace Under Pressure
The Big SQL Advantages
▪ Scale: the only engine to run TPC-DS at 100TB scale.
▪ Complex SQL: capable of running all 99 TPC-DS queries since 2014; complex queries are optimized by the IBM cost-based optimizer.
▪ Concurrency: handles highly concurrent workloads gracefully (12-stream TPC-DS).
▪ Efficient resource utilization: memory, CPU, and IO.
Metrics for Big SQL 4.2.5 vs. Spark SQL 2.1
▪ Hadoop DS @ 100TB, 4 streams

[Bar charts, reconstructed as a table:]

Metric                   Big SQL   Spark SQL   Big SQL advantage
Elapsed time (hours)     13.7      43.2        ~1/3 the time
CPU utilization (%)      76.4      88.2        ~15% less CPU
Disk reads (MB/sec)      107       388         ~1/3 the reads
Disk writes (MB/sec)     25        237         ~1/9 the writes
https://developer.ibm.com/hadoop/2017/02/07/experiences-comparing-big-sql-and-spark-sql-at-100tb/
From Parquet to ORC
Big SQL Architecture (as of 2016)

[Architecture diagram: a Head Node coordinates several Worker Nodes. Each worker has two IO engines, a native Parquet IO engine and a Hive Compatibility IO engine, reading from local HDFS storage; the cluster shares the Hive Metastore and the HDFS NameNode.]
2017: Big SQL on HDP
▪ The most popular data format on HDP is ORC.
▪ ORC performance becomes the top priority.
ORC vs Parquet in Big SQL 4.2

[Bar chart: elapsed time (sec), 1TB TPC-DS, Parquet vs. ORC v0. At 1 stream, ORC is 70% slower than Parquet; at 4 streams, ORC is 315% slower, with the ORC bar exceeding 50,000 seconds.]
Limitations of the Hive Compatibility IO Engine
▪ Slow ingestion: processes a single row at a time.
▪ JIT unfriendly: represents data values as Java objects.
▪ Low scalability: large memory footprint per scan, excessive CPU use, overloaded disks.
The Roadmap Towards Fast ORC Ingestion
▪ 1st phase (Big SQL 4.2.5, Dec ‘16): fast ORC ingestion.
▪ 2nd phase (Big SQL 5.0.1, Aug ‘17): ORC at scale.
1st Phase – Fast Ingestion using Apache ORC

[Bar chart: elapsed time (sec), 1TB TPC-DS, Parquet vs. ORC v1. At 1 stream, ORC is now 2% faster than Parquet; at 4 streams, it is still 65% slower.]

Key benefits of the Apache ORC libs:
▪ Many-row-at-a-time API
▪ Enables JIT-friendly code
▪ Represents data using primitive Java types
▪ Makes projection and selection pushdown very easy
2nd Phase – Managing Resources

[Bar chart: elapsed time (sec), 1TB TPC-DS, Parquet vs. ORC v2. ORC is now 15% faster than Parquet at 1 stream and 3.4% faster at 4 streams.]

The Resource Manager has global oversight over:
▪ The total number of threads
▪ Overall JVM heap consumption
▪ The degree of parallelism per scan
ORC as a First Class Citizen in 5.0.1

[Architecture diagram: the same Head Node / Worker Node layout as before, with each worker now carrying a dedicated ORC IO engine alongside the Parquet IO and Hive Compatibility IO engines.]
ORC Background
What is Apache ORC?

ORC = efficient storage + fast ingestion
▪ Compression: type-specific encodings (RLE for numbers, dictionary for strings, etc.) plus generic compression (Zlib, Snappy).
▪ Data skipping: column skipping based on the data layout; row skipping based on MIN/MAX stats and bloom filters.
▪ JIT friendly: vectorized APIs retrieve data as arrays of primitive values.

Engines leverage all these features; the Apache ORC libs allow applications to leverage them too.
ORC Physical Data Layout

CREATE HADOOP TABLE SALES(id INTEGER,
                          quantity INTEGER,
                          amount DOUBLE)

[Layout diagram: a file is divided into stripes, each sized to an HDFS block; a stripe is divided into row groups of 10K rows; within a row group, each column’s values are stored together (10K id values, 10K quantity values, 10K amount values). Statistics are kept at every level: row group stats, stripe stats, and file stats.]
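
The reader API exposes this layout directly. A minimal sketch of walking it, assuming the SALES data was written to a hypothetical /tmp/sales.orc (Reader, OrcFile and StripeInformation are in org.apache.orc):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.StripeInformation;

// Print the stripe layout of an ORC file.
Configuration conf = new Configuration();
Path path = new Path("/tmp/sales.orc"); // hypothetical location
Reader r = OrcFile.createReader(path, OrcFile.readerOptions(conf));
System.out.println(r.getNumberOfRows() + " rows, schema: " + r.getSchema());
for (StripeInformation stripe : r.getStripes()) {
    System.out.println("stripe @ " + stripe.getOffset() + ": "
        + stripe.getNumberOfRows() + " rows, "
        + stripe.getDataLength() + " data bytes");
}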
Leveraging the Apache ORC Libraries
Dependencies and Classes
▪ Java Dependencies (org.apache.orc group ID in Maven)
orc-core-1.4.0-nohive.jar
aircompressor-0.3.jar
▪ Java Classes for “vectorized” processing
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;
import org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.storage.ql.exec.vector.DoubleColumnVector;
import org.apache.orc.storage.ql.io.sarg.SearchArgument;
Using Vectorized ORC APIs

Reader r = OrcFile.createReader(path, OrcFile.readerOptions(conf));
RecordReader rr = r.rows();
VectorizedRowBatch batch = r.getSchema().createRowBatch(1000);

▪ A vectorized row batch is a Java object that holds up to 1000 decoded rows. For the SALES table it wraps one column vector per column, each backed by a primitive array: a long[1000] for ID, a long[1000] for QUANTITY, and a double[1000] for AMOUNT.
JIT friendly code
▪ Get the total for sales involving less than 500 items:

SELECT sum(amount)
FROM sales
WHERE quantity < 500

// Compute sum(amount)
double sum = 0;
while (rr.nextBatch(batch)) {
    long[] qty = ((LongColumnVector) batch.cols[1]).vector;
    double[] amt = ((DoubleColumnVector) batch.cols[2]).vector;
    for (int i = 0; i < batch.size; i++)
        if (qty[i] < 500)
            sum += amt[i];
}

• No objects
• No method calls
• The tight loop compiles to machine code

(LongColumnVector comes from org.apache.orc.storage.ql.exec.vector, like the other vector classes listed earlier.)
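
One caveat the simplified loop above skips: real ORC data can contain nulls and run-length “repeating” vectors, so production code should consult each ColumnVector’s noNulls, isRepeating and isNull fields before trusting vector values. A null-aware sketch of the same aggregation (an illustration, not Big SQL’s actual code):

double sum = 0;
while (rr.nextBatch(batch)) {
    LongColumnVector qtyCol = (LongColumnVector) batch.cols[1];
    DoubleColumnVector amtCol = (DoubleColumnVector) batch.cols[2];
    for (int i = 0; i < batch.size; i++) {
        // a repeating vector stores a single value at index 0
        int q = qtyCol.isRepeating ? 0 : i;
        int a = amtCol.isRepeating ? 0 : i;
        if ((qtyCol.noNulls || !qtyCol.isNull[q])
                && (amtCol.noNulls || !amtCol.isNull[a])
                && qtyCol.vector[q] < 500) {
            sum += amtCol.vector[a];
        }
    }
}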
Column Skipping/Pruning (a.k.a. Projection Pushdown)
▪ Unless we say otherwise, ORC will read all the columns (ID, QUANTITY, AMOUNT).
▪ But our query uses only two of them:

SELECT sum(amount)
FROM sales
WHERE quantity < 500
Column Skipping/Pruning (a.k.a. Projection Pushdown)

// Projection: the include array is indexed by ORC column id, where
// id 0 is the root struct; skip "id", read "quantity" and "amount"
boolean[] projection = new boolean[] {true, false, true, true};

// ORC RecordReader with projection pushdown
RecordReader rr = r.rows(
    new Reader.Options()
        .include(projection));

// Compute sum(amount)
double sum = 0;
while (rr.nextBatch(batch)) { … }

SELECT sum(amount)
FROM sales
WHERE quantity < 500
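
Hard-coding column ids is brittle. The include array can instead be derived from the schema by name; a sketch using org.apache.orc.TypeDescription (the inner loop over a field’s id range also covers nested types):

// Build the include array by column name instead of by position.
TypeDescription schema = r.getSchema();
boolean[] include = new boolean[schema.getMaximumId() + 1];
include[0] = true; // the root struct
java.util.List<String> names = schema.getFieldNames();
java.util.List<TypeDescription> fields = schema.getChildren();
for (int i = 0; i < names.size(); i++) {
    if (names.get(i).equals("quantity") || names.get(i).equals("amount")) {
        // mark the field and its whole subtree of column ids
        for (int id = fields.get(i).getId(); id <= fields.get(i).getMaximumId(); id++)
            include[id] = true;
    }
}
RecordReader rr = r.rows(new Reader.Options().include(include));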
Column Skipping/Pruning (a.k.a. Projection Pushdown)

create external hadoop table web_sales (
  ws_sold_date_sk int, ws_sold_time_sk int, ws_ship_date_sk int, ws_item_sk int not null, ws_bill_customer_sk int,
  ws_bill_cdemo_sk int, ws_bill_hdemo_sk int, ws_bill_addr_sk int, ws_ship_customer_sk int, ws_ship_cdemo_sk int,
  ws_ship_hdemo_sk int, ws_ship_addr_sk int, ws_web_page_sk int, ws_web_site_sk int, ws_ship_mode_sk int, ws_warehouse_sk int,
  ws_promo_sk int, ws_order_number bigint not null, ws_quantity bigint, ws_wholesale_cost double, ws_list_price double,
  ws_sales_price double, ws_ext_discount_amt double, ws_ext_sales_price double, ws_ext_wholesale_cost double,
  ws_ext_list_price double, ws_ext_tax double, ws_coupon_amt double, ws_ext_ship_cost double, ws_net_paid double,
  ws_net_paid_inc_tax double, ws_net_paid_inc_ship double, ws_net_paid_inc_ship_tax double, ws_net_profit double
)
STORED AS ORC;

[Bar chart: elapsed time (seconds) for SELECT max(ws_sold_date_sk) FROM tpcds_1tb.web_sales on this 34-column table: 3.5 seconds with projection pushdown vs. 24.9 seconds without.]
Row Skipping/Pruning (a.k.a. Predicate Pushdown)
▪ Row skipping leverages the MIN/MAX stats kept per stripe and per row group.

[Diagram: two stripes with quantity stats MIN:274/MAX:590 and MIN:603/MAX:3000, and two row groups with quantity stats MIN:510/MAX:540 and MIN:330/MAX:420. For the predicate quantity < 500, the second stripe (MIN 603) and the first row group (MIN 510) contain no qualifying rows and can be skipped entirely.]

SELECT sum(amount)
FROM sales
WHERE quantity < 500

• Data must be sorted for pruning to be effective!
Row Skipping/Pruning (a.k.a. Predicate Pushdown)

// Predicates (SearchArgumentFactory and PredicateLeaf live in
// org.apache.orc.storage.ql.io.sarg, alongside SearchArgument)
SearchArgument selection = SearchArgumentFactory.newBuilder()
    .lessThan("quantity", PredicateLeaf.Type.LONG, 500L).build();

// ORC RecordReader with projection and selection pushdown
RecordReader rr = r.rows(
    new Reader.Options()
        .include(projection)
        .searchArgument(selection, new String[] {}));

// Compute sum(amount)
double sum = 0;
while (rr.nextBatch(batch)) { ... }

SELECT sum(amount)
FROM sales
WHERE quantity < 500

Note that the search argument only skips whole stripes and row groups; rows in a surviving row group still come back, so the per-row quantity < 500 check in the loop is still required.
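
The builder also composes predicates. A sketch of a compound search argument (the amount IS NOT NULL condition is hypothetical, added only to show grouping; Type.FLOAT is the sarg type used for DOUBLE columns):

// quantity < 500 AND amount IS NOT NULL (illustrative only)
SearchArgument sarg = SearchArgumentFactory.newBuilder()
    .startAnd()
        .lessThan("quantity", PredicateLeaf.Type.LONG, 500L)
        .startNot()
            .isNull("amount", PredicateLeaf.Type.FLOAT)
        .end()
    .end()
    .build();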
Scaling Up
▪ Reading too many files at once hurts performance and can cause OOM.
▪ How many is too many? That depends on two things: the number of disks and the Java heap size.
▪ You must have multiple disks, AND the files must be evenly distributed across the disks.
▪ But the number of reader threads must be limited by the Java heap size:
 • Each open ORC file/split needs 30 to 80 MB of Java heap.
 • The biggest consumers are the RecordReader and the VectorizedRowBatch.
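
To make that budget concrete: at the quoted 30 to 80 MB per open file, a worker that can spare, say, 4 GB of heap for scans (an assumed figure, purely for illustration) can keep only roughly 50 to 135 ORC files open at once (4096 MB / 80 MB ≈ 51; 4096 MB / 30 MB ≈ 136).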
Scaling Up in an Engine
▪ An engine must handle concurrency (multiple queries/users).
▪ A fixed number of open files per scan leads to OOMs; the engine must gracefully degrade performance instead.
▪ So the total number of open files must be limited globally, which raises several issues (see the sketch after this list):
 • Starvation: all queries must make reasonable progress.
 • Stragglers: the work for a scan must be evenly balanced across nodes.
 • Adapting parallelism to concurrency: a single query must take full advantage of the available resources; as concurrency increases, parallelism decreases for ongoing scans, and when concurrency decreases, parallelism per scan increases again.
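
A minimal sketch of that global limit (an illustration of the idea, not Big SQL’s actual resource manager): a fair semaphore sized from the heap budget caps open readers per worker, so new splits block instead of exhausting the heap, and FIFO fairness keeps any one query from starving.

import java.util.concurrent.Semaphore;

// Caps the number of concurrently open ORC readers on a worker.
class OrcScanGovernor {
    private final Semaphore permits;

    OrcScanGovernor(long heapBudgetBytes, long bytesPerOpenFile) {
        // fair=true hands permits out FIFO, so every scan keeps making progress
        this.permits = new Semaphore((int) (heapBudgetBytes / bytesPerOpenFile), true);
    }

    // Call before opening a file/split; blocks when the budget is spent.
    void beforeOpen() throws InterruptedException { permits.acquire(); }

    // Call after closing the reader and releasing its batches.
    void afterClose() { permits.release(); }
}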
Big SQL Tuning for ORC
▪ The “fundamentals” (regardless of the data format):
 • Ensure Big SQL has enough resources: % memory, % CPU, and temp storage spread across multiple disks (e.g. the same disks as HDFS).
 • Use partitioning.
 • Run ANALYZE on all your tables (or ensure AUTO ANALYZE is enabled).
▪ ORC-specific tuning (Big SQL 5.0.1 & up):
 • The property bigsql.java.orc.preftp.size controls the max number of open ORC files.
▪ ORC file creation (see the sketch below):
 • Data sorted by the filtering columns
 • Stripe and row group size
 • Bloom filters
▪ For more Big SQL tuning tips, see
https://developer.ibm.com/hadoop/2016/11/16/top-6-big-sql-v4-2-performance-tips/
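
A sketch of those file-creation knobs through the Apache ORC writer API (the sizes and the bloom filter column are illustrative values, not recommendations):

import org.apache.orc.CompressionKind;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

// Write the SALES data with explicit stripe size, row group
// (index) stride, and a bloom filter on the filtering column.
TypeDescription schema =
    TypeDescription.fromString("struct<id:int,quantity:int,amount:double>");
Writer w = OrcFile.createWriter(path,
    OrcFile.writerOptions(conf)
        .setSchema(schema)
        .compress(CompressionKind.ZLIB)    // generic compression
        .stripeSize(64L * 1024 * 1024)     // 64 MB stripes
        .rowIndexStride(10000)             // 10K-row row groups
        .bloomFilterColumns("quantity"));  // bloom filter for the WHERE column
// ... append pre-sorted VectorizedRowBatches with w.addRowBatch(batch), then w.close();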
Summary
▪ ORC = Storage Efficiency + Fast Ingestion
▪ Fast ingestion using Vectorized APIs in Apache ORC
▪ Big SQL performs best with data in ORC format!
Questions?

Thank you!