Apache Iceberg Presentation for the St. Louis Big Data IDEA

© 2021 Cloudera, Inc. All rights reserved.
What is Apache Iceberg?
• Efficient Table Format
– Hidden Partitioning
– Schema Evolution
– Time Travel
• Works with Presto, Hive, and Spark
• Created at Netflix (2017)
• Used at Adobe, Apple, LinkedIn, Experian
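
As a rough illustration of the "Hidden Partitioning" bullet, the sketch below derives a partition value from a timestamp column so that writers and readers only ever deal with the timestamp itself. It is a toy in-memory model, not Iceberg's implementation: the class `PartitionedTable`, its methods, and the `days` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days(ts: datetime) -> int:
    """Toy day transform: whole days since the Unix epoch."""
    return (ts - EPOCH).days


class PartitionedTable:
    """Hypothetical in-memory table partitioned by days(ts)."""

    def __init__(self):
        # partition value -> rows; callers never supply the partition value
        self.partitions = defaultdict(list)

    def insert(self, row: dict) -> None:
        # The partition is derived from the row's own ts column.
        self.partitions[days(row["ts"])].append(row)

    def scan(self, start: datetime, end: datetime):
        """Return (rows with start <= ts < end, number of partitions read)."""
        lo, hi = days(start), days(end)
        # The predicate on ts is translated into a range of partition values,
        # so partitions outside that range are never opened.
        touched = [p for p in self.partitions if lo <= p <= hi]
        rows = [
            r
            for p in touched
            for r in self.partitions[p]
            if start <= r["ts"] < end
        ]
        return rows, len(touched)


table = PartitionedTable()
for day in (1, 2, 3):
    table.insert({"id": day, "ts": datetime(2021, 3, day, 12, tzinfo=timezone.utc)})

# The query filters on ts only; it never mentions a partition column.
rows, partitions_read = table.scan(
    datetime(2021, 3, 2, tzinfo=timezone.utc),
    datetime(2021, 3, 2, 23, 59, 59, tzinfo=timezone.utc),
)
```

The point of the design is that the partition layout stays an internal detail: a query written against `ts` still benefits from pruning, and no extra "date" column has to be kept in sync by hand.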

What are the Challenges?
• Data Scalability
• Atomicity
• Performance Degradation
• Complexity
• Object Stores
• Storage and Compute
• File System (Listing)

Architecture
[Diagram: Spark and Presto access tables through Iceberg; the data is stored in HDFS or an object store.]

Architecture
[Diagram: Snapshot (01) and Snapshot (02) each point to a manifest list; each manifest list points to one or more manifests; each manifest points to data files.]
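
The snapshot, manifest list, manifest, and file hierarchy can be sketched as a small in-memory model. This is only a toy under simplifying assumptions (no metadata files on disk, no deletes, no concurrent writers); `Manifest`, `Snapshot`, and `SnapshotTable` are hypothetical names for this example, not Iceberg classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Manifest:
    data_files: tuple  # paths of the data files tracked by this manifest


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: tuple  # the manifests that make up this snapshot


class SnapshotTable:
    """Hypothetical table that keeps every snapshot it has committed."""

    def __init__(self):
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_id: Optional[int] = None

    def append(self, new_files: List[str]) -> int:
        """Commit new data files as a new snapshot and return its id."""
        previous = ()
        if self.current_id is not None:
            previous = self.snapshots[self.current_id].manifest_list
        snapshot_id = len(self.snapshots) + 1
        # Existing manifests are reused; only one new manifest is added.
        manifests = previous + (Manifest(tuple(new_files)),)
        self.snapshots[snapshot_id] = Snapshot(snapshot_id, manifests)
        # Readers see the new snapshot only after this single pointer swap.
        self.current_id = snapshot_id
        return snapshot_id

    def files(self, snapshot_id: Optional[int] = None) -> List[str]:
        """List data files for a snapshot (default: the current one)."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        # Planning walks the metadata; it never lists a directory.
        return [f for m in self.snapshots[sid].manifest_list for f in m.data_files]


table = SnapshotTable()
first = table.append(["data/a.parquet"])
second = table.append(["data/b.parquet", "data/c.parquet"])

current_files = table.files()        # state as of the latest snapshot
files_at_first = table.files(first)  # "time travel" to the first snapshot
```

Because old snapshots stay intact and a commit is one pointer swap, the same structure gives both time travel and atomic visibility of writes in this toy model.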

Initial Setup
• Catalogs
– Working with SQL
– System Information

Spark
Adding a Catalog
    spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
        --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
        --conf spark.sql.catalog.spark_catalog.type=hive \
        --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.local.type=hadoop \
        --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
Creating a Table
    CREATE TABLE local.db.table (id bigint, data string) USING iceberg;
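
For readers who start Spark from Python rather than from spark-sql, the same settings can be passed to a SparkSession builder. The sketch below is an assumption-laden translation of the command-line flags on this slide: the helper names `iceberg_conf` and `build_session` and the application name are hypothetical, and `build_session` needs pyspark installed plus network access to fetch the package, so the call is left commented out.

```python
import os

# Same artifact the slide passes to --packages.
ICEBERG_PACKAGE = "org.apache.iceberg:iceberg-spark3-runtime:0.11.0"


def iceberg_conf(warehouse: str) -> dict:
    """Return the Spark settings from the slide as a dictionary (hypothetical helper)."""
    return {
        # spark.jars.packages is the configuration-key form of --packages
        "spark.jars.packages": ICEBERG_PACKAGE,
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
        "spark.sql.catalog.spark_catalog.type": "hive",
        "spark.sql.catalog.local": "org.apache.iceberg.spark.SparkCatalog",
        "spark.sql.catalog.local.type": "hadoop",
        "spark.sql.catalog.local.warehouse": warehouse,
    }


def build_session(conf: dict):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the sketch loads without pyspark

    builder = SparkSession.builder.appName("iceberg-demo")  # app name is arbitrary
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Mirrors $PWD/warehouse from the slide.
conf = iceberg_conf(os.path.join(os.getcwd(), "warehouse"))

# With pyspark available, the table from the slide could then be created with:
# spark = build_session(conf)
# spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
```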

Hive
Add the jar file
    add jar /path/to/iceberg-hive-runtime.jar;
Create an External Table
    CREATE EXTERNAL TABLE table_a
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    LOCATION 'hdfs://some_bucket/some_path/table_a';

References
Apache Iceberg: https://iceberg.apache.org/
Project Nessie: https://projectnessie.org/
Hive/Iceberg Integration: https://github.com/ExpediaGroup/hiveberg
Partitioning: https://developer.ibm.com/technologies/artificial-intelligence/articles/the-why-and-how-of-partitioning-in-apache-iceberg/?utm_source=thenewstack&utm_medium=website&utm_campaign=platform
Iceberg Explained: https://thenewstack.io/apache-iceberg-a-different-table-design-for-big-data/
