Spark SQL
“SQL is a highly sought-after technical skill due to its ability to work with
nearly all databases.”
Ibro Palic, CEO of Resumes Templates
History and Evolution of Big Data Technologies
Procedural programming interface
Declarative queries
Automatic optimization
So Far…
•Advanced Analytics
Introducing
Keep Track of Hashtags
#A Lazy Computation
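A minimal sketch of what lazy computation means here: operations are recorded as a plan and run only when an output operation asks for results. This is plain Python, not the Spark API. The class `LazyFrame` and its methods `where`, `select`, `explain`, and `collect` are hypothetical names chosen for illustration.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame: records operations, runs them on demand."""

    def __init__(self, rows, plan=()):
        self._rows = rows      # source data (a list of dicts)
        self._plan = plan      # tuple of pending (kind, function) steps

    def where(self, predicate):
        # Nothing is evaluated here; the step is only appended to the plan.
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def select(self, *columns):
        def project(row):
            return {c: row[c] for c in columns}
        return LazyFrame(self._rows, self._plan + (("project", project),))

    def explain(self):
        # Describe the recorded plan without running it.
        return " -> ".join(["scan"] + [kind for kind, _ in self._plan])

    def collect(self):
        # The output operation: only now is the recorded plan executed.
        rows = iter(self._rows)
        for kind, fn in self._plan:
            rows = filter(fn, rows) if kind == "filter" else map(fn, rows)
        return list(rows)


users = LazyFrame([
    {"name": "Ann", "age": 19},
    {"name": "Bob", "age": 34},
    {"name": "Cid", "age": 20},
])
young = users.where(lambda r: r["age"] < 21).select("name")
print(young.explain())
print(young.collect())
```

Because the whole plan is known before anything runs, an engine built this way has the chance to optimize across operations. The toy above only records and replays the steps.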
Data Model and DataFrame Operations
Spark SQL uses a nested data model based on Hive.
It supports all major SQL data types, including boolean, integer, double,
decimal, string, date, and timestamp, as well as complex types (structs,
arrays, maps, and unions) and user-defined types.
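To make the nested data model concrete, the sketch below maps a nested Python record to a Hive-style type string. It is a toy written in plain Python under stated assumptions, not Spark's own inference code: the function name `type_of` is hypothetical, and lists are assumed to be non-empty with elements of one type.

```python
import datetime
from decimal import Decimal


def type_of(value):
    """Return a Hive-style type string for a Python value (toy mapping)."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, Decimal):
        return "decimal"
    if isinstance(value, str):
        return "string"
    # datetime must be tested before date, because datetime subclasses date.
    if isinstance(value, datetime.datetime):
        return "timestamp"
    if isinstance(value, datetime.date):
        return "date"
    if isinstance(value, dict):
        # A nested record becomes a struct of named, typed fields.
        fields = ",".join(f"{k}:{type_of(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    if isinstance(value, list):
        # Assumption: a non-empty list whose elements share one type.
        return f"array<{type_of(value[0])}>"
    raise TypeError(f"no toy mapping for {type(value).__name__}")


record = {
    "name": "Ann",
    "active": True,
    "joined": datetime.date(2015, 6, 1),
    "loc": {"lat": 45.1, "long": 90.0},
    "tags": ["spark", "sql"],
}
print(type_of(record))
```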
#Heterogeneous Data Sources
Schema Inference
Purposes
#Trees
#Rules
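The two hashtags above can be illustrated with a small sketch: an expression is held as a tree of immutable nodes, and a rule is a function that rewrites the nodes it matches. Catalyst itself is written in Scala. The Python below is only a toy with hypothetical names (`transform`, `fold_constants`), and the bottom-up traversal order is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def transform(node, rule):
    """Apply `rule` bottom-up: rewrite the children first, then the node."""
    if isinstance(node, Add):
        node = Add(transform(node.left, rule), transform(node.right, rule))
    return rule(node)


def fold_constants(node):
    """Rule: Add(Literal(a), Literal(b)) becomes Literal(a + b)."""
    if (isinstance(node, Add) and isinstance(node.left, Literal)
            and isinstance(node.right, Literal)):
        return Literal(node.left.value + node.right.value)
    return node  # nodes the rule does not match are left unchanged


# The expression x + (1 + 2) as a tree.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform(tree, fold_constants)
print(optimized)
```

The rule never mutates a node. It returns a new tree, so the original `tree` is still available for comparison after optimization.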
Catalyst Optimization Cont.
Data Sources
Examples:
•CSV
•Avro
•Parquet
•JDBC
Extension Points Cont.
User Defined Types (UDTs)
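A rough sketch of the idea behind a UDT: a user's class is described in terms of built-in types, with functions to convert in both directions. This is plain Python rather than the Spark API. The names `Point`, `PointUDT`, and `sql_type` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


class PointUDT:
    """Toy UDT: describes how Point maps onto built-in types and back."""

    # The built-in representation: a struct of two doubles.
    sql_type = "struct<x:double,y:double>"

    @staticmethod
    def serialize(point):
        # User object -> values of built-in types.
        return (point.x, point.y)

    @staticmethod
    def deserialize(row):
        # Values of built-in types -> user object.
        return Point(row[0], row[1])


p = Point(1.0, 2.5)
stored = PointUDT.serialize(p)
print(PointUDT.sql_type, stored)
print(PointUDT.deserialize(stored) == p)
```

Once a type is expressed this way, an engine only ever has to store and process built-in types, and the user's class reappears when values are read back.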
SQL Performance Evaluation Cont.