Microsoft Azure Databricks

Speed
[Figure: "Logistic Regression" bar chart comparing Hadoop and Spark; vertical axis from 0 to 140; the value 0.9 appears next to Spark]

Ease of Use
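from pyspark import SparkContext

sc = SparkContext(appName="WordCount")  # app name is illustrative
text_file = sc.textFile("hdfs://...")
counts = (text_file.flatMap(lambda line: line.split())
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))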
Word count in Spark's Python API
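
To make the three steps of the word count concrete, here is a minimal plain-Python sketch of what flatMap, map, and reduceByKey each contribute. It runs on an in-memory list and does not use Spark; the function name `word_count` and the sample input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count words using the same three steps as the slide's chain."""
    # flatMap step: split every line, then flatten into one stream of words
    words = chain.from_iterable(line.split() for line in lines)
    # map step: pair each word with the count 1
    pairs = ((word, 1) for word in words)
    # reduceByKey step: combine values that share a key using a + b
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] = counts[word] + n
    return dict(counts)


sample_lines = ["to be or not", "to be"]  # hypothetical input
result = word_count(sample_lines)
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In Spark the same logic is distributed across partitions; this sketch only shows the per-record semantics.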

Generality
Spark Core Engine, with four libraries on top:
Spark SQL: Interactive Queries
Spark MLlib: Machine Learning
Spark Streaming: Stream Processing
GraphX: Graph Computation

Runs Everywhere
The Spark Core Engine and its libraries (Spark SQL, Spark MLlib, Spark Streaming, GraphX) run on:
YARN
Mesos
Standalone Scheduler

[Diagram: a multi-step job that alternates between HDFS reads and writes: read from HDFS, write to HDFS, read from HDFS, write to HDFS, read from HDFS]

[Diagram: a chain of RDDs connected by Transformations; Actions produce a Value]
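
The distinction between transformations and actions can be sketched in plain Python. The `ToyRDD` class below is a hypothetical stand-in, not Spark's API: it assumes only that a transformation records work and returns a new dataset, while an action runs the recorded work and returns a value.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: transformations are recorded, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source  # underlying data, a list in this sketch
        self._steps = steps    # pending transformations, in order

    # Transformations: return a new ToyRDD and do no work yet
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: execute the recorded steps and return a plain value
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records when the mapped function actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD([1, 2, 3, 4])
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = list(calls)  # still empty: nothing has executed
collected = even_squares.collect()
print(calls_before_action)  # []
print(collected)            # [4, 16]
```

Each transformation returns a new object and leaves the original untouched, so `numbers` can be reused as the start of another chain.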

[Diagram: a Driver Program holding a SparkContext, a Cluster Manager, Worker Nodes, and Data Sources (HDFS, SQL, NoSQL, …)]
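
As a rough illustration of the driver and worker roles named in the diagram, the following plain-Python sketch splits a dataset into partitions, hands each partition to a pool of workers, and combines the partial results in the driver. It is a toy model under stated assumptions: a thread pool stands in for the worker nodes, there is no separate cluster manager, and the names `split_into_partitions` and `run_job` are hypothetical. It does not reflect how Spark actually schedules tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_job(data, task, combine, num_workers=3):
    """Toy 'driver': partition the data, run task on each partition, combine results."""
    partitions = split_into_partitions(data, num_workers)
    # The pool stands in for worker nodes; each worker processes one partition
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(task, partitions))
    # The driver merges the per-partition results into the final value
    return combine(partial_results)


total = run_job(list(range(1, 11)), task=sum, combine=sum)
print(total)  # 55
```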

Azure Databricks
Data sources: Cloud storage, Data warehouses, Hadoop storage, IoT / streaming data
Outputs: REST APIs, Machine learning models, BI tools, Data exports, Data warehouses
Enhance Productivity: Collaborative Workspace for the Data Engineer, Data Scientist, and Business Analyst
Deploy Production Jobs & Workflows: Multi-stage pipelines, Job scheduler, Notification & logs
Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, Serverless
Build on secure & trusted cloud
Scale without limits

Big data analytics on Azure, ranging from Control to Ease of Use (Reduced Administration)
Categories: IaaS Clusters, Managed Clusters, Big Data as-a-service
Azure Marketplace (HDP | CDH | MapR): Any Hadoop technology, any distribution
Azure HDInsight: Workload optimized, managed clusters
Azure Databricks: Frictionless & optimized Spark clusters
Azure Data Lake Analytics: Data engineering in a job-as-a-service model
Big data storage: Azure Data Lake Store, Azure Storage
