Microsoft Azure Databricks
Ease of Use
Generality
Runs Everywhere
[Chart: Logistic Regression, Hadoop vs. Spark. Axis ticks 0 to 140; surviving data label: 0.9]
Speed
Generality
Runs Everywhere
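# `spark` is assumed to be an existing SparkContext; the slide does not show its creation
text_file = spark.textFile("hdfs://...")
counts = (text_file.flatMap(lambda line: line.split())  # split each line into words
          .map(lambda word: (word, 1))                   # pair each word with a count of 1
          .reduceByKey(lambda a, b: a + b))              # sum the counts for each word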
Word count in Spark's Python API
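
For readers without a cluster at hand, the three steps of the word count can be mirrored in plain Python. This is only a sketch of what flatMap, map, and reduceByKey compute on a single machine, not how Spark executes them; the function name `word_count` and the sample input are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Plain-Python analogue of flatMap -> map -> reduceByKey (no Spark involved)."""
    # flatMap: split every line and flatten the results into one stream of words
    words = chain.from_iterable(line.split() for line in lines)
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: combine the values of equal keys with a + b
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] = counts[word] + n
    return dict(counts)


sample_lines = ["to be or not to be", "to do"]  # hypothetical input, not from the slides
result = word_count(sample_lines)
print(result)
```

The sketch runs sequentially in a single process; distributing the same steps across worker nodes is what the Spark version adds.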
Speed
Ease of Use
Runs Everywhere
Spark Core Engine
Spark SQL: Interactive Queries
Spark MLlib: Machine Learning
Spark Streaming: Stream Processing
GraphX: Graph Computation
Speed
Ease of Use
Generality
Runs Everywhere
Cluster managers: YARN, Mesos, Standalone Scheduler
[Diagram: Read from HDFS, Write to HDFS, Read from HDFS, Write to HDFS, Read from HDFS]
[Diagram: RDD (×5), Transformations, Actions, Value]
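
The RDD diagram distinguishes transformations, which yield another RDD, from actions, which yield a value. A minimal sketch of that distinction follows, assuming a toy class named `ToyRDD` that is invented here for illustration; it is not Spark's implementation, and it only imitates the idea that transformations are recorded and nothing runs until an action is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's API or implementation)."""

    def __init__(self, source, steps=()):
        self._source = source  # any re-iterable collection
        self._steps = steps    # recorded transformations, not yet executed

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Chain the recorded steps over the source as lazy iterators.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: execute the recorded steps and return a plain value.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every element that `square` actually processes


def square(x):
    calls.append(x)
    return x * x


squares = ToyRDD([1, 2, 3, 4]).map(square)        # transformation: nothing runs
evens = squares.filter(lambda x: x % 2 == 0)      # transformation: nothing runs
calls_before_action = list(calls)                 # still empty at this point
even_squares = evens.collect()                    # action: the pipeline runs now
print(calls_before_action, even_squares)
```

Each action in this toy re-runs the whole recorded pipeline from the source, since nothing is cached.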
Data Sources (HDFS, SQL, NoSQL, …)
Cluster Manager
Worker Nodes
Driver Program
SparkContext
Microsoft Azure
Azure Databricks
Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, Serverless
Collaborative Workspace: Data Engineer, Data Scientist, Business Analyst
Deploy Production Jobs & Workflows: Multi-Stage Pipelines, Job Scheduler, Notification & Logs
Enhance Productivity
Build on secure & trusted cloud
Scale without limits
Cloud storage, Data warehouses, Hadoop storage, IoT / streaming data
REST APIs, Machine learning models, BI tools, Data exports, Data warehouses
Control to Ease of Use (Reduced Administration)
Big Data Analytics
IaaS Clusters: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology, any distribution
Managed Clusters: Azure HDInsight. Workload optimized, managed clusters
Azure Databricks: Frictionless & Optimized Spark clusters
Big Data as-a-service: Azure Data Lake Analytics. Data Engineering in a Job-as-a-service model
Big Data Storage
Azure Data Lake Store
Azure Storage

Editor's Notes

  1. Follow the instructions in insidesqlr\readme.txt