Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
Introduction to Spark
Matei Zaharia
Databricks Intern Event, August 2015
What is Apache Spark?
Fast and general computing engine for clusters
Makes it easy and fast to process large datasets
• APIs in Java, Scala, Python, R
• Libraries for SQL, streaming, machine learning, …
• 100x faster than Hadoop MapReduce for some apps
About Databricks
Founded by creators of Spark in 2013
Offers a hosted cloud service built on Spark
• Interactive workspace with notebooks, dashboards, jobs
0
20
40
60
80
100
120
140
160
2010 2011 2012 2013 2014 2015
Contributors
Contributors / Month to Spark
Community Growth
Most active open source project in
big data
Spark Programming Model
Write programs in terms of transformations on
distributed datasets
Resilient Distributed Datasets (RDDs)
• Collections of objects stored in memory or disk across a cluster
• Built via parallel transformations (map, filter, …)
• Automatically rebuilt on failure
Example: Log Mining
Load error messages from a log into memory, then
interactively search for various patterns
lines	
  =	
  spark.textFile(“hdfs://...”)	
  
errors	
  =	
  lines.filter(lambda	
  s:	
  s.startswith(“ERROR”))	
  
messages	
  =	
  errors.map(lambda	
  s:	
  s.split(‘t’)[2])	
  
messages.cache()	
  
Block 1
Block 2
Block 3
Worker
Worker
Worker
Driver
messages.filter(lambda	
  s:	
  “MySQL”	
  in	
  s).count()	
  
messages.filter(lambda	
  s:	
  “Redis”	
  in	
  s).count()	
  
.	
  .	
  .	
  
tasks
results
Cache 1
Cache 2
Cache 3
Base RDD
Transformed RDD
Action
Result: full-text search of Wikipedia in
0.5 sec (vs 20s for on-disk data)
Example: Logistic Regression
0
500
1000
1500
2000
2500
3000
3500
4000
1 5 10 20 30
RunningTime(s)
Number of Iterations
Hadoop
Spark
110 s / iteration
first iteration 80 s
further iterations 1 s
Iterative algorithm used in machine learning
Source: Daytona GraySort benchmark, sortbenchmark.org
2100 machines2013 Record:
Hadoop
72 minutes
2014 Record:
Spark
207 machines
23 minutes
On-Disk Performance
Time to sort 100TB
Higher-Level Libraries
Spark
Spark
Streaming
real-time
Spark SQL
structured data
MLlib
machine
learning
GraphX
graph
Higher-Level Libraries
//	
  Load	
  data	
  using	
  SQL	
  
points	
  =	
  ctx.sql(“select	
  latitude,	
  longitude	
  from	
  tweets”)	
  
//	
  Train	
  a	
  machine	
  learning	
  model	
  
model	
  =	
  KMeans.train(points,	
  10)	
  
//	
  Apply	
  it	
  to	
  a	
  stream	
  
sc.twitterStream(...)	
  
	
  	
  .map(lambda	
  t:	
  (model.predict(t.location),	
  1))	
  
	
  	
  .reduceByWindow(“5s”,	
  lambda	
  a,	
  b:	
  a	
  +	
  b)	
  
Demo
Over 1000 production users, clusters up to 8000 nodes
Many talks online at spark-summit.org
Spark Community
Introduction to Spark (Intern Event Presentation)
Ongoing Work
Speeding up Spark through code generation and
binary processing (Project Tungsten)
R interface to Spark (SparkR)
Real-time machine learning library
Frontend and backend work in Databricks
(visualization, collaboration, auto-scaling, …)
Thank you.
We’re hiring!

More Related Content

What's hot

Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
Pietro Michiardi
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Anastasios Skarlatidis
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Simplilearn
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Simplilearn
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
Knoldus Inc.
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
sudhakara st
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
Modern Data Stack France
 
Spark overview
Spark overviewSpark overview
Spark overview
Lisa Hua
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
Databricks
 
Apache spark
Apache sparkApache spark
Apache spark
shima jafari
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
Whizlabs
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
Jaemun Jung
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Catalyst optimizer
Catalyst optimizerCatalyst optimizer
Catalyst optimizer
Ayub Mohammad
 

What's hot (20)

Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
Apache Spark Architecture | Apache Spark Architecture Explained | Apache Spar...
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
Spark with Delta Lake
Spark with Delta LakeSpark with Delta Lake
Spark with Delta Lake
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Apache spark
Apache sparkApache spark
Apache spark
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Catalyst optimizer
Catalyst optimizerCatalyst optimizer
Catalyst optimizer
 

Viewers also liked

Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Databricks
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
Databricks
 
2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia
Databricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkR
Databricks
 
Internship presentation
Internship presentationInternship presentation
Internship presentation
samcrosier
 
Big Data Day LA 2015 - Spark after Dark by Chris Fregly of Databricks
Big Data Day LA 2015 - Spark after Dark by Chris Fregly of DatabricksBig Data Day LA 2015 - Spark after Dark by Chris Fregly of Databricks
Big Data Day LA 2015 - Spark after Dark by Chris Fregly of Databricks
Data Con LA
 
Spark - The beginnings
Spark -  The beginningsSpark -  The beginnings
Spark - The beginnings
Daniel Leon
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Chris Fregly
 
Apache Spark
Apache SparkApache Spark
Apache Spark
Mahdi Esmailoghli
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
Cloudera, Inc.
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
Yukti Kaura
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
Databricks
 
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
StampedeCon
 
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Chris Fregly
 
Extreme-scale Ad-Tech using Spark and Databricks at MediaMath
Extreme-scale Ad-Tech using Spark and Databricks at MediaMathExtreme-scale Ad-Tech using Spark and Databricks at MediaMath
Extreme-scale Ad-Tech using Spark and Databricks at MediaMath
Spark Summit
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Databricks
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
Martin Zapletal
 
Spark After Dark: Real time Advanced Analytics and Machine Learning with Spark
Spark After Dark:  Real time Advanced Analytics and Machine Learning with SparkSpark After Dark:  Real time Advanced Analytics and Machine Learning with Spark
Spark After Dark: Real time Advanced Analytics and Machine Learning with Spark
Chris Fregly
 

Viewers also liked (20)

Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia2016 Spark Summit East Keynote: Matei Zaharia
2016 Spark Summit East Keynote: Matei Zaharia
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkR
 
Internship presentation
Internship presentationInternship presentation
Internship presentation
 
Big Data Day LA 2015 - Spark after Dark by Chris Fregly of Databricks
Big Data Day LA 2015 - Spark after Dark by Chris Fregly of DatabricksBig Data Day LA 2015 - Spark after Dark by Chris Fregly of Databricks
Big Data Day LA 2015 - Spark after Dark by Chris Fregly of Databricks
 
Spark - The beginnings
Spark -  The beginningsSpark -  The beginnings
Spark - The beginnings
 
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...Advanced Apache Spark Meetup:  How Spark Beat Hadoop @ 100 TB Daytona GraySor...
Advanced Apache Spark Meetup: How Spark Beat Hadoop @ 100 TB Daytona GraySor...
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
 
New directions for Apache Spark in 2015
New directions for Apache Spark in 2015New directions for Apache Spark in 2015
New directions for Apache Spark in 2015
 
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014The Evolution of Data Analysis with Hadoop - StampedeCon 2014
The Evolution of Data Analysis with Hadoop - StampedeCon 2014
 
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
Madrid Spark Big Data Bluemix Meetup - Spark Versus Hadoop @ 100 TB Daytona G...
 
Extreme-scale Ad-Tech using Spark and Databricks at MediaMath
Extreme-scale Ad-Tech using Spark and Databricks at MediaMathExtreme-scale Ad-Tech using Spark and Databricks at MediaMath
Extreme-scale Ad-Tech using Spark and Databricks at MediaMath
 
Enabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and REnabling Exploratory Analysis of Large Data with Apache Spark and R
Enabling Exploratory Analysis of Large Data with Apache Spark and R
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
 
Spark After Dark: Real time Advanced Analytics and Machine Learning with Spark
Spark After Dark:  Real time Advanced Analytics and Machine Learning with SparkSpark After Dark:  Real time Advanced Analytics and Machine Learning with Spark
Spark After Dark: Real time Advanced Analytics and Machine Learning with Spark
 

Similar to Introduction to Spark (Intern Event Presentation)

Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
C4Media
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
Databricks
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
Databricks
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Vienna Data Science Group
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdf
WalmirCouto3
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferret
Andrii Gakhov
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
Djamel Zouaoui
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
Databricks
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
YahooTechConference
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
DataStax Academy
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
Amir Sedighi
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDs
Dean Chen
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
MapR Technologies
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
Venkata Naga Ravi
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study Notes
Richard Kuo
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
Ike Ellis
 
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
IT Event
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
Inhacking
 

Similar to Introduction to Spark (Intern Event Presentation) (20)

Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
End-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache SparkEnd-to-end Data Pipeline with Apache Spark
End-to-end Data Pipeline with Apache Spark
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdf
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferret
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 
An introduction To Apache Spark
An introduction To Apache SparkAn introduction To Apache Spark
An introduction To Apache Spark
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDs
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
In Memory Analytics with Apache Spark
In Memory Analytics with Apache SparkIn Memory Analytics with Apache Spark
In Memory Analytics with Apache Spark
 
Spark Study Notes
Spark Study NotesSpark Study Notes
Spark Study Notes
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Dombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Dombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai AvailableDombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Dombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
cristine510
 
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Estuary Flow
 
@Call @Girls in Kolkata 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Kolkata Avaulable
 @Call @Girls in Kolkata 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Kolkata Avaulable @Call @Girls in Kolkata 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Kolkata Avaulable
@Call @Girls in Kolkata 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Kolkata Avaulable
DiyaSharma6551
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
miso_uam
 
dachnug51 - HCL Domino Roadmap .pdf
dachnug51 - HCL Domino Roadmap      .pdfdachnug51 - HCL Domino Roadmap      .pdf
dachnug51 - HCL Domino Roadmap .pdf
DNUG e.V.
 
@Call @Girls in Tirunelveli 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla...
 @Call @Girls in Tirunelveli 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla... @Call @Girls in Tirunelveli 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla...
@Call @Girls in Tirunelveli 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla...
JoyaBansal
 
ANSYS Mechanical APDL Introductory Tutorials.pdf
ANSYS Mechanical APDL Introductory Tutorials.pdfANSYS Mechanical APDL Introductory Tutorials.pdf
ANSYS Mechanical APDL Introductory Tutorials.pdf
sachin chaurasia
 
Migrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS CloudMigrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS Cloud
Ortus Solutions, Corp
 
Chennai @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Chennai @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real MeetChennai @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Chennai @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
lovelykumarilk789
 
ENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentationENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentation
sofiafernandezon
 
@Call @Girls in Ahmedabad 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Ahmedabad Ava...
 @Call @Girls in Ahmedabad 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Ahmedabad Ava... @Call @Girls in Ahmedabad 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Ahmedabad Ava...
@Call @Girls in Ahmedabad 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Ahmedabad Ava...
DiyaSharma6551
 
@Call @Girls in Saharanpur 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas...
 @Call @Girls in Saharanpur 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas... @Call @Girls in Saharanpur 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas...
@Call @Girls in Saharanpur 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas...
AlinaDevecerski
 
@Call @Girls in Solapur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S...
 @Call @Girls in Solapur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S... @Call @Girls in Solapur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S...
@Call @Girls in Solapur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S...
Mona Rathore
 
₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You
₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You
₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You
shristi verma
 
一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理
一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理
一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理
avufu
 
Kolkata @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Kolkata @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real MeetKolkata @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Kolkata @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
lovelykumarilk789
 
Ghatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Ghatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai AvailableGhatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Ghatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
aviva54
 
Top 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your WebsiteTop 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your Website
e-Definers Technology
 
Folding Cheat Sheet #7 - seventh in a series
Folding Cheat Sheet #7 - seventh in a seriesFolding Cheat Sheet #7 - seventh in a series
Folding Cheat Sheet #7 - seventh in a series
Philip Schwarz
 

Recently uploaded (20)

Dombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Dombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai AvailableDombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Dombivli @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
 
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
 
@Call @Girls in Kolkata 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Kolkata Avaulable
 @Call @Girls in Kolkata 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Kolkata Avaulable @Call @Girls in Kolkata 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Kolkata Avaulable
@Call @Girls in Kolkata 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Kolkata Avaulable
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
 
dachnug51 - HCL Domino Roadmap .pdf
dachnug51 - HCL Domino Roadmap      .pdfdachnug51 - HCL Domino Roadmap      .pdf
dachnug51 - HCL Domino Roadmap .pdf
 
@Call @Girls in Tirunelveli 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla...
 @Call @Girls in Tirunelveli 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla... @Call @Girls in Tirunelveli 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla...
@Call @Girls in Tirunelveli 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Cla...
 
ANSYS Mechanical APDL Introductory Tutorials.pdf
ANSYS Mechanical APDL Introductory Tutorials.pdfANSYS Mechanical APDL Introductory Tutorials.pdf
ANSYS Mechanical APDL Introductory Tutorials.pdf
 
Migrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS CloudMigrate your Infrastructure to the AWS Cloud
Migrate your Infrastructure to the AWS Cloud
 
Chennai @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Chennai @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real MeetChennai @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Chennai @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
 
ENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentationENISA Threat Landscape 2023 documentation
ENISA Threat Landscape 2023 documentation
 
@Call @Girls in Ahmedabad 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Ahmedabad Ava...
 @Call @Girls in Ahmedabad 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Ahmedabad Ava... @Call @Girls in Ahmedabad 🐱‍🐉  XXXXXXXXXX 🐱‍🐉  Best High Class Ahmedabad Ava...
@Call @Girls in Ahmedabad 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Best High Class Ahmedabad Ava...
 
@Call @Girls in Saharanpur 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas...
 @Call @Girls in Saharanpur 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas... @Call @Girls in Saharanpur 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas...
@Call @Girls in Saharanpur 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Tanisha Sharma Best High Clas...
 
@Call @Girls in Solapur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S...
 @Call @Girls in Solapur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S... @Call @Girls in Solapur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S...
@Call @Girls in Solapur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class S...
 
₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You
₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You
₹Call ₹Girls Andheri West 09967584737 Deshi Chori Near You
 
一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理
一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理
一比一原版英国牛津大学毕业证(oxon毕业证书)如何办理
 
Kolkata @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Kolkata @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real MeetKolkata @Call @Girls 🐱‍🐉  XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
Kolkata @Call @Girls 🐱‍🐉 XXXXXXXXXX 🐱‍🐉 Genuine WhatsApp Number for Real Meet
 
Ghatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Ghatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai AvailableGhatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
Ghatkopar @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai Available
 
Top 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your WebsiteTop 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your Website
 
Folding Cheat Sheet #7 - seventh in a series
Folding Cheat Sheet #7 - seventh in a seriesFolding Cheat Sheet #7 - seventh in a series
Folding Cheat Sheet #7 - seventh in a series
 

Introduction to Spark (Intern Event Presentation)

  • 1. Introduction to Spark Matei Zaharia Databricks Intern Event, August 2015
  • 2. What is Apache Spark? Fast and general computing engine for clusters Makes it easy and fast to process large datasets • APIs in Java, Scala, Python, R • Libraries for SQL, streaming, machine learning, … • 100x faster than Hadoop MapReduce for some apps
  • 3. About Databricks Founded by creators of Spark in 2013 Offers a hosted cloud service built on Spark • Interactive workspace with notebooks, dashboards, jobs
  • 4. 0 20 40 60 80 100 120 140 160 2010 2011 2012 2013 2014 2015 Contributors Contributors / Month to Spark Community Growth Most active open source project in big data
  • 5. Spark Programming Model Write programs in terms of transformations on distributed datasets Resilient Distributed Datasets (RDDs) • Collections of objects stored in memory or disk across a cluster • Built via parallel transformations (map, filter, …) • Automatically rebuilt on failure
  • 6. Example: Log Mining Load error messages from a log into memory, then interactively search for various patterns lines  =  spark.textFile(“hdfs://...”)   errors  =  lines.filter(lambda  s:  s.startswith(“ERROR”))   messages  =  errors.map(lambda  s:  s.split(‘t’)[2])   messages.cache()   Block 1 Block 2 Block 3 Worker Worker Worker Driver messages.filter(lambda  s:  “MySQL”  in  s).count()   messages.filter(lambda  s:  “Redis”  in  s).count()   .  .  .   tasks results Cache 1 Cache 2 Cache 3 Base RDD Transformed RDD Action Result: full-text search of Wikipedia in 0.5 sec (vs 20s for on-disk data)
  • 7. Example: Logistic Regression 0 500 1000 1500 2000 2500 3000 3500 4000 1 5 10 20 30 RunningTime(s) Number of Iterations Hadoop Spark 110 s / iteration first iteration 80 s further iterations 1 s Iterative algorithm used in machine learning
  • 8. Source: Daytona GraySort benchmark, sortbenchmark.org 2100 machines2013 Record: Hadoop 72 minutes 2014 Record: Spark 207 machines 23 minutes On-Disk Performance Time to sort 100TB
  • 10. Higher-Level Libraries //  Load  data  using  SQL   points  =  ctx.sql(“select  latitude,  longitude  from  tweets”)   //  Train  a  machine  learning  model   model  =  KMeans.train(points,  10)   //  Apply  it  to  a  stream   sc.twitterStream(...)      .map(lambda  t:  (model.predict(t.location),  1))      .reduceByWindow(“5s”,  lambda  a,  b:  a  +  b)  
  • 11. Demo
  • 12. Over 1000 production users, clusters up to 8000 nodes Many talks online at spark-summit.org Spark Community
  • 14. Ongoing Work Speeding up Spark through code generation and binary processing (Project Tungsten) R interface to Spark (SparkR) Real-time machine learning library Frontend and backend work in Databricks (visualization, collaboration, auto-scaling, …)

Editor's Notes

  1. Add “variables” to the “functions” in functional programming
  2. 100 GB of data on 50 m1.xlarge EC2 machines
  3. Alibab, tenzent At Berkeley, we have been working on a solution since 2009. This solution consists of a software stack for data analytics, called the Berkeley Data Analytics Stack. The centerpiece of this stack is Spark. Spark has seen significant adoption with hundreds of companies using it, out of which around sixteen companies have contributed back the code. In addition, Spark has been deployed on clusters that exceed 1,000 nodes.