Apache Spark
Agenda
Hadoop vs Spark: Big ‘Big Data’ question
Spark Ecosystem
What is RDD
Operations on RDD: Actions vs Transformations
Running in cluster
Task schedulers
Spark Streaming
Dataframes API
Let’s remember: MapReduce
Apache Hadoop MapReduce
Hadoop VS/AND Spark
Hadoop: DFS
Spark: Speed (RAM)
Spark ecosystem
Glossary
Job
RDD
Stages
Tasks
DAG
Executor
Driver
Simple Example
RDD: Resilient Distributed Dataset
Represents an immutable, partitioned collection of elements that can be
operated on in parallel, with the ability to recover from failures.
Example
Hadoop RDD
getPartitions = HDFS blocks
getDependencies = None
compute = load block in memory
getPreferredLocations = HDFS block locations
partitioner = None
MapPartitions RDD
getPartitions = same as parent
getDependencies = parent RDD
compute = compute parent and apply map()
getPreferredLocations = same as parent
partitioner = None
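The five properties listed for the two RDDs above can be modelled in a few lines of plain Python. This is a toy sketch, not Spark's code: the class names `ToyRDD`, `SourceRDD` and `MappedRDD`, the in-memory "blocks" and the host names are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for the five-property RDD interface (not Spark's API)."""

    partitioner: Optional[object] = None  # None, as on both slides

    def get_partitions(self) -> List[int]:
        raise NotImplementedError

    def get_dependencies(self) -> List["ToyRDD"]:
        raise NotImplementedError

    def compute(self, partition: int) -> List:
        raise NotImplementedError

    def get_preferred_locations(self, partition: int) -> List[str]:
        raise NotImplementedError


class SourceRDD(ToyRDD):
    """Plays the role of the Hadoop RDD: one partition per storage block."""

    def __init__(self, blocks: List[List], hosts: List[str]):
        self.blocks = blocks  # hypothetical in-memory "HDFS blocks"
        self.hosts = hosts    # hypothetical host name per block

    def get_partitions(self):
        return list(range(len(self.blocks)))

    def get_dependencies(self):
        return []  # a source has no parent

    def compute(self, partition):
        return list(self.blocks[partition])  # "load block in memory"

    def get_preferred_locations(self, partition):
        return [self.hosts[partition]]


class MappedRDD(ToyRDD):
    """Plays the role of the MapPartitions RDD: defers to its parent."""

    def __init__(self, parent: ToyRDD, fn: Callable):
        self.parent = parent
        self.fn = fn

    def get_partitions(self):
        return self.parent.get_partitions()

    def get_dependencies(self):
        return [self.parent]

    def compute(self, partition):
        # Compute the parent partition, then apply map(). Recomputing from
        # the parent like this is also how a lost partition can be rebuilt.
        return [self.fn(x) for x in self.parent.compute(partition)]

    def get_preferred_locations(self, partition):
        return self.parent.get_preferred_locations(partition)


source = SourceRDD(blocks=[[1, 2], [3, 4, 5]], hosts=["node-a", "node-b"])
doubled = MappedRDD(source, lambda x: x * 2)
result = [doubled.compute(p) for p in doubled.get_partitions()]
print(result)
```

Because `MappedRDD` stores only its parent and a function, it holds no data of its own. That is the sense in which the dataset is "resilient" in this sketch: any partition can be recomputed from its lineage.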
RDD: Resilient Distributed Dataset
RDD Example
RDD Example
RDD Operations
● Transformations
○ Apply user function to every element in a partition
○ Apply aggregation function to a whole dataset
(groupBy, sortBy)
○ Provide functionality for repartitioning (repartition,
partitionBy)
● Actions
○ Materialize computation results (collect, count,
take)
○ Store RDDs in memory or on disk (cache, persist); the data is stored when the first action computes it
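The split between transformations and actions comes down to lazy evaluation: transformations only record what should happen, and an action runs the recorded steps. The sketch below imitates that behaviour in plain Python. `LazyDataset` is a hypothetical class written for this illustration and is not part of Spark, although its method names mirror the RDD operations named above.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy dataset: transformations record work, actions run it."""

    def __init__(self, source: Iterable, steps: tuple = ()):
        self._source = source
        self._steps = steps  # recorded transformations, not yet executed

    # --- transformations: return a new dataset, compute nothing ---
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # --- actions: run the recorded steps and return a result ---
    def collect(self) -> List:
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self) -> int:
        return len(self.collect())

    def take(self, n: int) -> List:
        return self.collect()[:n]


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


ds = LazyDataset(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
calls_before_action = len(calls)  # still 0: nothing has run yet
odd_squares = ds.collect()        # the action triggers the computation
print(calls_before_action, odd_squares)
```

Note that every action in this toy recomputes the whole chain from the source. Avoiding that repeated work is the motivation for persisting a dataset that several actions reuse.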
RDD Dependencies
DAG: Directed Acyclic Graph
All the operators in a job
are used to construct a
DAG (Directed Acyclic
Graph). The DAG is
optimized by rearranging
and combining operators
where possible.
DAG Example
DAG Scheduler
The DAG scheduler divides
operators into stages of
tasks. A stage comprises
tasks based on partitions
of the input data. The scheduler
pipelines operators together.
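A rough way to picture the stage split: operators that need a shuffle close the current stage, and everything between two shuffles is pipelined into one stage. The sketch below handles only a linear chain of operator names. The function names and the set of "wide" operators are assumptions made for this illustration, and a real scheduler works on a graph of RDD dependencies rather than a list of strings.

```python
from typing import List

# Assumption for this sketch: these operators need a shuffle ("wide"),
# everything else is "narrow" and can be pipelined within one stage.
WIDE_OPS = {"groupBy", "sortBy", "repartition", "partitionBy"}


def split_into_stages(operators: List[str]) -> List[List[str]]:
    """Cut a linear operator chain into stages at shuffle boundaries."""
    stages: List[List[str]] = [[]]
    for op in operators:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])  # the shuffle starts a new stage
        stages[-1].append(op)
    return stages


def tasks_per_stage(stages: List[List[str]], num_partitions: int) -> List[int]:
    """One task per partition per stage (partition count kept fixed here)."""
    return [num_partitions for _ in stages]


job = ["textFile", "map", "filter", "groupBy", "map", "sortBy", "collect"]
stages = split_into_stages(job)
task_counts = tasks_per_stage(stages, num_partitions=4)
print(stages)
print(task_counts)
```

With four input partitions, each of the three stages in this toy job runs four tasks, and each task applies all of its stage's pipelined operators to one partition.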
DAG Scheduler example
RDD Persistence: persist() & cache()
When you persist an RDD, each node stores any partitions of it that it computes in memory
and reuses them in other actions on that dataset (or datasets derived from it).
Storage levels: MEMORY_ONLY (default), MEMORY_AND_DISK,
MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY,
MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
Removing data: Spark evicts old partitions in least-recently-used (LRU) fashion, or you can remove an RDD manually with the RDD.unpersist() method.
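The LRU removal policy can be illustrated with a small in-memory store. This is a sketch under simplifying assumptions: `PartitionCache` is a hypothetical class, capacity is counted in partitions rather than bytes, and there is no spill to disk, so it corresponds loosely to a memory-only storage level.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class PartitionCache:
    """Toy in-memory store with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # max number of cached partitions
        self._store = OrderedDict()
        self.computations = 0      # how often a partition had to be computed

    def get_or_compute(self, key: Hashable, compute: Callable[[], List]) -> List:
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.computations += 1
        value = compute()
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return value

    def unpersist(self, key: Hashable) -> None:
        self._store.pop(key, None)  # manual removal

    def cached_keys(self) -> List[Hashable]:
        return list(self._store)


cache = PartitionCache(capacity=2)
cache.get_or_compute("p0", lambda: [1, 2])
cache.get_or_compute("p1", lambda: [3, 4])
cache.get_or_compute("p0", lambda: [1, 2])  # hit: p0 becomes most recent
cache.get_or_compute("p2", lambda: [5, 6])  # evicts p1, the LRU entry
print(cache.cached_keys(), cache.computations)
```

An evicted partition is not lost for good in this model. The next `get_or_compute` call for it simply runs the compute function again, which is the trade-off a memory-only level makes.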
Job execution
Task Schedulers
Standalone
Default
FIFO strategy
Controls number of CPU
cores and executor
memory
YARN
Hadoop oriented
Takes all available
resources
Was designed for
stateless batch jobs
that can be restarted
easily if they fail.
Mesos
Resource oriented
Dynamic sharing of CPU
cores
Less predictable latency
Spark Driver (application)
Running in cluster
Memory usage
• Execution memory
• Storage for data needed during task execution
• Shuffle-related data
• Storage memory
• Cached RDDs
• Possible to borrow from execution memory
• User memory
• User data structures and internal metadata
• Safeguarding against OOM
• Reserved memory
• Memory needed for running executor itself
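To get a feel for the sizes of these four regions, the arithmetic can be sketched as below. The constants are assumptions, not values from the slides: a 300 MB reserve and defaults of 0.6 for `spark.memory.fraction` and 0.5 for `spark.memory.storageFraction`, as I understand the unified memory manager in Spark 2.x and later. Check the configuration reference for the version in use.

```python
RESERVED_MB = 300  # fixed reserve, assumed from Spark's unified memory manager


def memory_regions(heap_mb: float,
                   memory_fraction: float = 0.6,
                   storage_fraction: float = 0.5) -> dict:
    """Rough sizes (MB) of the four regions for a given executor heap."""
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must exceed the reserved memory")
    unified = usable * memory_fraction    # execution + storage together
    storage = unified * storage_fraction  # initial storage share
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,
        "storage": storage,
        "execution": unified - storage,
    }


regions = memory_regions(heap_mb=4096)
print(regions)
```

The storage and execution figures are starting points rather than hard limits, since the boundary between the two regions is soft and one side can borrow unused space from the other.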
Spark Streaming
Spark Streaming: Basic Concept
Spark Streaming: Architecture
Spark Streaming receives live input data streams and divides the data into
batches, which are then processed by the Spark engine to generate the final
stream of results in batches.
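The batching step can be imitated in plain Python: events are grouped by the batch interval they fall into, and an ordinary batch job runs on each group. The function names, the word-count job and the sample events are hypothetical. Real receivers also deal with buffering, back pressure and fault tolerance, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def discretize(events: Iterable[Tuple[float, str]],
               batch_interval: float) -> Dict[int, List[str]]:
    """Group (timestamp, value) events into fixed-length batches."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_interval)].append(value)
    return dict(batches)


def process_batch(values: List[str]) -> Dict[str, int]:
    """Stand-in for the per-batch job: a word count."""
    counts: Dict[str, int] = defaultdict(int)
    for v in values:
        counts[v] += 1
    return dict(counts)


# Hypothetical stream: (arrival time in seconds, word)
events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (1.5, "a"), (2.7, "c")]
batches = discretize(events, batch_interval=1.0)
results = {i: process_batch(vals) for i, vals in sorted(batches.items())}
print(results)
```

Each entry in `results` corresponds to one batch of the output stream, so the latency of this model is bounded below by the batch interval.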
Discretized Streams (DStreams)
Windowed computations
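A windowed computation combines the results of the last few batches and is re-evaluated at a fixed slide interval. In Spark Streaming both the window length and the slide interval are multiples of the batch interval. The sketch below expresses them as batch counts. `windowed_sums` and the sample counts are hypothetical, and the first window is allowed to be shorter than the full length.

```python
from typing import List


def windowed_sums(batch_values: List[int],
                  window_length: int,
                  slide_interval: int) -> List[int]:
    """Sum the last `window_length` batches, every `slide_interval` batches."""
    sums = []
    for end in range(slide_interval, len(batch_values) + 1, slide_interval):
        start = max(0, end - window_length)  # early windows may be shorter
        sums.append(sum(batch_values[start:end]))
    return sums


per_batch_counts = [1, 2, 3, 4, 5, 6]  # one value per batch
window_totals = windowed_sums(per_batch_counts, window_length=3, slide_interval=2)
print(window_totals)
```

Because the window is longer than the slide interval here, consecutive windows overlap by one batch, so a batch can contribute to more than one result.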
Spark Streaming checkpoints
• Create heavy objects in foreachRDD
• Default persistence level of DStreams keeps the data serialized in memory.
• Checkpointing (metadata and received data)
• Automatic restart (task manager)
• Max receiving rate
• Level of Parallelism
• Kryo serialization
Spark Streaming Example
Spark Dataframes
(SQL)
Apache Hive
• Hadoop product
• Stores metadata in a relational database, but data only in HDFS
• Is not suited for real time data processing
• Best used for batch jobs over large datasets of immutable data (web logs)
Hive is a good choice if you:
• Want to query the data
• Are familiar with SQL
About Spark SQL
Part of Spark core since April 2014
Works with structured data
Mixes SQL queries with Spark programs
Connects to any data source (files, Hive
tables, external databases, RDDs)
Spark Dataframes
Spark Dataframes
Spark SQL
Spark SQL with schema
Dataframes benchmark
Q&A

More Related Content

What's hot

Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Sachin Aggarwal
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Joud Khattab
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
GauravBiswas9
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Databricks
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
sudhakara st
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
Abdullah Çetin ÇAVDAR
 
What Is RDD In Spark? | Edureka
What Is RDD In Spark? | EdurekaWhat Is RDD In Spark? | Edureka
What Is RDD In Spark? | Edureka
Edureka!
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Databricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Spark overview
Spark overviewSpark overview
Spark overview
Lisa Hua
 

What's hot (20)

Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
 
What Is RDD In Spark? | Edureka
What Is RDD In Spark? | EdurekaWhat Is RDD In Spark? | Edureka
What Is RDD In Spark? | Edureka
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Spark overview
Spark overviewSpark overview
Spark overview
 

Viewers also liked

Qa talk-test manager-Oksana Kharchuk
Qa talk-test manager-Oksana KharchukQa talk-test manager-Oksana Kharchuk
Qa talk-test manager-Oksana Kharchuk
DataArt
 
Propiedad intelectual del soft ware
Propiedad intelectual del soft warePropiedad intelectual del soft ware
Propiedad intelectual del soft ware
Joel Quintana
 
The Rental Policies You Need to Know About
The Rental Policies You Need to Know AboutThe Rental Policies You Need to Know About
The Rental Policies You Need to Know About
UrbanBound
 
IR
IRIR
IR
MAK
 
Роман Еникеев - PHP или откуда взялся слон
Роман Еникеев - PHP или откуда взялся слонРоман Еникеев - PHP или откуда взялся слон
Роман Еникеев - PHP или откуда взялся слон
DataArt
 
Андрей Беляев "Мыслить как заказчик"
Андрей Беляев "Мыслить как заказчик"Андрей Беляев "Мыслить как заказчик"
Андрей Беляев "Мыслить как заказчик"
DataArt
 
photos
photosphotos
photos
diakxr
 
Visiting unpleasent places
Visiting unpleasent placesVisiting unpleasent places
Visiting unpleasent places
Arpanasa
 
Mapas etiquetas
Mapas etiquetasMapas etiquetas
Mapas etiquetas
Diego Rojas
 
Estrategika nuevos productos proteccion
Estrategika nuevos productos proteccionEstrategika nuevos productos proteccion
Estrategika nuevos productos proteccion
JUAN CARLOS CALDERON
 
Bit trade labs sovereign identity fintech summit 2016
Bit trade labs sovereign identity   fintech summit 2016Bit trade labs sovereign identity   fintech summit 2016
Bit trade labs sovereign identity fintech summit 2016
Glen Frost
 
Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...
Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...
Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...
DataArt
 
นิทาน
นิทานนิทาน
นิทาน
ExitOfLove
 
Reader’s theater (1)
Reader’s theater (1)Reader’s theater (1)
Reader’s theater (1)
IIPCONX
 
Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"
Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"
Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"
DataArt
 
Uses and gratification theory
Uses and gratification theoryUses and gratification theory
Uses and gratification theory
Abbey Cotterill
 
Joint venture
Joint ventureJoint venture
Joint venture
Shlagha Nayyar
 
Bio pharma vessels & tanks
Bio pharma vessels & tanksBio pharma vessels & tanks
Bio pharma vessels & tanks
Akshar Engineering Works
 
Android wear, Alexey Rybakov DataArt Kharkov
Android wear, Alexey Rybakov DataArt KharkovAndroid wear, Alexey Rybakov DataArt Kharkov
Android wear, Alexey Rybakov DataArt Kharkov
DataArt
 

Viewers also liked (20)

Qa talk-test manager-Oksana Kharchuk
Qa talk-test manager-Oksana KharchukQa talk-test manager-Oksana Kharchuk
Qa talk-test manager-Oksana Kharchuk
 
Propiedad intelectual del soft ware
Propiedad intelectual del soft warePropiedad intelectual del soft ware
Propiedad intelectual del soft ware
 
The Rental Policies You Need to Know About
The Rental Policies You Need to Know AboutThe Rental Policies You Need to Know About
The Rental Policies You Need to Know About
 
IR
IRIR
IR
 
Роман Еникеев - PHP или откуда взялся слон
Роман Еникеев - PHP или откуда взялся слонРоман Еникеев - PHP или откуда взялся слон
Роман Еникеев - PHP или откуда взялся слон
 
Андрей Беляев "Мыслить как заказчик"
Андрей Беляев "Мыслить как заказчик"Андрей Беляев "Мыслить как заказчик"
Андрей Беляев "Мыслить как заказчик"
 
photos
photosphotos
photos
 
Visiting unpleasent places
Visiting unpleasent placesVisiting unpleasent places
Visiting unpleasent places
 
Mapas etiquetas
Mapas etiquetasMapas etiquetas
Mapas etiquetas
 
Estrategika nuevos productos proteccion
Estrategika nuevos productos proteccionEstrategika nuevos productos proteccion
Estrategika nuevos productos proteccion
 
Biblioterapia
BiblioterapiaBiblioterapia
Biblioterapia
 
Bit trade labs sovereign identity fintech summit 2016
Bit trade labs sovereign identity   fintech summit 2016Bit trade labs sovereign identity   fintech summit 2016
Bit trade labs sovereign identity fintech summit 2016
 
Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...
Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...
Арсений Жижелев «Наблюдение за игровым миром Аллодов (Play+Scala+Slick+Postgr...
 
นิทาน
นิทานนิทาน
นิทาน
 
Reader’s theater (1)
Reader’s theater (1)Reader’s theater (1)
Reader’s theater (1)
 
Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"
Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"
Игорь Савка "Как выжить в безнадежном проекте. Личный опыт"
 
Uses and gratification theory
Uses and gratification theoryUses and gratification theory
Uses and gratification theory
 
Joint venture
Joint ventureJoint venture
Joint venture
 
Bio pharma vessels & tanks
Bio pharma vessels & tanksBio pharma vessels & tanks
Bio pharma vessels & tanks
 
Android wear, Alexey Rybakov DataArt Kharkov
Android wear, Alexey Rybakov DataArt KharkovAndroid wear, Alexey Rybakov DataArt Kharkov
Android wear, Alexey Rybakov DataArt Kharkov
 

Similar to Apache Spark overview

Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
Djamel Zouaoui
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
Synergetics Learning and Cloud Consulting
 
Spark architechure.pptx
Spark architechure.pptxSpark architechure.pptx
Spark architechure.pptx
SaiSriMadhuriYatam
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
Robert Sanders
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
clairvoyantllc
 
Geek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaGeek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and Scala
Atif Akhtar
 
Big data overview
Big data overviewBig data overview
Big data overview
beCloudReady
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
Rahul Borate
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
cdmaxime
 
Apache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.pptApache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.ppt
bhargavi804095
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
Josi Aranda
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Olalekan Fuad Elesin
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
MaheshPandit16
 
Study Notes: Apache Spark
Study Notes: Apache SparkStudy Notes: Apache Spark
Study Notes: Apache Spark
Gao Yunzhong
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
trihug
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
datastack
 
Apache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup TalkApache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup Talk
Eren Avşaroğulları
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
haridasnss
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part II
Arjen de Vries
 
Apache Spark
Apache SparkApache Spark
Apache Spark
SugumarSarDurai
 

Similar to Apache Spark overview (20)

Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Spark architechure.pptx
Spark architechure.pptxSpark architechure.pptx
Spark architechure.pptx
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Geek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and ScalaGeek Night - Functional Data Processing using Spark and Scala
Geek Night - Functional Data Processing using Spark and Scala
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
 
Apache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.pptApache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.ppt
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
Study Notes: Apache Spark
Study Notes: Apache SparkStudy Notes: Apache Spark
Study Notes: Apache Spark
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Apache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup TalkApache Spark Fundamentals Meetup Talk
Apache Spark Fundamentals Meetup Talk
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
 
Bigdata processing with Spark - part II
Bigdata processing with Spark - part IIBigdata processing with Spark - part II
Bigdata processing with Spark - part II
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 

More from DataArt

DataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human ApproachDataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human Approach
DataArt
 
DataArt Healthcare & Life Sciences
DataArt Healthcare & Life SciencesDataArt Healthcare & Life Sciences
DataArt Healthcare & Life Sciences
DataArt
 
DataArt Financial Services and Capital Markets
DataArt Financial Services and Capital MarketsDataArt Financial Services and Capital Markets
DataArt Financial Services and Capital Markets
DataArt
 
About DataArt HR Partners
About DataArt HR PartnersAbout DataArt HR Partners
About DataArt HR Partners
DataArt
 
Event management в IT
Event management в ITEvent management в IT
Event management в IT
DataArt
 
Digital Marketing from inside
Digital Marketing from insideDigital Marketing from inside
Digital Marketing from inside
DataArt
 
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
DataArt
 
DevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проектDevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проект
DataArt
 
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArtIT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
DataArt
 
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
 «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han... «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
DataArt
 
Communication in QA's life
Communication in QA's lifeCommunication in QA's life
Communication in QA's life
DataArt
 
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьмиНельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
DataArt
 
Знакомьтесь, DevOps
Знакомьтесь, DevOpsЗнакомьтесь, DevOps
Знакомьтесь, DevOps
DataArt
 
DevOps in real life
DevOps in real lifeDevOps in real life
DevOps in real life
DataArt
 
Codeless: автоматизация тестирования
Codeless: автоматизация тестированияCodeless: автоматизация тестирования
Codeless: автоматизация тестирования
DataArt
 
Selenoid
SelenoidSelenoid
Selenoid
DataArt
 
Selenide
SelenideSelenide
Selenide
DataArt
 
A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"
DataArt
 
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
DataArt
 
IT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNGIT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNG
DataArt
 

More from DataArt (20)

DataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human ApproachDataArt Custom Software Engineering with a Human Approach
DataArt Custom Software Engineering with a Human Approach
 
DataArt Healthcare & Life Sciences
DataArt Healthcare & Life SciencesDataArt Healthcare & Life Sciences
DataArt Healthcare & Life Sciences
 
DataArt Financial Services and Capital Markets
DataArt Financial Services and Capital MarketsDataArt Financial Services and Capital Markets
DataArt Financial Services and Capital Markets
 
About DataArt HR Partners
About DataArt HR PartnersAbout DataArt HR Partners
About DataArt HR Partners
 
Event management в IT
Event management в ITEvent management в IT
Event management в IT
 
Digital Marketing from inside
Digital Marketing from insideDigital Marketing from inside
Digital Marketing from inside
 
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)What's new in Android, Igor Malytsky ( Google Post I|O Tour)
What's new in Android, Igor Malytsky ( Google Post I|O Tour)
 
DevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проектDevOps Workshop:Что бывает, когда DevOps приходит на проект
DevOps Workshop:Что бывает, когда DevOps приходит на проект
 
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArtIT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
IT Talk Kharkiv: «‎Soft skills в IT. Польза или вред? Максим Бастион, DataArt
 
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
 «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han... «Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
«Ноль копеек. Спастись от выгорания» — Сергей Чеботарев (Head of Design, Han...
 
Communication in QA's life
Communication in QA's lifeCommunication in QA's life
Communication in QA's life
 
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьмиНельзя просто так взять и договориться, или как мы работали со сложными людьми
Нельзя просто так взять и договориться, или как мы работали со сложными людьми
 
Знакомьтесь, DevOps
Знакомьтесь, DevOpsЗнакомьтесь, DevOps
Знакомьтесь, DevOps
 
DevOps in real life
DevOps in real lifeDevOps in real life
DevOps in real life
 
Codeless: автоматизация тестирования
Codeless: автоматизация тестированияCodeless: автоматизация тестирования
Codeless: автоматизация тестирования
 
Selenoid
SelenoidSelenoid
Selenoid
 
Selenide
SelenideSelenide
Selenide
 
A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"A. Sirota "Building an Automation Solution based on Appium"
A. Sirota "Building an Automation Solution based on Appium"
 
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
Эмоциональный интеллект или как не сойти с ума в условиях сложного и динамичн...
 
IT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNGIT talk: Как я перестал бояться и полюбил TestNG
IT talk: Как я перестал бояться и полюбил TestNG
 

Apache Spark overview

Editor's Notes

  1. Cluster execution, parallelism, fault tolerance, speed, various data formats, monitoring and resource allocation.
  2. Spark’s cache is fault-tolerant: if any partition of an RDD is lost, it will automatically be recomputed using the transformations that originally created it. Example: read the data from a file and take each employee's name, position, salary, and age. Filter by the positions of interest. The result can then be aggregated, for example into the average salary by position and by age.
  3. Window length: the duration of the window (3 in the figure). Sliding interval: the interval at which the window operation is performed (2 in the figure).
  4. Example of creating a connection to a database.
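
The employee example in note 2 can be sketched in plain Python, without any Spark API, to show the shape of the pipeline: filter by position, then average the salary per group. The sample records, the field order, and the helper name `average_salary_by` are assumptions made for illustration, and the data is an in-memory list rather than a file. In Spark the same steps would roughly correspond to a `filter` followed by a per-key aggregation.

```python
from collections import defaultdict

# Hypothetical sample records: (name, position, salary, age).
EMPLOYEES = [
    ("Anna", "engineer", 100.0, 30),
    ("Boris", "engineer", 120.0, 40),
    ("Clara", "manager", 150.0, 40),
    ("Dmitri", "analyst", 90.0, 30),
    ("Elena", "intern", 20.0, 22),
]


def average_salary_by(records, key_index, wanted_positions):
    """Keep records whose position is wanted, then average salary per key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [salary sum, record count]
    for record in records:
        if record[1] not in wanted_positions:
            continue  # the "filter by position" step
        acc = totals[record[key_index]]
        acc[0] += record[2]
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


wanted = {"engineer", "manager", "analyst"}
by_position = average_salary_by(EMPLOYEES, 1, wanted)  # group by position
by_age = average_salary_by(EMPLOYEES, 3, wanted)       # group by age
```

Keeping a running sum and count per key, instead of collecting all salaries first, mirrors how a distributed aggregation can combine partial results from each partition.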
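
The two window parameters in note 3 can be illustrated with a small plain-Python sketch. It is not the Spark Streaming API: `sliding_windows` is a hypothetical helper, each inner list stands for one batch, and both parameters are counted in batches. As a simplification, it only emits windows once enough batches have arrived to fill them.

```python
def sliding_windows(batches, window_length, slide_interval):
    """Combine the last `window_length` batches every `slide_interval` batches."""
    windows = []
    # `end` is the index just past the newest batch included in the window.
    for end in range(window_length, len(batches) + 1, slide_interval):
        window = batches[end - window_length:end]
        # Flatten the batches of this window into a single list of items.
        windows.append([item for batch in window for item in batch])
    return windows


# Five batches, one item each; window length 3 and sliding interval 2,
# matching the values quoted from the figure.
batches = [[1], [2], [3], [4], [5]]
windows = sliding_windows(batches, 3, 2)
```

With these values the consecutive windows overlap by one batch, because the sliding interval is shorter than the window length.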