Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
Juantomás García - Open Sistemas
Kappa Architecture 2.0
DataScience Lab, Odessa
Доброго ранку
Одеса!!
(Dobroho ranku Odesa)
first
Juantomás García
• Data Solutions Manager @ OpenSistemas
• GDE (Google Developer Expert) for cloud
Others
• Co-Author of the first Spanish free software book “La Pastilla
Roja”
• President of Hispalinux (Spanish Linux User Group)
• Organizer of the Machine Learning Spain and GDG Cloud
Madrid.
Who I am
What’s Kappa Architecture?
July 2, 2014 Jay Kreps coined the term Kappa
Architecture in an article for O’reilly Radar
“Maybe we could call this the Kappa Achitecture, though it may
be too simple of an idea to merit a Greek letter”
Jay has been involved in lots
of projects:
✓ Author of the essay: The
Log: What every software
engineer should know about
real-time data's unifying
abstraction (12/16/2013)
✓ Author of the book I love
Logs
Who is Jay Kreps?
•Involved with projects as:
✓ Apache Kafka
✓ Apache Samza
✓ Voldemort
✓ Azkaban
✓ Ex-Linkedin
✓ Now co-founder and CEO of Confluent
Who is Jay Kreps?
Usual Data Flow
Usual Data Flow
Usual Data Flow
Kappa Architecture Way
Tools we use
Tools we use
Tools we use
✓ If you have an schema spark SQL, is
perfect.
✓ Spark streaming works very fine with spark
and almost each streaming sources.
✓ Structured queries will be a huge advance.
✓ We love Scala, the spirit of Spark.
Some Favorite Spark Features
We love code like this:
Some Favorite Spark Features
• One of our clients wanted to monitor all the
car's information via OBD II
• OBD II is a car interface with the car
electronics.
• Our client developed an app for reading all
the car information throw ODB II with
bluetooth
A Real Use Case
A Real Use Case
• We needed to scale the rest interfaces.
There were too many requests.
• MySQL don’t scale
• Client wanted to do realtime expensive
queries.
First Problems
Some metrics
Architecture v 2.0
Architecture v 3.0
We can have queries like:
“What are the drivers that are not client
of the X gas brand, has a few gas and
are near of gas station of the brand X and
if true, send a notification with a discount
coupon and a link with the route."
Now we’re more flexible!!
• Kappa architecture is not a silver bullet but helps
with a lot of solutions.
• Kafka + spark streaming are our favorite tools
• There are a lots of improvements:
Takeaways
✓ OLAP like Apache Druid
✓ Graph databases like neo4j
✓ Kafka streams and
compacts logs
✓ Apache Beams
✓ Scio Scala bindings
Takeaways: Apache Beam
Takeaways: Scio Scala Binding
Think Big
Think Big
• Forget Legacy Architectures
• Forget Old Tools
• Use Light Technologies / Serverless
• Use pieces of Lego
• Mix different technologies from diverse sources
Spark Use Cases
Not to do list
•Avoid install & config a server even a
VM.
•Avoid installs tools instead use
containers and/or cloud services.
•In general: think if there is a simpler
way to do it and needs less effort
Spark Use Cases
Architecture & Tools
•To use Cloud Services is not a brainer
decision.
•Git + Containers + Kubernetes
•Use the best language* for each
module.
•Use Notebooks: Jupyter, Zeppelin,
DSX
(*) Even java might be an option - unprovable
Google Cloud Version
Kappa Architecture
Questions?
•email: juantomas@opensistemas.com
•twitter: @juantomas
This talk have a free questions lifetime warranty: If you have any questions or concerns
about this talk, feel free to contact me anytime.
Selfie Time: If you like the talk just smile while I take
the selfie ;-)
Kappa Architecture
велике спасибі

More Related Content

What's hot

Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
Treasure Data, Inc.
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
P. Taylor Goetz
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
_mdev_
 
A (XPages) developers guide to Cloudant
A (XPages) developers guide to CloudantA (XPages) developers guide to Cloudant
A (XPages) developers guide to Cloudant
Frank van der Linden
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
Eduardo Silva Pereira
 
Briney - Leveling Up Data Management - With Notes
Briney - Leveling Up Data Management - With NotesBriney - Leveling Up Data Management - With Notes
Briney - Leveling Up Data Management - With Notes
National Information Standards Organization (NISO)
 
Elk meetup
Elk meetupElk meetup
Elk meetup
Asaf Yigal
 
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OKServerless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Kriangkrai Chaonithi
 
A (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITA (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetIT
Frank van der Linden
 
Protecting Your Cluster from Your Humans
Protecting Your Cluster from Your HumansProtecting Your Cluster from Your Humans
Protecting Your Cluster from Your Humans
Elasticsearch
 
Indexing big data in the cloud
Indexing big data in the cloudIndexing big data in the cloud
Indexing big data in the cloud
OpenSource Connections
 
2018 01-15 infra coders meetup vienna ii
2018 01-15 infra coders meetup vienna ii2018 01-15 infra coders meetup vienna ii
2018 01-15 infra coders meetup vienna ii
Aleksandar Lazic
 
DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...
DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...
DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...
Frank van der Linden
 
Find your data
Find your dataFind your data
Find your data
Oliver Busse
 
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DBAtlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Technical Consulting
 
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
mortardata
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
Jason Plurad
 
Going Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGardenGoing Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGarden
Jay Gandhi
 
EDU2.0 and Amazon CloudSearch
EDU2.0 and Amazon CloudSearchEDU2.0 and Amazon CloudSearch
EDU2.0 and Amazon CloudSearch
Michael Bohlig
 

What's hot (19)

Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
 
A (XPages) developers guide to Cloudant
A (XPages) developers guide to CloudantA (XPages) developers guide to Cloudant
A (XPages) developers guide to Cloudant
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
 
Briney - Leveling Up Data Management - With Notes
Briney - Leveling Up Data Management - With NotesBriney - Leveling Up Data Management - With Notes
Briney - Leveling Up Data Management - With Notes
 
Elk meetup
Elk meetupElk meetup
Elk meetup
 
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OKServerless Big Data Architecture on Google Cloud Platform at Credit OK
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
 
A (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITA (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetIT
 
Protecting Your Cluster from Your Humans
Protecting Your Cluster from Your HumansProtecting Your Cluster from Your Humans
Protecting Your Cluster from Your Humans
 
Indexing big data in the cloud
Indexing big data in the cloudIndexing big data in the cloud
Indexing big data in the cloud
 
2018 01-15 infra coders meetup vienna ii
2018 01-15 infra coders meetup vienna ii2018 01-15 infra coders meetup vienna ii
2018 01-15 infra coders meetup vienna ii
 
DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...
DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...
DEV-1129 How Watson, Bluemix, Cloudant, and XPages Can Work Together In A Rea...
 
Find your data
Find your dataFind your data
Find your data
 
Atlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DBAtlogys Academy - Tech Talk on Mongo DB
Atlogys Academy - Tech Talk on Mongo DB
 
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
Mortar: Hadoop-as-a-Service + Open Source Framework | AWS re: Invent public …
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
 
Going Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGardenGoing Serverless with AWS Lambda at ReportGarden
Going Serverless with AWS Lambda at ReportGarden
 
EDU2.0 and Amazon CloudSearch
EDU2.0 and Amazon CloudSearchEDU2.0 and Amazon CloudSearch
EDU2.0 and Amazon CloudSearch
 

Similar to DataScience Lab 2017_Kappa Architecture: How to implement a real-time streaming data analytics engine Juantomás García

Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016
LibreCon
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Sarah Guido
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
Lynn Langit
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
Lynn Langit
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
Juantomás García Molina
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectos
OpenSistemas
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
Radhika Puthiyetath
 
Which database should I use for my app?
Which database should I use for my app?Which database should I use for my app?
Which database should I use for my app?
Nawaz Dhandala
 
Cloud and Big Data trends
Cloud and Big Data trendsCloud and Big Data trends
Cloud and Big Data trends
Sebastien Goasguen
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
Chris Dagdigian
 
Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...
DATAVERSITY
 
Decoupling Drupal - Drupal Camp Toronto 2014
Decoupling Drupal - Drupal Camp Toronto 2014Decoupling Drupal - Drupal Camp Toronto 2014
Decoupling Drupal - Drupal Camp Toronto 2014
Alex De Winne
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
RightScale
 
Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2
Traveloka
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Wes McKinney
 
Comment choisir entre Parse, Heroku et AWS ?
Comment choisir entre Parse, Heroku et AWS ?Comment choisir entre Parse, Heroku et AWS ?
Comment choisir entre Parse, Heroku et AWS ?
TheFamily
 
Server’s variations bsw2015
Server’s variations bsw2015Server’s variations bsw2015
Server’s variations bsw2015
Laurent Cerveau
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Edward Wilde
 

Similar to DataScience Lab 2017_Kappa Architecture: How to implement a real-time streaming data analytics engine Juantomás García (20)

Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016Kappa Architecture, IoT of the cars - LibreCon 2016
Kappa Architecture, IoT of the cars - LibreCon 2016
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
ASPgems - kappa architecture
ASPgems - kappa architectureASPgems - kappa architecture
ASPgems - kappa architecture
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectos
 
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
On CloudStack, Docker, Kubernetes, and Big Data…Oh my ! By Sebastien Goasguen...
 
Which database should I use for my app?
Which database should I use for my app?Which database should I use for my app?
Which database should I use for my app?
 
Cloud and Big Data trends
Cloud and Big Data trendsCloud and Big Data trends
Cloud and Big Data trends
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...Why Organizations are Looking at Alternative Database Technologies – Introduc...
Why Organizations are Looking at Alternative Database Technologies – Introduc...
 
Decoupling Drupal - Drupal Camp Toronto 2014
Decoupling Drupal - Drupal Camp Toronto 2014Decoupling Drupal - Drupal Camp Toronto 2014
Decoupling Drupal - Drupal Camp Toronto 2014
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
 
Comment choisir entre Parse, Heroku et AWS ?
Comment choisir entre Parse, Heroku et AWS ?Comment choisir entre Parse, Heroku et AWS ?
Comment choisir entre Parse, Heroku et AWS ?
 
Server’s variations bsw2015
Server’s variations bsw2015Server’s variations bsw2015
Server’s variations bsw2015
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
 

More from GeeksLab Odessa

DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
GeeksLab Odessa
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский Виктор
GeeksLab Odessa
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображение
GeeksLab Odessa
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
GeeksLab Odessa
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
GeeksLab Odessa
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
GeeksLab Odessa
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
GeeksLab Odessa
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
GeeksLab Odessa
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
GeeksLab Odessa
 
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
GeeksLab Odessa
 
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
GeeksLab Odessa
 
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
GeeksLab Odessa
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
GeeksLab Odessa
 
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
GeeksLab Odessa
 
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
GeeksLab Odessa
 
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
GeeksLab Odessa
 
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина ЛизогубоваJS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
GeeksLab Odessa
 

More from GeeksLab Odessa (20)

DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский Виктор
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображение
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
 
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
 
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
 
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
 
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
 
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
 
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
 
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина ЛизогубоваJS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
 

Recently uploaded

Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Erasmo Purificato
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
ScyllaDB
 
Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1
FellyciaHikmahwarani
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
ScyllaDB
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
ScyllaDB
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
Emerging Tech
 
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
kantakumariji156
 
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design ApproachesKnowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Earley Information Science
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
 
HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)
Alpen-Adria-Universität
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecGDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
James Anderson
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
ScyllaDB
 
Verti - EMEA Insurer Innovation Award 2024
Verti - EMEA Insurer Innovation Award 2024Verti - EMEA Insurer Innovation Award 2024
Verti - EMEA Insurer Innovation Award 2024
The Digital Insurer
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions
 
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
amitchopra0215
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
What's Next Web Development Trends to Watch.pdf
What's Next Web Development Trends to Watch.pdfWhat's Next Web Development Trends to Watch.pdf
What's Next Web Development Trends to Watch.pdf
SeasiaInfotech2
 

Recently uploaded (20)

Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
Paradigm Shifts in User Modeling: A Journey from Historical Foundations to Em...
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
 
Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
 
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design ApproachesKnowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppsecGDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
GDG Cloud Southlake #34: Neatsun Ziv: Automating Appsec
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
 
Verti - EMEA Insurer Innovation Award 2024
Verti - EMEA Insurer Innovation Award 2024Verti - EMEA Insurer Innovation Award 2024
Verti - EMEA Insurer Innovation Award 2024
 
Pigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdfPigging Solutions Sustainability brochure.pdf
Pigging Solutions Sustainability brochure.pdf
 
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
What's Next Web Development Trends to Watch.pdf
What's Next Web Development Trends to Watch.pdfWhat's Next Web Development Trends to Watch.pdf
What's Next Web Development Trends to Watch.pdf
 

DataScience Lab 2017_Kappa Architecture: How to implement a real-time streaming data analytics engine Juantomás García

  • 1. Juantomás García - Open Sistemas Kappa Architecture 2.0 DataScience Lab, Odessa
  • 3. Juantomás García • Data Solutions Manager @ OpenSistemas • GDE (Google Developer Expert) for cloud Others • Co-Author of the first Spanish free software book “La Pastilla Roja” • President of Hispalinux (Spanish Linux User Group) • Organizer of the Machine Learning Spain and GDG Cloud Madrid. Who I am
  • 4. What’s Kappa Architecture? July 2, 2014 Jay Kreps coined the term Kappa Architecture in an article for O’reilly Radar “Maybe we could call this the Kappa Achitecture, though it may be too simple of an idea to merit a Greek letter”
  • 5. Jay has been involved in lots of projects: ✓ Author of the essay: The Log: What every software engineer should know about real-time data's unifying abstraction (12/16/2013) ✓ Author of the book I love Logs Who is Jay Kreps?
  • 6. •Involved with projects as: ✓ Apache Kafka ✓ Apache Samza ✓ Voldemort ✓ Azkaban ✓ Ex-Linkedin ✓ Now co-founder and CEO of Confluent Who is Jay Kreps?
  • 14. ✓ If you have an schema spark SQL, is perfect. ✓ Spark streaming works very fine with spark and almost each streaming sources. ✓ Structured queries will be a huge advance. ✓ We love Scala, the spirit of Spark. Some Favorite Spark Features
  • 15. We love code like this: Some Favorite Spark Features
  • 16. • One of our clients wanted to monitor all the car's information via OBD II • OBD II is a car interface with the car electronics. • Our client developed an app for reading all the car information throw ODB II with bluetooth A Real Use Case
  • 17. A Real Use Case
  • 18. • We needed to scale the rest interfaces. There were too many requests. • MySQL don’t scale • Client wanted to do realtime expensive queries. First Problems
  • 22. We can have queries like: “What are the drivers that are not client of the X gas brand, has a few gas and are near of gas station of the brand X and if true, send a notification with a discount coupon and a link with the route." Now we’re more flexible!!
  • 23. • Kappa architecture is not a silver bullet but helps with a lot of solutions. • Kafka + spark streaming are our favorite tools • There are a lots of improvements: Takeaways ✓ OLAP like Apache Druid ✓ Graph databases like neo4j ✓ Kafka streams and compacts logs ✓ Apache Beams ✓ Scio Scala bindings
  • 27. Think Big • Forget Legacy Architectures • Forget Old Tools • Use Light Technologies / Serverless • Use pieces of Lego • Mix different technologies from diverse sources
  • 28. Spark Use Cases Not to do list •Avoid install & config a server even a VM. •Avoid installs tools instead use containers and/or cloud services. •In general: think if there is a simpler way to do it and needs less effort
  • 29. Spark Use Cases Architecture & Tools •To use Cloud Services is not a brainer decision. •Git + Containers + Kubernetes •Use the best language* for each module. •Use Notebooks: Jupyter, Zeppelin, DSX (*) Even java might be an option - unprovable
  • 31. Kappa Architecture Questions? •email: juantomas@opensistemas.com •twitter: @juantomas This talk have a free questions lifetime warranty: If you have any questions or concerns about this talk, feel free to contact me anytime. Selfie Time: If you like the talk just smile while I take the selfie ;-)