© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
From Batch to Streaming:
How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time
November 28, 2017
Agenda
• Real-time streaming data overview
• Streaming data services
• Benefits of streaming analytics
• Batch to streaming best practices
• How Amazon Flex moved from batch to streaming
What is batch processing?
Execution of a series of jobs in a program on a
computer without manual intervention - Wikipedia
• Data is collected over a period of time
• Process and analyze on a schedule
• Combine several processes to obtain final result
Most data is produced continuously
Mobile apps Web clickstream Application logs
Metering records IoT sensors Smart buildings
The diminishing value of data
Recent data is highly valuable
• If you act on it in time
• Perishable insights (M. Gualtieri, Forrester)
Old + recent data is more valuable
• If you have the means to combine them
Processing real-time, streaming data
What are the key requirements?
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Collect → Transform → Analyze → React → Persist
Amazon Kinesis makes it easy to work with real-time streaming data
Kinesis Streams
• For technical developers
• Collect and stream data for ordered, replayable, real-time processing
Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into Amazon S3, Redshift, Elasticsearch
Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
• Compute analytics in real time
Amazon Kinesis Streams
• Reliably ingest and durably store streaming data at low cost
• Build custom real-time applications to process streaming data
• Use your stream-processing framework of choice
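As a minimal sketch of what "ingest" looks like from a producer's side, the snippet below builds the arguments for a Kinesis PutRecord call. The stream name, source ID, and event fields are hypothetical; `StreamName`, `Data`, and `PartitionKey` are the parameter names used by the boto3 Kinesis client. Nothing is sent over the network here.

```python
import json


def build_put_record_request(stream_name: str, source_id: str, event: dict) -> dict:
    """Build keyword arguments for a Kinesis PutRecord call.

    Records that share a partition key are routed to the same shard,
    which is what keeps them in order for consumers.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": source_id,
    }


# Hypothetical stream name and event fields, for illustration only.
request = build_put_record_request(
    "telemetry-stream", "device-123", {"type": "delivery", "count": 1}
)
# With boto3 installed and credentials configured, this could be sent with:
#   boto3.client("kinesis").put_record(**request)
```

Choosing the partition key is the main design decision: a key per device or user keeps that source's events ordered while still spreading load across shards.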
Amazon Kinesis Firehose
• Reliably ingest and deliver batched, compressed, and encrypted data to S3, Redshift, and Elasticsearch
• Point and click setup with zero administration and seamless elasticity
• Managed stream-processing consumer
Amazon Kinesis Analytics
• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
Benefits of streaming analysis
Immediate results
• Real-time aggregations
• Filtering
• Anomaly detection
Reduced complexity
• Fewer scheduled jobs to manage
• Kinesis is a fully managed solution
Scalable
• Enables parallel processing
• Horizontally scales, based on your ingest rate
Batch to streaming best practices
Migrate incrementally
• Don’t boil the ocean
• Begin by streaming data in parallel to existing batch processes
• Persist streaming data into durable storage, like Amazon S3
• Add in streaming analysis results to replace batch analysis
Diagram labels: Data producer, Application databases, ETL, Data warehouse, Amazon Kinesis, Streaming data, Amazon S3
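While the batch and streaming paths run side by side, one way to build confidence before retiring a batch job is to compare their outputs. The sketch below is an assumption about how that check might look, not something the deck describes: the `reconcile` helper, the daily totals, and the 1% tolerance are all hypothetical.

```python
def reconcile(batch_totals, stream_totals, tolerance=0.01):
    """Return keys whose batch and streaming totals differ by more than
    `tolerance`, measured relative to the larger of the two values."""
    mismatches = {}
    for key in sorted(set(batch_totals) | set(stream_totals)):
        batch = batch_totals.get(key, 0)
        stream = stream_totals.get(key, 0)
        denominator = max(abs(batch), abs(stream), 1)  # avoid dividing by zero
        if abs(batch - stream) / denominator > tolerance:
            mismatches[key] = (batch, stream)
    return mismatches


# Hypothetical daily totals from the two pipelines.
batch = {"2017-11-27": 1000, "2017-11-28": 500}
stream = {"2017-11-27": 1001, "2017-11-28": 450}
differences = reconcile(batch, stream)
```

An empty result over several runs suggests the streaming result can stand in for the batch one.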
Batch to streaming best practices
Perform ITL rather than ETL
• ITL: Ingest-Transform-Load
• ETL: Extract-Transform-Load
• Transform data in near-real time rather than a scheduled job
• Enrich data in near-real time
• Persist transformed and/or enriched data
Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, AWS Lambda function (transform data), Enrichment source data, Amazon S3 (raw data; transformed and/or enriched data)
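The transform step can be sketched as a Lambda handler in the shape Firehose expects for data transformation: each incoming record carries a `recordId` and base64-encoded `data`, and each returned record carries the same `recordId`, a `result`, and re-encoded `data`. The enrichment lookup and the field names (`station`, `station_name`) are hypothetical stand-ins for the "enrichment source data" in the diagram.

```python
import base64
import json

# Hypothetical enrichment source: a lookup table keyed by station code.
STATION_NAMES = {"SEA1": "Seattle North", "PDX2": "Portland East"}


def handler(event, context):
    """Transform and enrich each record in a Firehose batch."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["station_name"] = STATION_NAMES.get(payload.get("station"), "unknown")
            # Newline-delimit records so they stay separable once batched into one object.
            data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data.decode("ascii")}
            )
        except (ValueError, TypeError, AttributeError):
            # Hand the record back unchanged and flag it as failed.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

Flagging a bad record as `ProcessingFailed` rather than raising keeps one malformed event from failing the whole batch.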
Batch to streaming best practices
Aggregate upon arrival
• Continuously write raw data to persistent data store for archival and other analysis
• Aggregate in real time when window size < 1 hour
• Write aggregated data to persistent data store for immediate value
Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, Amazon Kinesis Analytics (aggregate results), Amazon S3 (raw data; aggregated data)
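The deck performs this aggregation in Kinesis Analytics with SQL. Purely to illustrate the windowing logic, here is a small Python sketch of a tumbling (fixed, non-overlapping) window count; the event tuples and the 60-second window are assumptions.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, key) using fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since some epoch, event type).
events = [(0, "delivered"), (30, "delivered"), (59, "failed"), (60, "delivered"), (125, "delivered")]
per_minute = tumbling_window_counts(events, window_seconds=60)
```

Each event is assigned to exactly one window, so the aggregates can be written out as soon as a window closes while the raw events are archived separately.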
Batch to streaming example
Brandon Smith
• Senior software engineer
• Worked at Amazon for 12 years in Kindle, AWS, and now Last Mile Delivery
• Currently working on Amazon Flex
• Amazon delivery app (Android/iOS)
• Crowd-sourced model launched in 30+ U.S. cities
• Used by Amazon Logistics worldwide
• Deliveries for Amazon.com, Prime Now, Amazon Fresh, restaurants, grocery stores
• Millions of packages per year
The problem
• Collecting, processing, and storing telemetry data
• Telemetry data = remote measurements
• Includes metrics, crashes, logs, sensor data, clickstream data, etc.
The goal
• Understand what’s happening in the field
• Analyze all the data and make performance optimizations
• Focus our time on improving the app and the delivery flow
Use cases
Use case 1: Alarming
• We want to know within minutes if there are problems
• Example: If the delivery count drops below our expected/historical value, we want to raise an alarm
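A minimal sketch of that check, assuming the "expected/historical value" is the mean of recent counts for the same time window and that an alarm fires below a fixed fraction of it. The function name and the 50% tolerance are hypothetical.

```python
from statistics import mean


def should_alarm(current_count, historical_counts, tolerance=0.5):
    """Return True when the current count falls below a fraction of the historical mean."""
    if not historical_counts:
        return False  # No baseline yet, so there is nothing to compare against.
    baseline = mean(historical_counts)
    return current_count < baseline * tolerance


# Hypothetical delivery counts for the same time window on previous days.
history = [100, 110, 90]
low = should_alarm(40, history)
normal = should_alarm(80, history)
```

In practice the baseline would likely account for day of week and time of day; a plain mean keeps the sketch short.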
Use case 2: Troubleshooting
• Logs and crashes published to Amazon CloudWatch Logs in near-real time
• Can filter and search to troubleshoot issues
Use case 3: Dashboards
• We can write SQL, generate reports, and create visualizations
• But we really want real-time dashboards instead of daily reports
Figure labels: Daily reports, Real-time dashboards
Use case 4: Releases
• Deploying new app versions and monitoring adoption in real time
• Release new code smoothly and with confidence
Use case 5: Sharing data
• Consumers get notifications of new data in real time
• Consumers can join their data with other data in the data lake
Figure labels: S3 bucket, Data lake
Use case 6: Deeper analytics
• Look at the stream of data and the historical data
• Build ML models, create predictions, detect anomalies
How did we build it?
Getting from batch to streaming
• To solve our use cases, we had to incrementally improve our system
• We evolved from a batch-based system to a stream-based system
• Let’s walk through the iterations
Iteration 1: Use existing systems
• Collect metrics and send to an existing metrics service
• ETL jobs to load data into a big Oracle Data Warehouse
Diagram: App → (data collection) → Existing metrics service → (ETL) → DW
Iteration 1: Use existing systems
1. Batch process with 24-hour delay
2. Fixed, inflexible DB schema
3. Analysis difficult and slow via SQL
Diagram: App → (data collection) → Existing metrics service → (ETL) → DW
Iteration 2: Use AWS
• Collect metrics in the app using the Amazon Mobile Analytics SDK, which automatically loads data into Redshift
Diagram labels: App, Data collection, CloudFormation, ETL system
Iteration 2: Use AWS
1. Batch process with 2-hour delay (was 24-hour delay)
2. Fixed, inflexible DB schema
3. Analysis difficult and slow via SQL
Diagram labels: App, Data collection, CloudFormation, ETL system
Iteration 3: Automated DB schema
• Add shared configuration that is used in the app and automatically updates the Redshift schema
Diagram labels: App, Data collection, Schema config, CloudFormation, ETL system
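The deck does not show how the shared configuration drives schema updates. One plausible sketch, under the assumption that the config maps column names to SQL types, is to diff the config against the table's existing columns and emit `ALTER TABLE ... ADD COLUMN` statements for anything new. The table name, column names, and helper are hypothetical.

```python
def schema_migration_statements(table, desired_columns, existing_columns):
    """Return ALTER TABLE statements for columns in the config but not yet in the table.

    Identifiers are assumed to come from trusted configuration, not user input.
    """
    statements = []
    for name, sql_type in desired_columns.items():
        if name not in existing_columns:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
    return statements


# Hypothetical shared config and current table columns.
config = {"event_id": "VARCHAR(64)", "battery_level": "INTEGER"}
statements = schema_migration_statements("telemetry", config, {"event_id"})
```

Only additive changes are generated here; dropping or retyping columns would need a separate, more careful path.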
Iteration 3: Automated DB schema
1. Batch process with 2-hour delay (was 24-hour delay)
2. Auto-updating DB schema (was fixed, inflexible)
3. Analysis difficult and slow via SQL
Diagram labels: App, Data collection, Schema config, CloudFormation, ETL system
Iteration 4: Use Streams
• Introduce a Kinesis stream and Kinesis Firehose to publish to Redshift
• Partition data by date to simplify data retention policies
Diagram labels: App, Data collection via Pinpoint, Schema config
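The deck does not say where the date partitioning happens. Assuming one partition (for example, a table or key prefix) per day, retention reduces to dropping whole partitions older than a cutoff, as the sketch below shows. The naming scheme and helpers are hypothetical.

```python
from datetime import date, timedelta


def partition_name(base, day):
    """Name the partition that holds one day's data, e.g. telemetry_20171128."""
    return f"{base}_{day:%Y%m%d}"


def expired_partitions(base, days_present, today, retention_days):
    """Return names of partitions older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_name(base, d) for d in sorted(days_present) if d < cutoff]


# Hypothetical partitions currently stored, with a 2-day retention policy.
days = [date(2017, 11, 27), date(2017, 11, 24), date(2017, 11, 26), date(2017, 11, 25)]
to_drop = expired_partitions("telemetry", days, today=date(2017, 11, 28), retention_days=2)
```

Dropping a whole partition avoids row-by-row deletes, which is the simplification the bullet points to.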
Iteration 4: Use Streams
1. Streaming process with a delay of a couple of minutes (was batch with a 24-hour, then 2-hour delay)
2. Auto-updating DB schema (was fixed, inflexible)
3. Analysis difficult and slow via SQL
Diagram labels: App, Data collection via Pinpoint, Schema config
Iteration 5: Generic message types
• Use generic message types
• Publish the data to:
  • S3
  • Redshift
  • Elasticsearch
Diagram labels: App, Elasticsearch
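A generic message type can be sketched as an envelope with a type tag and an opaque payload, so consumers dispatch on the tag instead of relying on a fixed schema. The deck's diagrams mention ProtoBuf; JSON is used below only to keep the sketch dependency-free, and the envelope fields and handlers are hypothetical.

```python
import json


def make_message(message_type, payload):
    """Wrap a payload in a generic envelope: a type tag plus an opaque body."""
    return json.dumps({"type": message_type, "payload": payload}).encode("utf-8")


def route(raw, handlers):
    """Decode an envelope and dispatch on its type; unknown types are skipped."""
    message = json.loads(raw)
    handler = handlers.get(message["type"])
    return handler(message["payload"]) if handler else None


# Hypothetical consumer that only cares about "metric" messages.
handlers = {"metric": lambda payload: payload["value"] * 2}
handled = route(make_message("metric", {"value": 21}), handlers)
skipped = route(make_message("crash", {"stack": "..."}), handlers)
```

New message types can then be introduced by producers without breaking existing consumers, which simply ignore tags they do not recognize.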
Iteration 5
Diagram labels: App, Data collection, ProtoBuf, Elasticsearch, Consumer Lambdas, Consumer Redshifts, SQL reports, Dashboards
Iteration 5: Generic message types
1. Streaming process with a few seconds' delay (was batch with a 24-hour, then 2-hour delay)
2. Auto-updating DB schema and generic message types (was fixed, inflexible)
3. Analysis flexible by processing message payload (was difficult and slow via SQL)
Data flow
Diagram labels: App, Elasticsearch, Consumer Redshifts, Consumer Lambdas, SQL reports, Dashboards
Future improvements
Some ideas to make the system even better:
1. Use Kinesis Analytics to query the real-time data stream
2. Use Amazon Athena to query data directly from S3
3. Use Amazon AI services to do deeper data analysis
Summary
Did we solve our use cases?
1. Real-time metrics and alarming
2. Real-time dashboards
3. Real-time logs and crash troubleshooting
4. Monitoring new releases
5. Sharing data with other teams
6. Deeper analytics
Benefits of Streaming
1. Agility: real-time data means your business can react quicker
2. Flexibility: generic message types give you flexible schemas so your system can handle multiple data types and future use cases
3. Shareability: streams allow you to multiplex and share your data easily with your consumers
4. Extensibility: Processing streams of data allows us to write it to multiple data storage systems, which enables a variety of analytics tools
Thank you!

AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

ABD217_From Batch to Streaming

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent. From Batch to Streaming: How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time. November 28, 2017
  • 2. Agenda • Real-time streaming data overview • Streaming data services • Benefits of streaming analytics • Batch to streaming best practices • How Amazon Flex moved from batch to streaming
  • 3. What is batch processing? "Execution of a series of jobs in a program on a computer without manual intervention" (Wikipedia) • Data is collected over a period of time • Process and analyze on a schedule • Combine several processes to obtain final result
  • 4. Most data is produced continuously: mobile apps, web clickstream, application logs, metering records, IoT sensors, smart buildings
  • 5. The diminishing value of data. Recent data is highly valuable • If you act on it in time • Perishable insights (M. Gualtieri, Forrester). Old + recent data is more valuable • If you have the means to combine them
  • 6. Processing real-time, streaming data: what are the key requirements? • Durable • Continuous • Fast • Correct • Reactive • Reliable. Stages: Collect, Transform, Analyze, React, Persist
  • 7. Amazon Kinesis makes it easy to work with real-time streaming data. Kinesis Streams • For technical developers • Collect and stream data for ordered, replayable, real-time processing. Kinesis Firehose • For all developers, data scientists • Easily load massive volumes of streaming data into Amazon S3, Redshift, Elasticsearch. Kinesis Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries • Compute analytics in real time
  • 8. Amazon Kinesis Streams • Reliably ingest and durably store streaming data at low cost • Build custom real-time applications to process streaming data • Use your stream-processing framework of choice
  • 9. Amazon Kinesis Firehose • Reliably ingest and deliver batched, compressed, and encrypted data to S3, Redshift, and Elasticsearch • Point and click setup with zero administration and seamless elasticity • Managed stream-processing consumer
  • 10. Amazon Kinesis Analytics • Interact with streaming data in real time using SQL • Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
  • 11. Benefits of streaming analysis. Immediate results • Real-time aggregations • Filtering • Anomaly detection. Reduced complexity • Fewer scheduled jobs to manage • Kinesis is a fully managed solution. Scalable • Enables parallel processing • Horizontally scales, based on your ingest rate
  • 12. Batch to streaming best practices: Migrate incrementally • Don’t boil the ocean • Begin by streaming data in parallel to existing batch processes • Persist streaming data into durable storage, like Amazon S3 • Add in streaming analysis results to replace batch analysis [Diagram labels: Application databases, ETL, Data warehouse, Data producer, Streaming data, Amazon Kinesis, Amazon S3]
  • 13. Batch to streaming best practices: Perform ITL rather than ETL • ITL: Ingest-Transform-Load • ETL: Extract-Transform-Load • Transform data in near-real time rather than in a scheduled job • Enrich data in near-real time • Persist transformed and/or enriched data [Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, AWS Lambda function, Transform data, Enrichment source data, Amazon S3, Raw data, Transformed and/or enriched data]
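The ITL step on slide 13 can be sketched as a Firehose data-transformation Lambda. This is a minimal illustration, not the presenters' code: the `STATION_NAMES` lookup and the `station_id` / `station_name` fields are hypothetical, and the records are assumed to be JSON. The event and response shapes (`records`, `recordId`, `result`, base64-encoded `data`) follow the Firehose transformation contract.

```python
import base64
import json

# Hypothetical enrichment source; a real system might read from a cache or table.
STATION_NAMES = {"S1": "Seattle North"}


def handler(event, context):
    """Enrich each incoming Firehose record and return it for delivery."""
    output = []
    for record in event["records"]:
        try:
            # Firehose passes each record body as base64-encoded bytes.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["station_name"] = STATION_NAMES.get(
                payload.get("station_id"), "unknown"
            )
            # Newline-delimit records so downstream readers can split them.
            body = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(body).decode("utf-8"),
            })
        except (ValueError, AttributeError, TypeError):
            # Hand unparseable records back unchanged and flag them as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning every `recordId` with an explicit result, rather than silently dropping bad input, keeps the raw data recoverable for later reprocessing.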
  • 14. Batch to streaming best practices: Aggregate upon arrival • Continuously write raw data to persistent data store for archival and other analysis • Aggregate in real time when window size < 1 hour • Write aggregated data to persistent data store for immediate value [Diagram labels: Data producer, Raw streaming data, Amazon Kinesis Firehose, Amazon Kinesis Analytics, Aggregate, Results, Amazon S3, Raw data, Aggregated data]
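The "aggregate upon arrival" idea on slide 14 amounts to a windowed count. The deck does this with Kinesis Analytics SQL. The sketch below is a plain-Python stand-in for the same tumbling-window logic, assuming events arrive as `(epoch_seconds, event_type)` pairs and a 60-second window (the slide only says the window should be under one hour).

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size, not taken from the slides


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains timestamp."""
    return timestamp - (timestamp % size)


def aggregate(events, size=WINDOW_SECONDS):
    """Count events per (window start, event type)."""
    counts = Counter()
    for timestamp, event_type in events:
        counts[(window_start(timestamp, size), event_type)] += 1
    return dict(counts)
```

Each event falls into exactly one window, so the aggregated rows can be written out as soon as a window closes while the raw events are archived separately.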
  • 15. Batch to streaming example
  • 16. Brandon Smith • Senior software engineer • Worked at Amazon for 12 years in Kindle, AWS, and now Last Mile Delivery • Currently working on Amazon Flex
  • 17. Amazon delivery app (Android/iOS) • Crowd-sourced model launched in 30+ U.S. cities • Used by Amazon Logistics worldwide
  • 18. Deliveries for Amazon.com, Prime Now, Amazon Fresh, restaurants, grocery stores • Millions of packages per year
  • 19. The problem • Collecting, processing, and storing telemetry data • Telemetry data = remote measurements • Includes metrics, crashes, logs, sensor data, clickstream data, etc.
  • 20. The goal • Understand what’s happening in the field • Analyze all the data and make performance optimizations • Focus our time on improving the app and the delivery flow
  • 21. Use cases
  • 22. Use case 1: Alarming • We want to know within minutes if there are problems • Example: If the delivery count drops below our expected/historical value, we want to alarm
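The alarming rule on slide 22 can be sketched as a comparison against a historical baseline. The slides do not say how the expected value is computed, so the mean of past counts and the 50% `tolerance` below are assumptions made purely for illustration.

```python
def should_alarm(current_count, historical_counts, tolerance=0.5):
    """Return True when current_count falls below a fraction of the baseline.

    The baseline is the mean of historical_counts for comparable periods;
    tolerance is an assumed threshold, not a value from the talk.
    """
    if not historical_counts:
        # Without history there is no baseline to compare against.
        return False
    expected = sum(historical_counts) / len(historical_counts)
    return current_count < expected * tolerance
```

For example, with past counts of 100, 110, and 90 deliveries for the same window, a current count of 40 would alarm and a count of 80 would not.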
  • 23. Use case 2: Troubleshooting • Logs and crashes published to Amazon CloudWatch Logs in near-real time • Can filter and search to troubleshoot issues
  • 24. Use case 3: Dashboards • We can write SQL, generate reports, and create visualizations • But we really want real-time dashboards instead of daily reports [Figure labels: Daily reports, Real-time dashboards]
  • 25. Use case 4: Releases • Deploying new app versions and monitoring adoption in real time • Release new code smoothly and with confidence
  • 26. Use case 5: Sharing data • Consumers get notifications of new data in real time • Consumers can join their data with other data in the data lake [Figure labels: S3 bucket, Data lake]
  • 27. Use case 6: Deeper analytics • Look at the stream of data and the historical data • Build ML models, create predictions, detect anomalies
  • 28. How did we build it?
  • 29. Getting from batch to streaming • To solve our use cases, we had to incrementally improve our system • We evolved from a batch-based system to a stream-based system • Let’s walk through the iterations
  • 30. Iteration 1: Use existing systems • Collect metrics and send to an existing metrics service • ETL jobs to load data into a big Oracle Data Warehouse [Diagram labels: App, Data collection, Existing metrics service, ETL, DW]
  • 31. Iteration 1: Use existing systems. 1. Batch process with 24-hour delay 2. Fixed, inflexible DB schema 3. Analysis difficult and slow via SQL [Diagram labels: App, Data collection, Existing metrics service, ETL, DW]
  • 32. Iteration 2: Use AWS • Collect metrics in the app using the Amazon Mobile Analytics SDK, which automatically loads data into Redshift [Diagram labels: App, Data collection, CloudFormation ETL system]
  • 33. Iteration 2: Use AWS. 1. Batch process with 2-hour delay (down from 24 hours) 2. Fixed, inflexible DB schema 3. Analysis difficult and slow via SQL [Diagram labels: App, Data collection, CloudFormation ETL system]
  • 34. Iteration 3: Automated DB schema • Add shared configuration that is used in the app and automatically updates the Redshift schema [Diagram labels: App, Data collection, Schema config, CloudFormation ETL system]
  • 35. Iteration 3: Automated DB schema. 1. Batch process with 2-hour delay (down from 24 hours) 2. Auto-updating DB schema (previously fixed and inflexible) 3. Analysis difficult and slow via SQL [Diagram labels: App, Data collection, Schema config, CloudFormation ETL system]
  • 36. Iteration 4: Use Streams • Introduce a Kinesis stream and Kinesis Firehose to publish to Redshift • Partition data by date to simplify data retention policies [Diagram labels: App, Data collection, Via Pinpoint, Schema config]
  • 37. Iteration 4: Use Streams. 1. Streaming process with a delay of a couple of minutes (previously a batch process with 2-hour delay) 2. Auto-updating DB schema (previously fixed and inflexible) 3. Analysis difficult and slow via SQL [Diagram labels: App, Data collection, Via Pinpoint, Schema config]
  • 38. Iteration 5: Generic message types • Use generic message types • Publish the data to: S3, Redshift, Elasticsearch [Diagram labels: App, Elasticsearch]
  • 39. Iteration 5 [Diagram labels: App, Data collection, ProtoBuf, Consumer Lambdas, Consumer Redshifts, Elasticsearch, SQL reports, Dashboards]
  • 40. Iteration 5: Generic message types. 1. Streaming process with a few seconds' delay (originally a batch process with 24-hour delay) 2. Auto-updating DB schema and generic message types (originally a fixed, inflexible schema) 3. Analysis made flexible by processing the message payload (originally difficult and slow via SQL)
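One way to picture the generic message types of iteration 5 is an envelope with a type tag and a free-form payload, so new data types do not require schema changes in the pipeline. The deck's diagram mentions ProtoBuf. The sketch below uses JSON as a stand-in, and the `Message` fields, handler names, and message types are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Message:
    """Hypothetical generic envelope: a type tag plus an opaque payload."""
    message_type: str
    timestamp: int
    payload: dict


def encode(message):
    """Serialize the envelope to bytes (JSON here, standing in for ProtoBuf)."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode(raw):
    """Rebuild the envelope from serialized bytes."""
    return Message(**json.loads(raw))


def handle_metric(payload):
    return f"metric {payload['name']}={payload['value']}"


def handle_crash(payload):
    return f"crash in {payload['screen']}"


# Consumers register only the message types they care about.
HANDLERS = {"metric": handle_metric, "crash": handle_crash}


def dispatch(raw):
    """Route a serialized message to its handler based on the type tag."""
    message = decode(raw)
    handler = HANDLERS.get(message.message_type)
    if handler is None:
        return f"unhandled type {message.message_type}"
    return handler(message.payload)
```

Because routing depends only on `message_type`, a producer can start sending a new type before every consumer knows about it, and unknown types are simply skipped.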
  • 41. Data flow [Diagram labels: App, Elasticsearch, Consumer Redshifts, Consumer Lambdas, SQL reports, Dashboards]
  • 42. Future improvements. Some ideas to make the system even better: 1. Use Kinesis Analytics to query the real-time data stream 2. Use Amazon Athena to query data directly from S3 3. Use Amazon AI services to do deeper data analysis
  • 43. Summary. Did we solve our use cases? 1. Real-time metrics and alarming 2. Real-time dashboards 3. Real-time logs and crash troubleshooting 4. Monitoring new releases 5. Sharing data with other teams 6. Deeper analytics
  • 44. Benefits of streaming. 1. Agility: real-time data means your business can react more quickly 2. Flexibility: generic message types give you flexible schemas so your system can handle multiple data types and future use cases 3. Shareability: streams allow you to multiplex and share your data easily with your consumers 4. Extensibility: processing streams of data allows us to write it to multiple data storage systems, which enables a variety of analytics tools
  • 45. Thank you!