Considerations for Building Your First Streaming Application (ANT359) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Considerations for Building Your First Streaming Application
07/11/2018
Praveen Gattu
Software Developer Manager
AWS, Amazon Kinesis
Ryan Nienhuis
Senior Technical Product Manager
AWS, Amazon Kinesis
ANT359
Talk outline
• Use case – operational dashboard
• Implement real-time and batch analysis
Application architecture
• Generate web logs
• Collect web logs and deliver to Amazon Simple Storage Service (Amazon S3)
• Process & compute aggregate web log metrics
• Deliver processed web log metrics to Amazon CloudWatch
• Raw web logs from Data Firehose
• Interactive analysis of web logs
• Interactive querying of web logs
• Alarm
• Client
Streaming with Amazon Kinesis
Easily collect, process, and analyze video and data streams in real time
• Capture, process, and store video streams
• Load data streams into AWS data stores
• Analyze data streams in real time
• Capture, process, and store data streams
Collect logs with a Kinesis Data Firehose delivery stream
We are going to
• Write to a Data Firehose delivery stream - Simulate writing transformed
Apache web logs to a Kinesis Data Firehose delivery stream that is
configured to deliver data into an S3 bucket
• There are many different libraries that can be used to write data to a Data
Firehose delivery stream; one popular option is called the Amazon Kinesis
Agent
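As a minimal sketch of writing to a delivery stream from code, the helper below sends log entries in batches through the Firehose `PutRecordBatch` call. It assumes a boto3 Firehose client is passed in; the stream name `web-logs` and the helper names are hypothetical. `PutRecordBatch` accepts at most 500 records per call, so the input is chunked.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_record(entry: Dict[str, Any]) -> Dict[str, bytes]:
    """Encode one log entry as a newline-terminated JSON Firehose record."""
    return {"Data": (json.dumps(entry) + "\n").encode("utf-8")}


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_logs(client: Any, stream_name: str,
              entries: Iterable[Dict[str, Any]]) -> int:
    """Send entries in batches; return how many records the service rejected."""
    failed = 0
    records = [to_record(entry) for entry in entries]
    for batch in chunked(records, MAX_BATCH):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, this could be called as `send_logs(boto3.client("firehose"), "web-logs", entries)`. A production producer would retry the rejected records rather than only count them.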
Collect logs with a Kinesis Data Firehose delivery stream
• So that you don’t have to install or set up software on your machine, we are
going to use a Lambda function to simulate the Amazon Kinesis Agent.
The Lambda function populates a Data Firehose delivery stream using a
template and is simple to set up.
• Let’s get started!
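The Lambda template used in the session is not reproduced in this transcript. As a rough illustration of simulating web traffic, the sketch below generates synthetic log entries. The field names (`host`, `datetime`, `request`, `response`, `bytes`) are assumptions modeled on Apache access log fields, and the paths and status-code weights are invented for the example.

```python
import random
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical request paths and status codes; weights skew toward success.
PATHS = ["/index.html", "/cart", "/checkout", "/search"]
STATUSES = [200, 301, 404, 500]
WEIGHTS = [85, 5, 7, 3]


def fake_log_entry(rng: random.Random,
                   now: Optional[datetime] = None) -> Dict[str, object]:
    """Build one synthetic web log entry with Apache-style fields."""
    now = now or datetime.now(timezone.utc)
    return {
        "host": f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
        "datetime": now.strftime("%d/%b/%Y:%H:%M:%S %z"),
        "request": f"GET {rng.choice(PATHS)} HTTP/1.1",
        "response": rng.choices(STATUSES, weights=WEIGHTS, k=1)[0],
        "bytes": rng.randint(200, 5000),
    }


def generate(count: int, seed: Optional[int] = None) -> List[Dict[str, object]]:
    """Generate `count` entries; a fixed seed makes the random fields repeatable."""
    rng = random.Random(seed)
    return [fake_log_entry(rng) for _ in range(count)]
```

A Lambda handler could call `generate(...)` on each invocation and forward the entries to the delivery stream.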
Kinesis Data Firehose delivery to S3 stats
Monitoring Kinesis Data Firehose delivery to S3
Amazon Kinesis Data Analytics
• Powerful real-time applications
• Easy to use, fully managed
• Automatic elasticity
• Windowed aggregations
Kinesis Data Analytics applications
Easily write SQL code to process streaming data
Connect to streaming source
Continuously deliver SQL results
Process data using Kinesis Data Analytics
• SQL query to compute an aggregate metric for an interesting statistic on
the incoming data – Error Count
View sample records in Amazon Kinesis Data Analytics app
• Review sample records delivered to the source stream
(SOURCE_SQL_STREAM_001)
Kinesis Data Analytics application metadata
• Note that Amazon Kinesis adds metadata to each record, as shown in the
formatted record sample.
• ROWTIME is the time when the Kinesis application inserts a row into the
first in-application stream. It is a special column used for time series
analytics. This is also known as processing time.
• APPROXIMATE_ARRIVAL_TIME is the time the record was added to the
streaming source. This is also known as ingest time or server-side time.
• Event time is the timestamp when the event occurred at the client. It is also
called client-side time.
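To make the three notions of time concrete, the sketch below computes the delays between them for a single record. The timestamps and the `RecordTimes` type are illustrative only and are not output from a real application.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecordTimes:
    event_time: datetime    # when the event occurred at the client
    arrival_time: datetime  # APPROXIMATE_ARRIVAL_TIME: added to the streaming source
    row_time: datetime      # ROWTIME: inserted into the first in-application stream

    def ingest_delay(self) -> timedelta:
        """Time between the event occurring and reaching the streaming source."""
        return self.arrival_time - self.event_time

    def processing_delay(self) -> timedelta:
        """Time between reaching the source and entering the application."""
        return self.row_time - self.arrival_time


times = RecordTimes(
    event_time=datetime(2018, 1, 1, 12, 0, 0),
    arrival_time=datetime(2018, 1, 1, 12, 0, 2),
    row_time=datetime(2018, 1, 1, 12, 0, 3),
)
print(times.ingest_delay().total_seconds())      # 2.0
print(times.processing_delay().total_seconds())  # 1.0
```

A window keyed on ROWTIME groups records by processing time, so an event that arrives late is counted in the window in which it was processed, not the one in which it occurred.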
Calculate an aggregate metric
Tumbling
• Fixed size and non-overlapping
• Use FLOOR() or STEP() function in a GROUP BY statement
Sliding
• Fixed size and overlapping; row boundaries are determined when
new rows enter window
• Use standard OVER and WINDOW clause
Custom
• Not fixed size and overlapping; row boundaries by conditions
• Implementations vary, but typically require two steps (step 1 –
identify boundaries, step 2 – perform computation)
Stagger
• Not fixed size and non-overlapping; windows open when the first
event matching the partition key arrives
• Use WINDOWED BY STAGGER and PARTITION BY statements
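The difference between tumbling and sliding windows can be illustrated outside SQL. The Python sketch below is not Kinesis Data Analytics code. It emulates a tumbling window by flooring each timestamp to a fixed interval (the same idea as FLOOR() or STEP() in a GROUP BY), and a sliding window that is re-evaluated as each new row arrives. The 10-second window size, the `(timestamp, status)` event shape, and the sample data are assumptions.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, List, Tuple

Event = Tuple[int, int]  # (timestamp in seconds, HTTP status code)


def tumbling_error_counts(events: Iterable[Event], size: int) -> Dict[int, int]:
    """Count errors (status >= 400) per fixed, non-overlapping window."""
    counts: Counter = Counter()
    for ts, status in events:
        if status >= 400:
            window_start = ts - (ts % size)  # floor to the window boundary
            counts[window_start] += 1
    return dict(counts)


def sliding_error_counts(events: Iterable[Event], size: int) -> List[Tuple[int, int]]:
    """For each arriving row, count errors in the trailing `size` seconds."""
    window: Deque[int] = deque()  # timestamps of errors still inside the window
    results = []
    for ts, status in events:  # events are assumed to arrive in time order
        if status >= 400:
            window.append(ts)
        while window and window[0] <= ts - size:
            window.popleft()  # drop errors that fell out of the window
        results.append((ts, len(window)))
    return results


events = [(1, 200), (3, 500), (9, 404), (12, 200), (14, 503), (25, 404)]
print(tumbling_error_counts(events, 10))  # {0: 2, 10: 1, 20: 1}
print(sliding_error_counts(events, 10))
```

The tumbling version emits one value per window, while the sliding version emits one value per incoming row, which is why sliding windows produce more output for the same input.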
Calculate error count metric
In the Kinesis Data Analytics application editor page, author the SQL
that computes the error count
Deliver output data to Amazon CloudWatch
• Connect the Kinesis Data Analytics output to a Lambda function
Deliver output data to Amazon CloudWatch
• The Lambda function delivers results to CloudWatch metrics
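As a sketch of what such a function could look like: Kinesis Data Analytics invokes an output Lambda function with a batch of base64-encoded records and expects a per-record result of `Ok` or `DeliveryFailed` in return. The namespace `WebLogs`, the metric name `ErrorCount`, and the `ERROR_COUNT` column are hypothetical. The CloudWatch client is passed in as a parameter so the logic can be exercised without AWS access.

```python
import base64
import json
from typing import Any, Dict


def deliver_metrics(event: Dict[str, Any], cloudwatch: Any) -> Dict[str, Any]:
    """Publish each output row as a CloudWatch metric datapoint."""
    results = []
    for record in event["records"]:
        try:
            row = json.loads(base64.b64decode(record["data"]))
            cloudwatch.put_metric_data(
                Namespace="WebLogs",  # hypothetical namespace
                MetricData=[{
                    "MetricName": "ErrorCount",          # hypothetical metric name
                    "Value": float(row["ERROR_COUNT"]),  # assumed output column
                    "Unit": "Count",
                }],
            )
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            # Report the failure so the service can retry this record.
            results.append(
                {"recordId": record["recordId"], "result": "DeliveryFailed"}
            )
    return {"records": results}
```

In a deployed function, the entry point might be `def lambda_handler(event, context): return deliver_metrics(event, boto3.client("cloudwatch"))`.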
Real-time alerts on error rate from CloudWatch alarms
• Alarms fire when the error rate breaches a threshold
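One way to define such an alarm programmatically is the CloudWatch `PutMetricAlarm` API. The sketch below only assembles the request parameters. The alarm name, namespace, metric name, and threshold are hypothetical values chosen for illustration.

```python
from typing import Any, Dict


def error_alarm_params(threshold: float, period_seconds: int = 60) -> Dict[str, Any]:
    """Build PutMetricAlarm parameters for an error-count alarm."""
    return {
        "AlarmName": "web-log-error-count-high",  # hypothetical
        "Namespace": "WebLogs",                   # hypothetical
        "MetricName": "ErrorCount",               # hypothetical
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data is not treated as a breach
    }


params = error_alarm_params(threshold=50)
print(params["ComparisonOperator"], params["Threshold"])  # GreaterThanThreshold 50
```

With boto3 available, the parameters could be applied with `boto3.client("cloudwatch").put_metric_alarm(**params)`.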
How to know the impact?
Amazon Athena
Interactive query service
• Query directly from Amazon S3
• Use ANSI SQL
• Serverless
• Multiple data formats
• Cost effective
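A hedged sketch of running a query through the Athena API: the function starts a query and polls until it reaches a terminal state. The client is passed in (for example `boto3.client("athena")`). The database name, table name, and S3 output location shown in the usage note are placeholders.

```python
import time
from typing import Any

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena: Any, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> str:
    """Start a query, wait until it finishes, and return its final state."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)  # wait before polling again
```

A call might look like `run_query(client, "SELECT response, COUNT(*) FROM web_logs GROUP BY response", "weblogs_db", "s3://example-bucket/athena-results/")`, where every name is a placeholder.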
Familiar technologies under the covers
Used for SQL queries
• In-memory distributed query engine
• ANSI-SQL compatible with extensions
Used for DDL functionality
• Complex data types
• Multitude of formats
• Supports data partitioning
Want to learn more?
Workshop sessions covering streaming and big data
• ANT213-R and ANT213-R1 - Build Your First Big Data Application on AWS
• ANT362 - Use Streaming Data to Gain Real-Time Insights into Your Business
• ANT318-R and ANT318-R1 - Build, Deploy and Serve Machine Learning Models on Streaming Data Using Amazon SageMaker, Apache Spark on Amazon EMR and Amazon Kinesis
Thank you!
Praveen Gattu
Ryan Nienhuis