© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Greg Khairallah, Sr. Mgr, Business Development, AWS
Ben Snively, Sr. Solutions Architect, AWS
May 18, 2017 | 12:00 - 1:00 PM Pacific Time
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight
The Dark Data Problem
Most generated data is unavailable for analysis
[Chart: data volume by year, 1990 to 2020, comparing generated data with data available for analysis]
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Serverless
Cloud Architecture Evolution
[Diagram: three stages labeled Virtualized Servers, Managed Platforms, and Serverless Analytics]
Serverless characteristics
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
Amazon Athena
Athena is Serverless
• No infrastructure or administration
• Zero spin-up time
• Transparent upgrades
Amazon Athena is Easy To Use
• Log into the Console
• Create a table
  • Type in a Hive DDL statement
  • Use the console Add Table wizard
• Start querying
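The "type in a Hive DDL statement" step can be sketched as follows. This is a minimal illustration, not the presenters' demo: the table name `weblogs`, its columns, and the bucket `s3://example-bucket/weblogs/` are hypothetical, and the helper `build_create_table_ddl` is invented here only to show the shape of the statement.

```python
def build_create_table_ddl(table, columns, location, delimiter=","):
    """Return a Hive-style CREATE EXTERNAL TABLE statement for delimited text files."""
    # Each column becomes "`name` type"; backticks guard against reserved words.
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = build_create_table_ddl(
    "weblogs",
    [("request_time", "string"), ("client_ip", "string"), ("status", "int")],
    "s3://example-bucket/weblogs/",
)
print(ddl)
```

The printed statement is what one would paste into the console query editor. The table is external, so the statement only records a schema and a location and leaves the files in Amazon S3 untouched.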
Amazon Athena is Highly Available
• You connect to a service endpoint or log into the console
• Athena uses warm compute pools across multiple Availability Zones
• Your data is in Amazon S3, which is also highly available and designed for 99.999999999% durability
Query Data Directly from Amazon S3
• No loading of data
• Query data in its raw format
  • Text, CSV, JSON, weblogs, AWS service logs
• Convert to an optimized form like ORC or Parquet for the best performance and lowest cost
• No ETL required
• Stream data directly from Amazon S3
• Take advantage of Amazon S3 durability and availability
Use ANSI SQL
• Start writing ANSI SQL
• Support for complex joins, nested queries & window functions
• Support for complex data types (arrays, structs)
• Support for partitioning of data by any key
  • (date, time, custom keys)
  • e.g., Year, Month, Day, Hour or Customer Key, Date
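Partitioning by Year, Month, Day is commonly laid out in Amazon S3 as Hive-style `key=value` prefixes. The sketch below assumes that layout and a table declared with `year`, `month`, and `day` partition columns of type string. The bucket, the table name, and both helper functions are hypothetical.

```python
from datetime import date


def partition_prefix(base, day):
    """Return a Hive-style key=value prefix for one day of data."""
    return (
        f"{base.rstrip('/')}/year={day.year:04d}"
        f"/month={day.month:02d}/day={day.day:02d}/"
    )


def add_partition_ddl(table, base, day):
    """Return an ALTER TABLE statement that registers the day's prefix as a partition."""
    spec = f"year='{day.year:04d}', month='{day.month:02d}', day='{day.day:02d}'"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec}) "
        f"LOCATION '{partition_prefix(base, day)}'"
    )


# Hypothetical bucket and table, for illustration only.
print(partition_prefix("s3://example-bucket/weblogs", date(2017, 5, 18)))
# s3://example-bucket/weblogs/year=2017/month=05/day=18/
print(add_partition_ddl("weblogs", "s3://example-bucket/weblogs", date(2017, 5, 18)))
```

A query that filters on the partition columns, for example `WHERE year = '2017' AND month = '05'`, then reads only the matching prefixes instead of the whole dataset.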
Simple Pricing - $5/TB Scanned
• Pay by the amount of data scanned per query
• Ways to save costs
  • Compress
  • Convert to columnar format
  • Use partitioning
• Free: DDL queries, failed queries
Dataset                               | Size on Amazon S3     | Query run time | Data scanned          | Cost
Logs stored as text files             | 1 TB                  | 237 seconds    | 1.15 TB               | $5.75
Logs stored in Apache Parquet format* | 130 GB                | 5.13 seconds   | 2.69 GB               | $0.013
Savings                               | 87% less with Parquet | 34x faster     | 99% less data scanned | 99.7% cheaper
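The cost column follows directly from the $5/TB rate. A minimal sketch of the arithmetic, assuming 1 TB = 1024 GB (the slide does not say which convention it uses) and ignoring any billing rules the slide does not mention:

```python
PRICE_PER_TB_USD = 5.0  # rate quoted on the slide
GB_PER_TB = 1024        # assumption: binary units


def query_cost_usd(tb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Return the cost of one query given the terabytes it scanned."""
    return tb_scanned * price_per_tb


text_cost = query_cost_usd(1.15)                 # text logs: 1.15 TB scanned
parquet_cost = query_cost_usd(2.69 / GB_PER_TB)  # Parquet: 2.69 GB scanned
print(f"text: ${text_cost:.2f}, parquet: ${parquet_cost:.3f}")
# text: $5.75, parquet: $0.013
```

Because the charge depends only on bytes scanned, compression, columnar formats, and partition pruning all reduce cost the same way: by reducing what each query has to read.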
Amazon QuickSight
Amazon QuickSight is a Business Analytics Service that lets business users
quickly and easily visualize, explore, and share insights from their data.
Who Is QuickSight For?
• Data professionals
• Business professionals
• Data consumers
Basic Concepts
[Diagram labels: Retail Data, Ops Data, Marketing Data, Relational Databases, Flat Files, And Many Others!]
Deep Integration with AWS Data Sources
• Amazon RDS, Aurora
• Amazon Redshift
• Amazon Athena
• Amazon S3
• Flat Files
[Callout: Discussing Today]
Super-fast Performance with SPICE
Fast, Easy Ad-Hoc Analytics for End Users
Collaborate, Share, and Publish
New capabilities in QuickSight
New connectors, enhanced visualizations, expansion to new regions, SPICE updates, and more
[Timeline: Nov '16 (GA), Dec '16, Feb '17, Apr '17, May '17. Labeled releases: Std. Edition, KPI Charts, Export to CSV, AD Connector, US East (Ohio) Region, CloudTrail Audit Logs, Ent. Edition, Sched. Refresh, Excel Range Detection, Presto Connector, SparkSQL Connector]
A Sample Pipeline
• Ad-hoc access to raw data using SQL
• Ad-hoc access to data using Athena
• Athena can query aggregated datasets as well
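Ad-hoc access from a script, rather than the console, might look like the sketch below. It is not taken from the demo. It assumes the AWS SDK for Python (boto3) with credentials already configured, and the table `weblogs`, the database `default`, and the results bucket are all hypothetical. The helper names are invented for illustration.

```python
def build_query_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(request):
    """Submit the request with boto3 and return the query execution ID."""
    import boto3  # imported lazily so the builder works without the AWS SDK

    client = boto3.client("athena")
    return client.start_query_execution(**request)["QueryExecutionId"]


# Hypothetical table, database, and results bucket, for illustration only.
request = build_query_request(
    "SELECT status, COUNT(*) AS hits FROM weblogs "
    "WHERE year = '2017' AND month = '05' GROUP BY status",
    database="default",
    output_location="s3://example-bucket/athena-results/",
)
print(request["QueryString"])
```

Calling `submit_query(request)` would start the query. Results are written to the output location in Amazon S3, so the same call works whether the table points at raw logs or at an aggregated dataset.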
Demo
Upcoming Webinar
Thank you!
