AWS Athena Knowledgebase

Amazon Athena is an interactive query service that allows users to easily analyze data stored in Amazon S3 using standard SQL. It is serverless, so there is no infrastructure to manage, and users only pay for the queries they run. Athena uses Presto for SQL queries and works with common data formats like CSV, JSON, ORC and Parquet. It provides fast query performance for datasets in S3 and automatically executes queries in parallel.

Uploaded by

David Joseph

AWS ATHENA

KNOWLEDGEBASE
Supplementary Material to AWS re/Start

JUNE 13, 2022

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon
S3 using standard SQL.

Athena is serverless, so there is no infrastructure to manage, and you pay only for the
queries that you run.

Athena is easy to use – simply point to your data in Amazon S3, define the schema, and start
querying using standard SQL.
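
As a rough illustration of the "point, define, query" flow described above, the sketch below assembles the two SQL statements involved: a CREATE EXTERNAL TABLE statement that maps a schema onto files in S3, and a query against that table. The database, table, column names, and S3 path are hypothetical placeholders, and Parquet is assumed as the storage format. The helper only builds SQL text; submitting it to Athena is left out.

```python
def build_external_table_ddl(database, table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for Parquet files under s3_location.

    columns is a list of (name, type) pairs, e.g. [("user_id", "string")].
    """
    column_defs = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical names and path, for illustration only.
ddl = build_external_table_ddl(
    database="weblogs",
    table="page_views",
    columns=[("user_id", "string"), ("url", "string"), ("view_time", "timestamp")],
    s3_location="s3://example-bucket/page_views/",
)

# Once the table is defined, it can be queried with standard SQL.
query = "SELECT url, COUNT(*) AS views FROM weblogs.page_views GROUP BY url"

print(ddl)
print(query)
```

No data is moved or loaded by the DDL: it only registers the schema and location in the Data Catalog, and the files stay where they are in S3.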

Amazon Athena uses Presto with full standard SQL support and works with a variety of
standard data formats, including CSV, JSON, ORC, Apache Parquet and Avro.

While Amazon Athena is ideal for quick, ad-hoc querying and integrates with Amazon
QuickSight for easy visualization, it can also handle complex analysis, including large joins,
window functions, and arrays.

Amazon Athena uses a managed Data Catalog to store information and schemas about the
databases and tables that you create for your data stored in Amazon S3.

With Amazon Athena, you don’t have to worry about managing or tuning clusters to get fast
performance.

Athena is optimized for fast performance with Amazon S3.

Athena automatically executes queries in parallel, so most results are delivered within
seconds, even on large datasets.

With Athena, there’s no need for complex ETL jobs to prepare data for analysis.

This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

Athena integrates out of the box with the AWS Glue Data Catalog, allowing you to create a
unified metadata repository across various services, crawl data sources to discover schemas
and populate your Catalog with new and modified table and partition definitions, and
maintain schema versioning.

You can also use Glue’s fully managed ETL capabilities to transform data or convert it into
columnar formats to optimize cost and improve performance.
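
Conversion to a columnar format can also be done from Athena itself with a CREATE TABLE AS SELECT (CTAS) statement. This is a different route from the Glue ETL jobs mentioned above, and is shown here only because it fits in a few lines. The sketch below assembles such a statement; the table names, columns, and S3 path are hypothetical, and the partition column is assumed to be placed last in the SELECT list, as CTAS expects.

```python
def build_parquet_ctas(source_table, target_table, columns, partition_column, s3_location):
    """Return a CTAS statement that rewrites source_table as partitioned Parquet."""
    # Partition columns must come last in the SELECT list of a CTAS query.
    select_list = ", ".join(list(columns) + [partition_column])
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{s3_location}',\n"
        f"  partitioned_by = ARRAY['{partition_column}']\n"
        ") AS\n"
        f"SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# Hypothetical names and path, for illustration only.
ctas = build_parquet_ctas(
    source_table="weblogs.page_views_csv",
    target_table="weblogs.page_views_parquet",
    columns=["user_id", "url"],
    partition_column="dt",
    s3_location="s3://example-bucket/page_views_parquet/",
)
print(ctas)
```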

AWS Athena Use Cases

Query services like Amazon Athena, data warehouses like Amazon Redshift, and
sophisticated data processing frameworks like Amazon EMR, all address different needs and
use cases.

Amazon Redshift provides the fastest query performance for enterprise reporting and
business intelligence workloads, particularly those involving extremely complex SQL with
multiple joins and sub-queries.

Amazon EMR makes it simple and cost-effective to run highly distributed processing
frameworks such as Hadoop, Spark, and Presto when compared to on-premises
deployments.

Amazon EMR is flexible – you can run custom applications and code, and define specific
compute, memory, storage, and application parameters to optimize your analytic
requirements.

Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the
need to set up or manage any servers.

The table below shows the primary use case for several AWS query and analytics services
and when to use each:

| AWS Service | Primary Use Case | When to Use |
|---|---|---|
| Amazon Athena | Query | Run interactive queries against data directly in Amazon S3 without worrying about formatting data or managing infrastructure. Can be used with other services such as Amazon Redshift. |
| Amazon Redshift | Data Warehouse | Pull data from many sources, format and organize it, store it, and support complex, high-speed queries that produce business reports. |
| Amazon EMR | Data Processing | Highly distributed processing frameworks such as Hadoop, Spark, and Presto. Run a wide variety of scale-out data processing tasks for applications such as machine learning, graph analytics, data transformation, and streaming data. |
| AWS Glue | ETL Service | Transform and move data to various destinations. Used to prepare and load data for analytics. The data source can be S3, Redshift, or another database. The Glue Data Catalog can be queried by Athena, EMR, and Redshift Spectrum. |

Best Practices

Best practices for performance with Athena:

• Partition your data – Partitioning divides a table into parts and keeps related data
together based on column values such as date, country, or region. Athena supports
Hive partitioning.
• Bucket your data – Another way to partition your data is to bucket it within a single
partition.
• Use compression – AWS recommends using either Apache Parquet or Apache ORC, which
compress data by default and are splittable.
• Optimize file sizes – Queries run more efficiently when reading data can be
parallelized and when blocks of data can be read sequentially.
• Optimize columnar data store generation – Apache Parquet and Apache ORC are
popular columnar data stores.
• Optimize ORDER BY – The ORDER BY clause returns the results of a query in sort
order. When you only need the top or bottom N rows, add a LIMIT clause to reduce the
cost of the sort.
• Optimize GROUP BY – The GROUP BY operator distributes rows based on the GROUP
BY columns to worker nodes, which hold the GROUP BY values in memory. Limiting
GROUP BY to the columns the query actually needs reduces that memory use.
• Use approximate functions – For exploring large datasets, a common use case is to
find the count of distinct values for a certain column using COUNT(DISTINCT
column). When an exact count is not required, approx_distinct() returns an estimate
while using less memory.
• Only include the columns that you need – When running your queries, limit the final
SELECT statement to only the columns that you need instead of selecting all
columns.
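
Several of the practices above can be combined in a single query. The sketch below builds one that filters on a partition column, selects only the columns it needs, uses approx_distinct (Athena's approximate alternative to COUNT(DISTINCT ...)), and pairs ORDER BY with LIMIT. The table name, the columns country and user_id, and the string-typed partition column dt are all hypothetical.

```python
from datetime import date


def build_top_countries_query(table, day, limit=10):
    """Return a query for the countries with the most distinct users on one day."""
    if not isinstance(day, date):
        raise TypeError("day must be a datetime.date")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        # Only the needed columns; approx_distinct instead of COUNT(DISTINCT ...).
        "SELECT country, approx_distinct(user_id) AS approx_users\n"
        f"FROM {table}\n"
        # Filtering on the partition column lets Athena skip other partitions.
        f"WHERE dt = '{day.isoformat()}'\n"
        "GROUP BY country\n"
        # LIMIT restricts the sort to the top N rows.
        "ORDER BY approx_users DESC\n"
        f"LIMIT {limit}"
    )


# Hypothetical table name, for illustration only.
print(build_top_countries_query("weblogs.page_views", date(2022, 6, 13)))
```

Passing the day as a date object rather than free text keeps the partition filter in a fixed format, which matters because a filter that does not match the partition values scans nothing useful.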

Pricing

With Amazon Athena, you pay only for the queries that you run.

You are charged based on the amount of data scanned by each query.

You can get significant cost savings and performance gains by compressing, partitioning, or
converting your data to a columnar format, because each of those operations reduces the
amount of data that Athena needs to scan to execute a query.
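
Because the charge follows the amount of data scanned, the effect of compression and columnar storage can be estimated with simple arithmetic. The sketch below is illustrative only: the per-terabyte rate is a placeholder to be replaced with the figure from the current Athena pricing page, and the 3:1 compression ratio and four equally sized columns are assumptions chosen to make the numbers easy to follow.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def estimate_scan_cost(bytes_scanned, price_per_tb_usd):
    """Return the estimated charge in USD for scanning bytes_scanned bytes."""
    if bytes_scanned < 0 or price_per_tb_usd < 0:
        raise ValueError("inputs must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb_usd


PRICE = 5.0  # placeholder rate in USD per TB; check the current pricing page

# 3 TB of uncompressed text, every column read.
full_scan = estimate_scan_cost(3 * BYTES_PER_TB, PRICE)
# The same data compressed at an assumed 3:1 ratio.
compressed = estimate_scan_cost(1 * BYTES_PER_TB, PRICE)
# Columnar layout, query reads one of four equally sized columns (assumed).
one_column = estimate_scan_cost(BYTES_PER_TB // 4, PRICE)

for label, cost in [("full scan", full_scan), ("compressed", compressed), ("one column", one_column)]:
    print(f"{label}: ${cost:.2f}")
```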
