Amazon Athena - Use Cases
Given the streaming nature of the data, The Meet Group sought a solution that would integrate its data
pipelines with central data collection to drive better real-time aggregate insights and analysis.
The solution
The Meet Group leverages Upsolver as a real-time collection and transformation engine that connects data
producers such as Apache Kafka, Amazon Kinesis, and operational databases to analysis tools such as
Amazon Athena, Amazon Redshift, and Amazon Elasticsearch Service. By choosing a data lake on AWS, The
Meet Group can leverage the best tool for each analytics and data science use case, using Upsolver for
scaling and optimizing the data pipelines automatically to meet their business demands.
Working with data at this scale is always challenging from an ETL perspective. The company had built its data
lake infrastructure on ingesting data from Amazon Kinesis via a Lambda function that ensured exactly-once
processing, while ETL was handled via a batch process, coded in Spark/Hadoop and running on an Amazon
EMR cluster once a day. Amazon Athena was used to query the data, and due to the batch latencies, the
data in Athena was either up to 24 hours old, or expensive and slow to query as it had not yet been
compacted.
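Uncompacted data is slow to query because streaming ingestion tends to leave many small files, and each one adds overhead at query time. The sketch below shows one simple way to plan a compaction pass: group small files into batches whose combined size stays at or under a target. The function name, the file list, and the target size are hypothetical, chosen for illustration; this is not the pipeline described above.

```python
from typing import List, Tuple


def plan_compaction(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group (name, size) files into batches of at most target_bytes each.

    A single file larger than target_bytes gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    # Sorting by name keeps the plan deterministic and, for time-based
    # file names, keeps neighbouring time ranges together.
    for name, size in sorted(files):
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would then be rewritten as a single larger file, so a query engine such as Athena opens fewer objects per scan.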
The solution
Events are generated by scripts on publisher websites, which are streamed via Amazon Kinesis Streams.
Upsolver ingests the data from Kinesis and writes it to S3 while ensuring partitioning, exactly-once
processing, and other data lake best practices are enforced.
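As a rough illustration of the two practices named above, the sketch below derives a Hive-style, time-based partition prefix for each event and drops repeated deliveries by event ID, so each event is written once. The field names `event_id` and `ts`, the `dt=/hour=` layout, and the in-memory set of seen IDs are assumptions made for this example; they do not describe how Upsolver implements either feature.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Set


def partition_prefix(epoch_seconds: float) -> str:
    """Return a Hive-style partition prefix (UTC date and hour) for an event time."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"dt={ts:%Y-%m-%d}/hour={ts:%H}/"


def route_events(
    events: Iterable[dict], seen_ids: Optional[Set[str]] = None
) -> Dict[str, List[dict]]:
    """Group events by partition prefix, skipping event IDs already seen."""
    seen = seen_ids if seen_ids is not None else set()
    routed: Dict[str, List[dict]] = {}
    for event in events:
        if event["event_id"] in seen:
            continue  # duplicate delivery: drop it so each event lands once
        seen.add(event["event_id"])
        routed.setdefault(partition_prefix(event["ts"]), []).append(event)
    return routed
```

Partitioning by time lets a query that filters on a date range skip unrelated prefixes entirely. A production system would need durable state for the seen IDs rather than a set held in memory.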
From there, the company built its output ETL flows to Amazon Athena, which is used for data science as well
as BI reporting via Domo. For internal reporting, Upsolver creates daily aggregations of the data which are
processed by a homegrown reporting solution.
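A daily aggregation of this kind can be pictured as a count of events per UTC day and event type. The sketch below is only an illustration: the `ts` and `event_type` fields are hypothetical, and the real aggregations are defined in Upsolver rather than in hand-written code.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def daily_counts(events: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Count events per (UTC day, event type)."""
    counts: Counter = Counter()
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        counts[(day, event["event_type"])] += 1
    return dict(counts)
```

A reporting tool can then read these small pre-aggregated rows instead of scanning the raw events each time.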
While the previous implementation was based on manual coding, Upsolver enables Browsi to manage all ETL
flows from its visual interface and without writing any code. Data engineers are then free to spend their time
developing features rather than infrastructure.
New cloud environments also allow Cox Automotive to onboard applications and data even faster than
before, though the increasing data volume was not without its challenges:
Exponential cost with rapid business and data growth: ingesting and storing data in its raw form is
difficult due to high costs.
Scalability with business growth: onboarding new applications is a long process that requires a
dedicated team with an experienced coding background.
Business agility: retention of data long-term is required to build machine learning models or replay data
from its raw form when needed.
The solution
Upsolver and Amazon Athena are being used together to deliver an end-to-end solution for streaming data
analytics that is quick to set up and easy to operate without coding. Upsolver processes the raw data to give
the team visibility into cloud operations, helping ensure application uptime and stability while minimizing
costs by ingesting only relevant data into Splunk.
The architecture also scales elastically using cloud-native computing and storage without manual
maintenance.
4. Product Analytics
The challenge
Sisense is one of the leading software providers in the highly competitive business intelligence and analytics
space. Seeking to expand the scope of its internal analytics, Sisense set out to build a data lake in the AWS
cloud in order to more effectively store and analyze product usage data. However, the rapid growth in its
customer base created a massive influx of data that had accumulated to over 200 billion records, with over
150 GB of new event data created daily and 20 terabytes overall.
To effectively manage this sprawling data stream, Sisense set out to build a data lake on Amazon S3, from which
they could deliver structured datasets for further analysis using its own software – and to do so in a way that
was agile, quick and cost-effective.
The solution
Using a combination of Athena, Sisense’s own BI software, and Upsolver allows different teams at Sisense to
access product log analytics with regular SQL: from product managers having a better understanding of how
the software is being used, through sales representatives gaining real-time insights into prospect behavior,
to data scientists working on machine learning algorithms.
The company had been using IBM Netezza for the past 10 years. As the system approached the end of its
support, Peer39 decided to reevaluate their technology stack. Their legacy process and technology stack
presented many challenges, including limited data availability, lack of accuracy, and lack of business agility.
The solution
Moving their data infrastructure from on-prem Netezza to a cloud data lake built on Upsolver, Amazon S3 and
Amazon Athena gave Peer39 a cloud-native, data stream processing platform with an easy-to-use UI.
Instead of relying solely on Netezza for storage and transformations, with all the limitations that created,
Peer39 deployed a modernized data stack in the cloud – with the raw event data stored on Amazon S3, and
then curated using Athena for further reporting, analytics and data science.
© 2021 Upsolver All Rights Reserved.