DOI: 10.1145/3472716.3472842
poster

Cost-effective data analytics across multiple cloud regions

Published: 23 August 2021

Abstract

We propose a cloud-native data analytics engine for processing data stored across geographically distributed cloud regions at reduced cost. A job is split into subtasks, which are placed across regions based on factors including the prices of compute resources and data transmission. We present its architecture, which leverages existing cloud infrastructure, and discuss the major challenges of its system design. Preliminary experiments show that cost is reduced by 15.1% for a decision support query on a four-region public cloud setup.
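
To make the placement idea concrete, here is a minimal sketch of price-aware subtask placement. It is not the engine described in the abstract: the region names, prices, task sizes, and the helper names (`Subtask`, `placement_cost`, `place`) are all hypothetical, and subtasks are assumed to be independent of each other, which real query plans generally are not.

```python
from dataclasses import dataclass

# All region names, prices, and task sizes are made-up illustration values.
COMPUTE_PRICE = {"us-east": 1.00, "eu-west": 1.10, "ap-south": 0.80}  # $ per compute unit
TRANSFER_PRICE = 0.02  # $ per GB moved between two different regions


@dataclass(frozen=True)
class Subtask:
    name: str
    input_region: str     # region where the subtask's input data is stored
    input_gb: float       # size of the input data
    output_gb: float      # size of the result shipped to the sink region
    compute_units: float  # amount of compute the subtask needs


def transfer_cost(src: str, dst: str, gb: float) -> float:
    """Cross-region transfer is billed per GB; same-region transfer is free (assumption)."""
    return 0.0 if src == dst else gb * TRANSFER_PRICE


def placement_cost(task: Subtask, region: str, sink: str) -> float:
    """Compute cost plus the cost of pulling input in and pushing output to the sink."""
    return (task.compute_units * COMPUTE_PRICE[region]
            + transfer_cost(task.input_region, region, task.input_gb)
            + transfer_cost(region, sink, task.output_gb))


def place(tasks, sink: str) -> dict:
    """Pick the cheapest region for each subtask independently."""
    return {t.name: min(COMPUTE_PRICE, key=lambda r: placement_cost(t, r, sink))
            for t in tasks}


def total_cost(tasks, placement: dict, sink: str) -> float:
    return sum(placement_cost(t, placement[t.name], sink) for t in tasks)


TASKS = [
    Subtask("scan_a", "eu-west", input_gb=100, output_gb=5, compute_units=2),
    Subtask("scan_b", "ap-south", input_gb=50, output_gb=1, compute_units=1),
    Subtask("join", "us-east", input_gb=6, output_gb=1, compute_units=10),
]
SINK = "us-east"

optimized = place(TASKS, SINK)
# Baseline: run every subtask in the sink region and pull all inputs there.
baseline = {t.name: SINK for t in TASKS}

print(optimized)
print(total_cost(TASKS, optimized, SINK), total_cost(TASKS, baseline, SINK))
```

With these invented numbers, the large scan stays next to its data while the compute-heavy join moves to the cheapest compute region, so both price factors named in the abstract influence the result. A scheduler for dependent subtasks would need to optimize over the whole plan rather than task by task.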

      Published In

      SIGCOMM '21: Proceedings of the SIGCOMM '21 Poster and Demo Sessions
      August 2021
      94 pages
      ISBN: 9781450386296
      DOI: 10.1145/3472716
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cost optimization
      2. data analytics
      3. job scheduling
      4. multi-cloud

      Qualifiers

      • Poster

      Conference

      SIGCOMM '21
      Sponsor:
      SIGCOMM '21: ACM SIGCOMM 2021 Conference
      August 23 - 27, 2021
      Virtual Event

      Acceptance Rates

      SIGCOMM '21 Paper Acceptance Rate 30 of 56 submissions, 54%;
      Overall Acceptance Rate 92 of 158 submissions, 58%

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 1,099
      • Downloads (Last 12 months): 115
      • Downloads (Last 6 weeks): 3
      Reflects downloads up to 30 Aug 2024
