DOI: 10.1145/3663741.3664788
Research article

FunDa: Towards Serverless Data Analytics and In Situ Query Processing

Published: 09 June 2024

Abstract

Serverless is a cloud computing paradigm that offers users unique advantages owing to its pay-for-what-you-use model. It is particularly suitable for ephemeral, short-running tasks. However, it is not well suited to stateful, long-running data analytics and query processing tasks. Although several recent projects have proposed serverless query processing or data analytics systems based on AWS Lambda, these systems are limited by Lambda's constraints and by the inherent limitations of serverless computing.
We propose FunDa, an on-premises serverless data analytics framework that aims to address the limitations of existing approaches. FunDa extends DaskDB, our previously proposed system for unified data analytics and in situ SQL query processing. We evaluate our prototype with two workloads at three scale factors. Although our experimental results are promising, they also reveal significant opportunities for future research.

Information

Published In

BiDEDE '24: Proceedings of the International Workshop on Big Data in Emergent Distributed Environments
June 2024
53 pages
ISBN:9798400706790
DOI:10.1145/3663741
Editors: Philippe Cudré-Mauroux, Andrea Kö, Robert Wrembel
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the publisher (ACM).

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DaskDB
  2. Data Analytics
  3. In situ Query Processing
  4. Serverless

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSERC
  • NBIF

Conference

SIGMOD/PODS '24

Acceptance Rates

Overall Acceptance Rate 25 of 47 submissions, 53%

Article Metrics

  • Total citations: 0
  • Total downloads: 86
  • Downloads (last 12 months): 86
  • Downloads (last 6 weeks): 5
Reflects downloads up to 01 Jan 2025
