DOI: 10.1145/3603269.3604816

Ditto: Efficient Serverless Analytics with Elastic Parallelism

Published: 01 September 2023

Abstract

Serverless computing provides fine-grained resource elasticity for data analytics: a job can flexibly scale its resources for each stage, instead of sticking to a fixed pool of resources throughout its lifetime. Due to different data dependencies and different shuffling overheads caused by intra- and inter-server communication, the best degree of parallelism (DoP) for each stage varies based on runtime conditions.
We present Ditto, a job scheduler for serverless analytics that leverages fine-grained resource elasticity to optimize for job completion time (JCT) and cost. The key idea of Ditto is to use a new scheduling granularity, the stage group, to decouple parallelism configuration from function placement. Ditto bundles stages into stage groups based on their data dependencies and IO characteristics. It exploits the parallelized time characteristics of the stages to determine the parallelism configuration, and prioritizes the placement of stage groups with large shuffling traffic, so that the stages in these groups can leverage zero-copy intra-server communication for efficient shuffling. We build a system prototype of Ditto and evaluate it with a variety of benchmarking workloads. Experimental results show that Ditto outperforms existing solutions by up to 2.5× on JCT and up to 1.8× on cost.
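
As a rough illustration of how per-stage parallelism might be derived from timing characteristics, the sketch below assumes a simple model that the abstract does not state: stage i runs in time alpha_i / d_i + beta_i at parallelism d_i, stages run one after another, and a fixed budget of function slots is divided among them. Under that assumption, minimizing the summed time with a Lagrange multiplier gives d_i proportional to sqrt(alpha_i). The function names allocate_dop and job_time and all numbers are hypothetical, and this is not presented as Ditto's actual algorithm.

```python
import math


def allocate_dop(alphas, total_slots):
    """Split total_slots across sequential stages (assumed model).

    Assumes stage i takes alpha_i / d_i + beta_i at parallelism d_i.
    Minimizing sum(alpha_i / d_i) subject to sum(d_i) == total_slots
    yields d_i proportional to sqrt(alpha_i) (continuous relaxation;
    a real scheduler would still have to round to whole functions).
    """
    weights = [math.sqrt(a) for a in alphas]
    total_weight = sum(weights)
    return [total_slots * w / total_weight for w in weights]


def job_time(alphas, betas, dops):
    """Completion time of the stage chain under the assumed model."""
    return sum(a / d + b for a, b, d in zip(alphas, betas, dops))


# Hypothetical two-stage job: stage 0 has 4x the parallelizable work.
alphas = [400.0, 100.0]
betas = [1.0, 2.0]

dops = allocate_dop(alphas, total_slots=30)
tuned = job_time(alphas, betas, dops)
even = job_time(alphas, betas, [15.0, 15.0])

print(dops)          # [20.0, 10.0]
print(tuned)         # 33.0
print(tuned < even)  # True
```

With these made-up inputs the square-root split gives the heavier stage twice the slots of the lighter one, not four times, and finishes sooner than an even split. That is the kind of stage-specific DoP choice the abstract refers to.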

Published In

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
September 2023
1217 pages
ISBN:9798400702365
DOI:10.1145/3603269

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. serverless computing
  2. data analytics
  3. task scheduling

Qualifiers

  • Research-article

Conference

ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference
September 10, 2023
New York, NY, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Article Metrics

  • Downloads (Last 12 months): 412
  • Downloads (Last 6 weeks): 24
Reflects downloads up to 01 Feb 2025

Cited By

  • (2024) SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient. Proceedings of the 2024 ACM Symposium on Cloud Computing, pp. 668-688. DOI: 10.1145/3698038.3698558. Online publication date: 20-Nov-2024
  • (2024) Streamlining Cloud-Native Application Development and Deployment with Robust Encapsulation. Proceedings of the 2024 ACM Symposium on Cloud Computing, pp. 847-865. DOI: 10.1145/3698038.3698552. Online publication date: 20-Nov-2024
  • (2024) Seraph: A Performance-Cost Aware Tuner for Training Reinforcement Learning Model on Serverless Computing. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, pp. 95-101. DOI: 10.1145/3678015.3680479. Online publication date: 4-Sep-2024
  • (2024) Dexter: A Performance-Cost Efficient Resource Allocation Manager for Serverless Data Analytics. Proceedings of the 25th International Middleware Conference, pp. 117-130. DOI: 10.1145/3652892.3700753. Online publication date: 2-Dec-2024
  • (2024) Pyxis: Scheduling Mixed Tasks in Disaggregated Datacenters. IEEE Transactions on Parallel and Distributed Systems 35:9, pp. 1536-1550. DOI: 10.1109/TPDS.2024.3418620. Online publication date: Sep-2024
  • (2024) SPRIGHT: High-Performance eBPF-Based Event-Driven, Shared-Memory Processing for Serverless Computing. IEEE/ACM Transactions on Networking 32:3, pp. 2539-2554. DOI: 10.1109/TNET.2024.3366561. Online publication date: Jun-2024
  • (2024) Online Container Caching with Late-Warm for IoT Data Processing. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 1547-1560. DOI: 10.1109/ICDE60146.2024.00127. Online publication date: 13-May-2024
  • (2023) Online Function Caching in Serverless Edge Computing. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), pp. 2295-2302. DOI: 10.1109/ICPADS60453.2023.00308. Online publication date: 17-Dec-2023
