research-article

Ditto: Efficient Serverless Analytics with Elastic Parallelism

Authors:

Xin Jin

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference

Pages 406-419

https://doi.org/10.1145/3603269.3604816

Published: 01 September 2023

Abstract

Serverless computing provides fine-grained resource elasticity for data analytics: a job can flexibly scale its resources for each stage instead of sticking to a fixed pool of resources throughout its lifetime. Because stages differ in their data dependencies and in the shuffling overheads caused by intra- and inter-server communication, the best degree of parallelism (DoP) for each stage varies with runtime conditions.
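To see why no single DoP is best, consider a toy latency model. This is an assumption made for illustration, not Ditto's own model: a stage's parallelizable work W shrinks as W/d with DoP d, while a per-function overhead c, standing in for shuffle and coordination cost, grows as c·d. A cheap zero-copy intra-server shuffle corresponds to a small c and a slower inter-server shuffle to a larger one. The helpers `stage_time`, `stage_cost`, and `best_dop` and the numbers used are hypothetical.

```python
def stage_time(work: float, overhead: float, dop: int) -> float:
    """Toy stage-latency model (an assumption, not Ditto's model):
    parallelizable work shrinks as work/dop, while a per-function
    shuffle/coordination overhead grows linearly with dop."""
    return work / dop + overhead * dop


def stage_cost(work: float, overhead: float, dop: int) -> float:
    """Function-seconds billed: dop functions, each running for stage_time."""
    return dop * stage_time(work, overhead, dop)


def best_dop(work: float, overhead: float, max_dop: int) -> int:
    """Return the DoP in [1, max_dop] that minimizes stage_time.
    min() keeps the first minimum it sees, so ties go to the smaller DoP."""
    return min(range(1, max_dop + 1), key=lambda d: stage_time(work, overhead, d))


# Hypothetical overheads: cheap intra-server vs. costlier inter-server shuffle.
print(best_dop(400, 1.0, 64))  # 20
print(best_dop(400, 4.0, 64))  # 10
```

Under this model the latency optimum lies near sqrt(W/c), so quadrupling the overhead halves the best DoP. The billed cost d·T(d) = W + c·d² keeps rising with d, which is one way JCT and cost can pull in different directions.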

We present Ditto, a job scheduler for serverless analytics that leverages fine-grained resource elasticity to optimize job completion time (JCT) and cost. The key idea of Ditto is a new scheduling granularity, the stage group, which decouples parallelism configuration from function placement. Ditto bundles stages into stage groups based on their data dependencies and IO characteristics. It exploits the parallelized time characteristics of the stages to determine the parallelism configuration, and it prioritizes the placement of stage groups with large shuffling traffic so that the stages in these groups can use zero-copy intra-server communication for efficient shuffling. We build a system prototype of Ditto and evaluate it with a variety of benchmark workloads. Experimental results show that Ditto outperforms existing solutions by up to 2.5× on JCT and up to 1.8× on cost.
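The placement priority can be illustrated with a greedy first-fit sketch. It assumes that each stage group's total DoP (its slot demand) has already been fixed, mirroring the decoupling of parallelism configuration from placement, and that a group co-located on one server can shuffle internally; zero-copy transfer itself is not modeled. The names `StageGroup`, `Server`, and `place_groups`, the slot counts, and the first-fit rule are hypothetical, and Ditto's actual placement algorithm may differ.

```python
from dataclasses import dataclass, field


@dataclass
class StageGroup:
    name: str
    slots: int          # total DoP of the group's stages, fixed before placement
    shuffle_bytes: int  # shuffle traffic that stays local if the group is co-located


@dataclass
class Server:
    name: str
    free_slots: int
    groups: list[str] = field(default_factory=list)


def place_groups(groups: list[StageGroup], servers: list[Server]) -> list[str]:
    """Greedy first-fit sketch: visit groups by descending shuffle traffic and
    put each whole group on the first server with enough free slots, so its
    shuffle can use intra-server communication. Groups that fit nowhere are
    returned as spilled; they would shuffle through remote storage instead."""
    spilled: list[str] = []
    for group in sorted(groups, key=lambda grp: grp.shuffle_bytes, reverse=True):
        target = next((s for s in servers if s.free_slots >= group.slots), None)
        if target is None:
            spilled.append(group.name)
            continue
        target.free_slots -= group.slots
        target.groups.append(group.name)
    return spilled


servers = [Server("A", 8), Server("B", 8)]
groups = [StageGroup("g1", 6, 100), StageGroup("g2", 6, 500),
          StageGroup("g3", 4, 50), StageGroup("g4", 4, 300)]
spilled = place_groups(groups, servers)
print([(s.name, s.groups) for s in servers], spilled)
# [('A', ['g2']), ('B', ['g4', 'g3'])] ['g1']
```

Because groups are visited in order of shuffle traffic, the heaviest shuffler g2 is placed first, and when capacity runs short it is the lighter group g1 that falls back to remote shuffling. A real scheduler might instead split a group across servers rather than spilling it whole.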

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Networks
  1. Network services
    1. Cloud computing

Published In

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference

September 2023

1217 pages

ISBN:9798400702365

DOI:10.1145/3603269

Chairs: Henning Schulzrinne, Vishal Misra

Program Chairs: Eddie Kohler, David Maltz

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Key Research and Development Program of China

Conference

ACM SIGCOMM '23

ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference

September 10, 2023

New York, NY, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Article Metrics

Total citations: 8
Total downloads: 981
Downloads (last 12 months): 412
Downloads (last 6 weeks): 24

Reflects downloads up to 01 Feb 2025

Cited By

Parola F, Qi S, Narappa A, Ramakrishnan K, Risso F (2024). SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient. Proceedings of the 2024 ACM Symposium on Cloud Computing, 668-688. Online publication date: 20-Nov-2024. https://dl.acm.org/doi/10.1145/3698038.3698558

Lertpongrujikorn P, Nguyen H, Salehi M (2024). Streamlining Cloud-Native Application Development and Deployment with Robust Encapsulation. Proceedings of the 2024 ACM Symposium on Cloud Computing, 847-865. Online publication date: 20-Nov-2024. https://dl.acm.org/doi/10.1145/3698038.3698552

Han J, Wei X, Chen R, Chen H (2024). Seraph: A Performance-Cost Aware Tuner for Training Reinforcement Learning Model on Serverless Computing. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, 95-101. Online publication date: 4-Sep-2024. https://dl.acm.org/doi/10.1145/3678015.3680479

Nestorov A, Marrón D, Gutierrez-Torre A, Wang C, Misale C, Youssef A, Carrera D, Berral J, Schiavoni V, Edinger J, Cao J, Jin Z (2024). Dexter: A Performance-Cost Efficient Resource Allocation Manager for Serverless Data Analytics. Proceedings of the 25th International Middleware Conference, 117-130. Online publication date: 2-Dec-2024. https://dl.acm.org/doi/10.1145/3652892.3700753

Qi S, Jin C, Chowdhury M, Liu Z, Liu X, Jin X (2024). Pyxis: Scheduling Mixed Tasks in Disaggregated Datacenters. IEEE Transactions on Parallel and Distributed Systems 35:9, 1536-1550. Online publication date: Sep-2024. https://doi.org/10.1109/TPDS.2024.3418620

Qi S, Monis L, Zeng Z, Wang I, Ramakrishnan K (2024). SPRIGHT: High-Performance eBPF-Based Event-Driven, Shared-Memory Processing for Serverless Computing. IEEE/ACM Transactions on Networking 32:3, 2539-2554. Online publication date: Jun-2024. https://doi.org/10.1109/TNET.2024.3366561

Li G, Tan H, Zhang X, Zhang C, Zhou R, Han Z, Chen G (2024). Online Container Caching with Late-Warm for IoT Data Processing. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1547-1560. Online publication date: 13-May-2024. https://doi.org/10.1109/ICDE60146.2024.00127

Zhang X, Gu H, Li G, He X, Tan H (2023). Online Function Caching in Serverless Edge Computing. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 2295-2302. Online publication date: 17-Dec-2023. https://doi.org/10.1109/ICPADS60453.2023.00308
