DOI: 10.1145/3698038.3698513
research-article
Open access

Snapipeline: Accelerating Snapshot Startup for FaaS Containers

Published: 20 November 2024

Abstract

Because numerous services start and stop frequently in FaaS (Function as a Service), reducing cold start overhead is central to improving the performance of container-based FaaS services. Snapshot-and-restore mechanisms effectively reduce container cold start time by converting container initialization overhead into restoration overhead, and have consequently become a research hotspot for accelerating the cold start of FaaS containers. To reduce storage cost, researchers have introduced snapshot compression, decompressing the snapshots before starting instances. However, existing works have the following shortcomings: (1) file-mapped memory pages are not processed during snapshot compression, leaving a significant amount of redundant data in memory; (2) snapshot decompression and instance restoration execute serially, leading to high instance startup latency.
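To make shortcoming (1) concrete, the sketch below shows generic content-hash deduplication over fixed-size pages. It is an illustration only, not the paper's implementation: the 4 KiB page size, the use of SHA-256, the helper names `dedup_pages` and `rebuild`, and the sample data are all assumptions. It shows how identical file-mapped pages, for example copies of the same shared-library page, collapse to a single stored copy once they are included in deduplication.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def dedup_pages(pages):
    """Store each distinct page once.

    Returns (store, refs): store maps a SHA-256 digest to the page bytes,
    and refs holds, for every input page in order, the digest of its content.
    """
    store = {}
    refs = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        store.setdefault(digest, page)  # keep only the first copy
        refs.append(digest)
    return store, refs


def rebuild(store, refs):
    """Reconstruct the original page sequence from the deduplicated form."""
    return [store[digest] for digest in refs]


# Hypothetical snapshot: three distinct anonymous pages plus four
# file-mapped pages that all hold the same library content.
lib_page = b"\x7fELF".ljust(PAGE_SIZE, b"\x00")
anon_pages = [bytes([i]) * PAGE_SIZE for i in range(3)]
file_mapped_pages = [lib_page] * 4
snapshot = anon_pages + file_mapped_pages

store, refs = dedup_pages(snapshot)
print(f"{len(snapshot)} pages in snapshot, {len(store)} pages stored")
```

In this toy input, skipping the file-mapped pages during deduplication would leave four identical copies in the stored snapshot; including them leaves one.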
To address these shortcomings, we propose the Snapipeline mechanism, which implements the following optimizations: (1) in the snapshot compression phase, it extends deduplication to file-mapped memory; (2) in the restoration phase, it leverages the distinction between hot and cold memory in FaaS applications to prioritize memory page restoration, pipelining snapshot decompression, memory page restoration, and function execution. This mechanism hides the expensive snapshot decompression latency behind the instance restoration and function execution latency, removing snapshot decompression from the critical path of instance startup. Evaluation on real-world FaaS application datasets shows that, compared to the baseline, Snapipeline reduces memory usage by up to 28% and decreases end-to-end latency by an average of 53%.
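The pipelining idea in optimization (2) can be sketched as a three-stage pipeline in which hot pages are decompressed and restored first, and the function starts as soon as they are in place. This is a minimal illustration under stated assumptions, not the Snapipeline implementation: per-page `zlib` compression, the in-memory `dict` standing in for process memory, and the names `make_snapshot` and `restore_pipelined` are all hypothetical. The sketch also assumes the function touches only hot pages; a real system would need a fault-handling path for accesses to pages not yet restored, which is omitted here.

```python
import queue
import threading
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def make_snapshot(pages, hot_ids):
    """Compress each page separately, ordering hot pages before cold ones."""
    order = sorted(pages, key=lambda pid: (pid not in hot_ids, pid))
    return [(pid, zlib.compress(pages[pid])) for pid in order]


def restore_pipelined(snapshot, hot_ids, function):
    """Overlap decompression, page restoration, and function execution."""
    memory = {}                   # stand-in for the restored address space
    decompressed = queue.Queue()  # hand-off between the two stages
    hot_ready = threading.Event()

    def decompress_stage():
        try:
            for pid, blob in snapshot:
                decompressed.put((pid, zlib.decompress(blob)))
        finally:
            decompressed.put(None)  # sentinel: no more pages

    def restore_stage():
        remaining_hot = set(hot_ids)
        if not remaining_hot:
            hot_ready.set()
        while True:
            item = decompressed.get()
            if item is None:
                break
            pid, data = item
            memory[pid] = data
            remaining_hot.discard(pid)
            if not remaining_hot:
                hot_ready.set()
        hot_ready.set()  # never leave the caller blocked

    stages = [threading.Thread(target=decompress_stage),
              threading.Thread(target=restore_stage)]
    for stage in stages:
        stage.start()

    hot_ready.wait()           # only the hot pages are on the critical path
    result = function(memory)  # may run while cold pages are still restoring

    for stage in stages:
        stage.join()
    return result, memory


# Hypothetical workload: eight pages, two of which the function reads.
pages = {i: bytes([i]) * PAGE_SIZE for i in range(8)}
hot = {2, 5}
snapshot = make_snapshot(pages, hot)
result, memory = restore_pipelined(
    snapshot, hot, lambda mem: sum(mem[pid][0] for pid in hot))
```

Compared with decompressing the whole snapshot before restoring anything, the function's start here waits only for the hot pages, so the remaining decompression overlaps with restoration and execution.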


Published In

SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing
November 2024
1062 pages
ISBN: 9798400712869
DOI: 10.1145/3698038
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cloud Computing
  2. Container
  3. Serverless
  4. Snapshot

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '24: ACM Symposium on Cloud Computing
November 20 - 22, 2024
Redmond, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
