DOI: 10.1145/3373376.3378512
research-article

Catalyzer: Sub-millisecond Startup for Serverless Computing with Initialization-less Booting

Published: 13 March 2020

Abstract

Serverless computing promises cost-efficiency and elasticity for highly productive software development. To achieve this, the serverless sandbox system must address two challenges: strong isolation between function instances, and low startup latency to ensure a good user experience. While strong isolation can be provided by virtualization-based sandboxes, initializing the sandbox and the application causes non-negligible startup overhead. Conventional sandbox systems fall short of low-latency startup because they are application-agnostic: they can only reduce the latency of sandbox initialization through hypervisor and guest kernel customization, which is inadequate and does not mitigate the majority of the startup overhead.
This paper proposes Catalyzer, a serverless sandbox system design that provides both strong isolation and extremely fast function startup. Instead of booting from scratch, Catalyzer restores a virtualization-based function instance from a well-formed checkpoint image and thereby skips initialization on the critical path (init-less). Catalyzer boosts restore performance by recovering both user-level memory state and system state on demand. We also propose a new OS primitive, sfork (sandbox fork), to further reduce startup latency by directly reusing the state of a running sandbox instance. Fundamentally, Catalyzer removes the initialization cost by reusing state, which enables general optimizations for diverse serverless functions. The evaluation shows that Catalyzer reduces startup latency by orders of magnitude, achieves < 1ms latency in the best case, and significantly reduces the end-to-end latency of real-world workloads. Catalyzer has been adopted by Ant Financial, and we also present lessons learned from industrial development.
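
The idea of restoring state on demand from a checkpoint image can be illustrated with a minimal sketch. This is not Catalyzer's implementation; it is a toy analogy that assumes a "checkpoint image" is just a file of fixed-size pages. The names `write_checkpoint` and `restore_on_demand` are hypothetical. The sketch maps the image copy-on-write, so the operating system loads a page only when it is first touched, and each restored instance gets private writes over a shared image.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size of the host, used as the toy image granularity


def write_checkpoint(path, fills):
    """Write a toy checkpoint image: one page per entry, filled with that byte.

    Hypothetical helper; a real checkpoint holds memory and system state.
    """
    with open(path, "wb") as f:
        for fill in fills:
            f.write(bytes([fill]) * PAGE)


def restore_on_demand(path):
    """Map the image copy-on-write instead of reading it eagerly.

    The OS faults pages in on first access, and writes stay private to
    this mapping, so the image on disk is never modified.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping outlives the file object.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


if __name__ == "__main__":
    image = os.path.join(tempfile.mkdtemp(), "func.img")
    write_checkpoint(image, [1, 2, 3])

    first = restore_on_demand(image)
    second = restore_on_demand(image)

    first[0] = 9  # private write in the first instance only
    print(first[0], second[0])

    first.close()
    second.close()
```

Two mappings of the same image behave like two function instances restored from one checkpoint: they share the unmodified pages and diverge only where they write.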

Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020
1412 pages
ISBN: 9781450371025
DOI: 10.1145/3373376

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. checkpoint and restore
  2. operating system
  3. serverless computing
  4. startup latency

Qualifiers

  • Research-article

Funding Sources

  • HighTech Support Program from Shanghai Committee of Science and Technology
  • Program of Shanghai Academic/Technology Research Leader
  • National Natural Science Foundation of China
  • National Key Research & Development Program

Conference

ASPLOS '20

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 583
  • Downloads (last 6 weeks): 80
Reflects downloads up to 09 Nov 2024

Cited By

  • (2024) Serverless computing based on dynamic-addressable session. SCIENTIA SINICA Informationis. DOI: 10.1360/SSI-2023-0155. 54:3 (582). Online publication date: 11-Mar-2024
  • (2024) Cold Start Latency in Serverless Computing: A Systematic Review, Taxonomy, and Future Directions. ACM Computing Surveys. DOI: 10.1145/3700875. 57:3 (1-36). Online publication date: 11-Nov-2024
  • (2024) Diminishing cold starts in serverless computing with approximation algorithms. Proceedings of the 53rd International Conference on Parallel Processing. DOI: 10.1145/3673038.3673118 (327-336). Online publication date: 12-Aug-2024
  • (2024) Understanding Network Startup for Secure Containers in Multi-Tenant Clouds: Performance, Bottleneck and Optimization. Proceedings of the 2024 ACM on Internet Measurement Conference. DOI: 10.1145/3646547.3688436 (635-650). Online publication date: 4-Nov-2024
  • (2024) StarShip: Mitigating I/O Bottlenecks in Serverless Computing for Scientific Workflows. Proceedings of the ACM on Measurement and Analysis of Computing Systems. DOI: 10.1145/3639028. 8:1 (1-29). Online publication date: 21-Feb-2024
  • (2024) Characterization and Reclamation of Frozen Garbage in Managed FaaS Workloads. Proceedings of the Nineteenth European Conference on Computer Systems. DOI: 10.1145/3627703.3629579 (281-297). Online publication date: 22-Apr-2024
  • (2024) Serialization/Deserialization-free State Transfer in Serverless Workflows. Proceedings of the Nineteenth European Conference on Computer Systems. DOI: 10.1145/3627703.3629568 (132-147). Online publication date: 22-Apr-2024
  • (2024) Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation. Proceedings of the Nineteenth European Conference on Computer Systems. DOI: 10.1145/3627703.3629567 (1039-1053). Online publication date: 22-Apr-2024
  • (2024) Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts. Proceedings of the Nineteenth European Conference on Computer Systems. DOI: 10.1145/3627703.3629556 (298-316). Online publication date: 22-Apr-2024
  • (2024) PISeL: Pipelining DNN Inference for Serverless Computing. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. DOI: 10.1145/3627673.3679824 (1951-1960). Online publication date: 21-Oct-2024
