review-article

Machine learning inference serving models in serverless computing: a survey

Published: 07 January 2025

Abstract

Serverless computing has attracted many researchers with features such as scalability, optimized operating costs, freedom from infrastructure management, and faster application development. It can be used for real-time machine learning (ML) prediction through serverless inference functions. Deploying an ML serverless inference function involves provisioning a compute resource, deploying an ML model, configuring network infrastructure, and granting permissions to invoke the function. However, machine learning inference (MLI) faces challenges such as resource management, delay and response time, large and complex models, and security and privacy, and relatively few studies have been conducted in this field. This comprehensive literature review examines recent developments in MLI in serverless computing environments. The mechanisms presented in the taxonomy fall into four categories: service-level-objective (SLO)-aware, acceleration-aware, framework-aware, and latency-aware. In each category, the methods and algorithms used to optimize inference in serverless environments are examined along with their advantages and disadvantages. We show that acceleration-aware methods focus on the optimal use of computing resources, while framework-aware methods play an important role in improving system efficiency and scalability by examining different frameworks for inference in serverless environments. SLO-aware and latency-aware methods, which take time limits and service-level agreements into account, help provide high-quality, reliable inference in serverless environments. Finally, this article presents a vision of future challenges and opportunities in this field and suggests directions for future research on MLI in serverless computing.
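The deployment steps named above (compute resource, model, invocation entry point) can be illustrated with a minimal sketch of a serverless inference handler. This is not taken from the surveyed systems: it assumes an AWS Lambda-style `handler(event, context)` signature, and the "model" is a hypothetical hand-rolled logistic scorer so the example stays self-contained; a real function would load trained weights from object storage during cold start.

```python
import json
import math

# Hypothetical model parameters. In practice these would be fetched once
# per container ("cold start") and reused across warm invocations.
MODEL_WEIGHTS = [0.4, -0.2, 0.1]
MODEL_BIAS = 0.05

def predict(features):
    """Score one feature vector with the toy linear model (sigmoid output)."""
    z = MODEL_BIAS + sum(w * x for w, x in zip(MODEL_WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

def handler(event, context=None):
    """Lambda-style entry point: parse the request body, run inference,
    and return an HTTP-shaped JSON response."""
    features = json.loads(event["body"])["features"]
    score = predict(features)
    return {
        "statusCode": 200,
        "body": json.dumps({"score": score, "label": int(score >= 0.5)}),
    }
```

Because the model is loaded at module scope, only the first invocation of a fresh container pays the initialization cost, which is the cold-start behavior the surveyed latency-aware mechanisms try to mitigate.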



Published In

Computing  Volume 107, Issue 1
Jan 2025
1593 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 07 January 2025
Accepted: 22 November 2024
Received: 12 May 2024

Author Tags

  1. Serverless computing
  2. Function-as-a-service
  3. Machine learning inference
  4. Deep learning
  5. Inference serving models

Author Tags

  1. 68M14
  2. 68M20
  3. 68T05
  4. 90C90
  5. 90C40
  6. 68W10
  7. 68Q85
  8. 68U20

Author Tags

  1. Information and Computing Sciences
  2. Artificial Intelligence and Image Processing

Qualifiers

  • Review-article
