
Optimizing inference serving on serverless platforms

Published: 01 June 2022

Abstract

Serverless computing is gaining popularity for machine learning (ML) serving workloads due to its autonomous resource scaling, ease of use, and pay-per-use cost model. Existing serverless platforms work well for image-based ML inference, where requests are homogeneous in their service demands. However, recent advances in natural language processing cannot fully benefit from existing serverless platforms, as their requests are intrinsically heterogeneous.
Batching requests for processing can significantly increase ML serving efficiency while reducing monetary cost, thanks to the pay-per-use pricing model adopted by serverless platforms. Yet batching heterogeneous ML requests incurs additional computation overhead, as small requests must be "padded" to the same size as the largest request in the batch. Reaching effective batching decisions (i.e., which requests should be batched together, and why) is non-trivial: the padding overhead, coupled with serverless auto-scaling, forms a complex optimization problem.
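The padding overhead described above can be made concrete with a small sketch (purely illustrative, not code from the paper; the function name and work-unit model are assumptions):

```python
# Illustrative sketch of padding overhead when batching heterogeneous
# requests: every request in a batch is padded up to the size of the
# largest request, so mixed batches waste computation on padding.

def padding_overhead(request_sizes):
    """Extra work units spent padding each request to the batch maximum."""
    largest = max(request_sizes)
    return sum(largest - size for size in request_sizes)

# One batch mixing small and large requests pays a high padding cost ...
mixed = padding_overhead([8, 8, 128, 128])
# ... while grouping similar-sized requests into separate batches pays none.
grouped = padding_overhead([8, 8]) + padding_overhead([128, 128])

print(mixed, grouped)  # 240 0
```

This is why "which requests should be batched together" matters: grouping requests of similar size eliminates the padding term entirely.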
To address this, we develop Multi-Buffer Serving (MBS), a framework that optimizes the batching of heterogeneous ML inference requests to minimize their monetary cost while meeting their service level objectives (SLOs). The core of MBS is a performance and cost estimator driven by analytical models and guided by a Bayesian optimizer. MBS is prototyped and evaluated on AWS using bursty workloads. Experimental results show that MBS preserves SLOs while outperforming the state of the art by up to 8x in cost savings, reducing padding overhead by up to 37x, and issuing 3x fewer serverless function invocations.
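To illustrate the kind of search such a system performs, here is a hedged toy sketch: pick a batch size that minimizes estimated per-request cost while the estimated latency still meets the SLO. The cost and latency models below are invented for illustration, and exhaustive search stands in for the paper's Bayesian optimizer:

```python
# Toy SLO-constrained batch-size search. The analytic models here are
# hypothetical stand-ins; MBS uses its own analytical models plus a
# Bayesian optimizer rather than this exhaustive scan.

SLO_MS = 200.0

def est_latency_ms(batch_size):
    # Hypothetical model: latency grows with batch size (waiting + compute).
    return 20.0 + 1.5 * batch_size

def est_cost_per_request(batch_size):
    # Hypothetical pay-per-use model: a fixed invocation overhead is
    # amortized across the batch, so larger batches cost less per request.
    per_invocation = 1.0 + 0.05 * batch_size
    return per_invocation / batch_size

feasible = [b for b in range(1, 65) if est_latency_ms(b) <= SLO_MS]
best = min(feasible, key=est_cost_per_request)
print(best, round(est_cost_per_request(best), 4))  # 64 0.0656
```

Under these made-up models, the cheapest feasible configuration is the largest batch that still meets the SLO; real workloads with bursty arrivals and heterogeneous sizes make the trade-off far less monotone, which is what motivates a model-driven optimizer.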



Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 10
June 2022
319 pages
ISSN: 2150-8097

Publisher

VLDB Endowment


Qualifiers

  • Research-article


Cited By

  • (2024) Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation. Proceedings of the Nineteenth European Conference on Computer Systems, 1039–1053. DOI: 10.1145/3627703.3629567. Online publication date: 22-Apr-2024.
  • (2024) FaaSGraph: Enabling Scalable, Efficient, and Cost-Effective Graph Processing with Serverless Computing. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 385–400. DOI: 10.1145/3620665.3640361. Online publication date: 27-Apr-2024.
  • (2024) Intelligence-Endogenous Management Platform for Computing and Network Convergence. IEEE Network 38, 4, 166–173. DOI: 10.1109/MNET.2023.3321529. Online publication date: 1-Jul-2024.
  • (2024) SMSS: Stateful Model Serving in Metaverse With Serverless Computing and GPU Sharing. IEEE Journal on Selected Areas in Communications 42, 3, 799–811. DOI: 10.1109/JSAC.2023.3345401. Online publication date: 1-Mar-2024.
  • (2024) A Survey on Scheduling Techniques in Computing and Network Convergence. IEEE Communications Surveys & Tutorials 26, 1, 160–195. DOI: 10.1109/COMST.2023.3329027. Online publication date: 1-Jan-2024.
  • (2023) On Serving Image Classification Models. Proceedings of the 9th International Workshop on Serverless Computing, 48–52. DOI: 10.1145/3631295.3631401. Online publication date: 11-Dec-2023.
  • (2023) BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach. Proceedings of the ACM on Management of Data 1, 3, 1–29. DOI: 10.1145/3617327. Online publication date: 13-Nov-2023.
  • (2023) Cost-Efficient Serverless Inference Serving with Joint Batching and Multi-Processing. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, 43–49. DOI: 10.1145/3609510.3609816. Online publication date: 24-Aug-2023.
  • (2023) FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference. Proceedings of the 52nd International Conference on Parallel Processing, 635–644. DOI: 10.1145/3605573.3605638. Online publication date: 7-Aug-2023.
  • (2023) mSIRM: Cost-Efficient and SLO-aware ML Load Balancing on Fog and Multi-Cloud Network. Proceedings of the 13th Workshop on AI and Scientific Computing at Scale using Flexible Computing, 19–26. DOI: 10.1145/3589013.3596676. Online publication date: 10-Aug-2023.
