research-article

RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances

Authors: Rohan Basu Roy, Vijay Gadepally, Karen Gettings, Devesh Tiwari

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 24, Pages 1 - 13

https://doi.org/10.1145/3458817.3476168

Published: 13 November 2021

Abstract

Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces Ribbon, a novel deep learning inference serving system that meets two competing objectives: satisfying a quality-of-service (QoS) target and remaining cost-effective. The key idea behind Ribbon is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. Ribbon devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. Ribbon demonstrates its superiority over existing inference serving approaches that use homogeneous instance pools. Ribbon saves up to 16% of the inference service cost for different learning models, including emerging deep learning recommender system models and drug-discovery enabling models.
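The trade-off described in the abstract can be illustrated with a small sketch. The code below is not Ribbon's method: it replaces the Bayesian Optimization-driven search with a brute-force enumeration, and it uses a toy QoS check (aggregate throughput covers the offered load) rather than a latency target. The instance names, prices, capacities, and the helper functions `pool_cost`, `meets_qos`, and `cheapest_pool` are all hypothetical and chosen only to show why a mixed pool can cost less than the best single-type pool.

```python
from itertools import product

# Hypothetical instance catalog: (hourly price in cents, sustainable queries/sec).
# These numbers are illustrative assumptions, not measurements from the paper.
CATALOG = {
    "large": (100, 100),
    "small": (40, 30),
}


def pool_cost(pool):
    """Hourly cost in cents of a pool given as {instance_type: count}."""
    return sum(CATALOG[name][0] * count for name, count in pool.items())


def meets_qos(pool, load_qps):
    """Toy QoS check (assumption): aggregate capacity must cover the load."""
    capacity = sum(CATALOG[name][1] * count for name, count in pool.items())
    return capacity >= load_qps


def cheapest_pool(load_qps, max_per_type=10, homogeneous_only=False):
    """Brute-force the cheapest QoS-satisfying pool.

    This exhaustive search is a stand-in for a smarter strategy such as
    Bayesian Optimization; it is only feasible for tiny search spaces.
    """
    names = list(CATALOG)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        pool = dict(zip(names, counts))
        # A homogeneous pool uses at most one instance type.
        if homogeneous_only and sum(1 for c in counts if c > 0) > 1:
            continue
        if not meets_qos(pool, load_qps):
            continue
        if best is None or pool_cost(pool) < pool_cost(best):
            best = pool
    return best


homogeneous = cheapest_pool(230, homogeneous_only=True)
heterogeneous = cheapest_pool(230)
print("best homogeneous pool:  ", homogeneous, pool_cost(homogeneous), "cents/hour")
print("best heterogeneous pool:", heterogeneous, pool_cost(heterogeneous), "cents/hour")
```

With these made-up numbers, the cheapest single-type pool over-provisions capacity, while the mixed pool matches the load more tightly and costs less. Enumeration grows exponentially with the number of instance types, and each real evaluation would require deploying and measuring a pool, which is the kind of expensive search a sample-efficient optimizer is meant to avoid.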

Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2021

1493 pages

ISBN: 9781450384421

DOI: 10.1145/3458817

General Chair: Bronis R. de Supinski

Program Chairs: Mary Hall, Todd Gamblin

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

United States Air Force Research Laboratory, United States Air Force Artificial Intelligence Accelerator
NSF (National Science Foundation)

Conference

SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis

Sponsor: SIGHPC

November 14 - 19, 2021

St. Louis, Missouri

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
