research-article

VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling

Authors:

Minyi Guo

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 388 - 401

https://doi.org/10.1145/3503222.3507752

Published: 22 February 2022

Abstract

Deep learning (DL) models have achieved great success in many application domains. As such, many industrial companies such as Google and Facebook have acknowledged the importance of multi-tenant DL services. Although multi-tenant services have been studied for conventional workloads, they have not been deeply studied for deep learning services, especially on general-purpose hardware.

In this work, we systematically analyze the opportunities and challenges of providing multi-tenant deep learning services on the general-purpose CPU architecture from the aspects of scheduling granularity and code generation. We propose an adaptive granularity scheduling scheme to both guarantee resource usage efficiency and reduce the scheduling conflict rate. We also propose an adaptive compilation strategy, by which we can dynamically and intelligently pick a program with proper exclusive and shared resource usage to reduce overall interference-induced performance loss. Compared with existing work, our design can serve more requests under the same QoS target in various scenarios (e.g., +71%, +62%, +45% for light, medium, and heavy workloads, respectively), and reduce the average query latency by 50%.
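The idea of picking a program version according to its exclusive and shared resource usage can be illustrated with a small sketch. This is not the paper's implementation: the `CodeVersion` fields, the linear interference model, the `pick_version` helper, and all numbers below are hypothetical and chosen only to show how a runtime might trade exclusive resources (cores) against sensitivity to shared-resource contention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CodeVersion:
    """One pre-compiled variant of the same operator (hypothetical model)."""
    name: str
    cores: int               # exclusive resource demand
    base_latency_ms: float   # latency when running alone
    sensitivity: float       # relative slowdown per unit of shared-resource interference

    def predicted_latency(self, interference: float) -> float:
        # Assumed linear slowdown model; a real system would use measured profiles.
        return self.base_latency_ms * (1.0 + self.sensitivity * interference)


def pick_version(versions: List[CodeVersion],
                 free_cores: int,
                 interference: float) -> Optional[CodeVersion]:
    """Return the feasible version with the lowest predicted latency, or None."""
    feasible = [v for v in versions if v.cores <= free_cores]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.predicted_latency(interference))


# Illustrative variants: faster versions use more cores and suffer more from contention.
VERSIONS = [
    CodeVersion("aggressive", cores=8, base_latency_ms=2.0, sensitivity=1.5),
    CodeVersion("balanced", cores=4, base_latency_ms=3.0, sensitivity=0.5),
    CodeVersion("conservative", cores=2, base_latency_ms=5.0, sensitivity=0.1),
]

if __name__ == "__main__":
    for level in (0.0, 1.0, 4.0):
        chosen = pick_version(VERSIONS, free_cores=8, interference=level)
        print(level, chosen.name if chosen else None)
```

Under this toy model the selected version shifts from the most aggressive variant to the most conservative one as interference grows, which is the qualitative behavior the abstract describes.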

Index Terms

VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
    2. Other architectures
      1. Neural networks
2. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent algorithms

Published In

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

February 2022

1164 pages

ISBN:9781450392051

DOI:10.1145/3503222

General Chairs: Babak Falsafi (EPFL, Switzerland), Michael Ferdman (Stony Brook University, USA)

Program Chairs: Shan Lu (University of Chicago, USA), Tom Wenisch (University of Michigan, USA / Google, USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China (NSFC)
National Key R&D Program of China

Conference

ASPLOS '22

ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

February 28 - March 4, 2022

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
