DOI: 10.1145/3623278.3624753
research-article
Open access

DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads

Published: 07 February 2024

Abstract

Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control involve dynamic behaviors at various granularities: task, model, and layers within a model. Such dynamic behaviors introduce new challenges to the system software in an ML system because, unlike in traditional ML workloads, the overall system load is not completely predictable. In addition, RTMM workloads require real-time processing, involve highly heterogeneous models, and target resource-constrained devices. Under these circumstances, an effective scheduler becomes more important for better utilizing the underlying hardware given the unique characteristics of RTMM workloads. We therefore propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads targeting multi-accelerator systems. DREAM quantifies the unique requirements of RTMM workloads and uses the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. DREAM uses tunable parameters that provide fast and effective adaptivity to dynamic workload changes. In our evaluation of five RTMM workload scenarios, DREAM reduces the overall UXCost, an equivalent of the energy-delay product (EDP) for RTMM defined in the paper, by 32.2% and 50.0% in geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines, demonstrating the efficacy of our scheduling methodology.
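The abstract reports UXCost reductions as geometric means over five scenarios. UXCost's exact formula is defined in the paper and is not reproduced here, so the sketch below uses the classic energy-delay product (energy × delay) as a stand-in and shows, assuming the usual convention of taking the geometric mean of per-scenario cost ratios (proposed / baseline), how such a percentage reduction is computed. The function names and example costs are hypothetical illustrations, not data or code from the paper.

```python
import math


def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """Classic EDP: energy multiplied by delay (lower is better).

    Hypothetical stand-in only; the paper's UXCost formula is not reproduced here.
    """
    return energy_j * delay_s


def geomean_reduction(baseline: list[float], proposed: list[float]) -> float:
    """Percent reduction in the geometric mean of per-scenario cost ratios.

    Each scenario contributes the ratio proposed / baseline. The geometric mean
    of those ratios, r, is computed in log space to avoid overflow/underflow,
    and the reported reduction is (1 - r) * 100.
    """
    if not baseline or len(baseline) != len(proposed):
        raise ValueError("need equally sized, non-empty cost lists")
    log_sum = sum(math.log(p / b) for b, p in zip(baseline, proposed))
    ratio = math.exp(log_sum / len(baseline))
    return (1.0 - ratio) * 100.0


if __name__ == "__main__":
    # Made-up per-scenario costs, for illustration only.
    base = [4.0, 2.0]
    new = [1.0, 2.0]
    print(f"{geomean_reduction(base, new):.1f}% reduction")
```

For instance, baseline costs of [4.0, 2.0] and proposed costs of [1.0, 2.0] give ratios 0.25 and 1.0, whose geometric mean is 0.5, i.e. a 50% reduction. Because the geometric mean of ratios equals the ratio of geometric means, the result is the same under either convention; Python's `statistics.geometric_mean` (3.8+) applied to the ratios yields the same value.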


Cited By

  • (2024) Elastic Execution of Multi-Tenant DNNs on Heterogeneous Edge MPSoCs. 2024 IEEE/ACM Symposium on Edge Computing (SEC), 279-291. DOI: 10.1109/SEC62691.2024.00029. Online publication date: 4 Dec 2024.
  • (2024) SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 565-579. DOI: 10.1109/MICRO61859.2024.00049. Online publication date: 2 Nov 2024.

      Published In

      ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4
      March 2023
      430 pages
      ISBN:9798400703942
      DOI:10.1145/3623278
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. scheduler
      2. AR/VR
      3. multi-model ML
      4. hardware-software co-design

      Conference

      ASPLOS '23

      Acceptance Rates

      Overall Acceptance Rate 535 of 2,713 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 1,383
      • Downloads (last 6 weeks): 192
      Reflects downloads up to 11 Jan 2025
