research-article

Open access

DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads

Authors:

Vikas Chandra

ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

Pages 73–86

https://doi.org/10.1145/3623278.3624753

Published: 07 February 2024

Abstract

Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control exhibit dynamic behavior at several granularities: task, model, and layers within a model. Unlike traditional ML workloads, this dynamic behavior makes the overall system load not completely predictable, which creates new challenges for the system software of an ML system. In addition, RTMM workloads require real-time processing, involve highly heterogeneous models, and target resource-constrained devices. Under these circumstances, an effective scheduler that accounts for the unique characteristics of RTMM workloads becomes even more important for making good use of the underlying hardware. We therefore propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads on multi-accelerator systems. DREAM quantifies the unique requirements of RTMM workloads and uses the resulting scores to drive scheduling decisions, taking into account the current system load and other inference jobs on different models and input frames. DREAM uses tunable parameters that provide fast and effective adaptation to dynamic workload changes. In our evaluation on five RTMM workload scenarios, DREAM reduces the overall UXCost, an RTMM-specific equivalent of the energy-delay product (EDP) defined in the paper, by 32.2% and 50.0% in geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines, demonstrating the efficacy of our scheduling methodology.
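The abstract reports results in UXCost, which the paper defines as an EDP-equivalent metric for RTMM workloads, and summarizes them as geometric-mean reductions plus best-case ("up to") figures. UXCost's exact formula is not given on this page, so the minimal sketch below uses plain EDP (energy × latency) as a stand-in and shows one common way to turn per-scenario cost ratios into a geometric-mean reduction and a best-scenario reduction. The function names (`edp`, `geomean`, `reductions`) and all cost values are hypothetical illustrations, not the paper's code or data.

```python
import math
from typing import Sequence


def edp(energy_mj: float, latency_ms: float) -> float:
    """Energy-delay product: energy times delay (units here: mJ * ms).

    Assumption: plain EDP is used as a stand-in for the paper's UXCost,
    whose exact definition is not reproduced here.
    """
    return energy_mj * latency_ms


def geomean(values: Sequence[float]) -> float:
    """Geometric mean of positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def reductions(new_costs: Sequence[float],
               base_costs: Sequence[float]) -> tuple[float, float]:
    """Return (geomean reduction %, best per-scenario reduction %) vs. a baseline."""
    if len(new_costs) != len(base_costs):
        raise ValueError("cost lists must have the same length")
    # Normalize each scenario to its own baseline so scale differences cancel.
    ratios = [n / b for n, b in zip(new_costs, base_costs)]
    geo = 100.0 * (1.0 - geomean(ratios))
    best = 100.0 * (1.0 - min(ratios))
    return geo, best


# Hypothetical per-scenario EDP values (not data from the paper).
baseline = [edp(10.0, 10.0), edp(20.0, 10.0), edp(5.0, 10.0)]   # 100, 200, 50
scheduled = [edp(5.0, 10.0), edp(16.0, 10.0), edp(5.0, 2.0)]    # 50, 160, 10

geo, best = reductions(scheduled, baseline)
print(f"geomean reduction: {geo:.1f}%, best scenario: {best:.1f}%")
# prints "geomean reduction: 56.9%, best scenario: 80.0%"
```

Averaging normalized ratios with a geometric mean keeps a scenario with very large absolute costs from dominating the summary, which is why it is a common choice for reporting normalized results; the `best` value corresponds to the kind of "up to" figure quoted alongside it.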

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
    2. Other architectures
      1. Heterogeneous (hybrid) systems

Published In

ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

March 2023

430 pages

ISBN:9798400703942

DOI:10.1145/3623278

Chair: Tor Aamodt
Program Chair: Michael M Swift
Program Co-chair: Natalie Enright Jerger

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

March 25–29, 2023

Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 0
Total Downloads: 689
Downloads (last 12 months): 689
Downloads (last 6 weeks): 123

Reflects downloads up to 27 Jul 2024