research-article

Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems

Authors:

Euiseong Seo

IEEE Transactions on Computers, Volume 70, Issue 3

Pages 332 - 346

https://doi.org/10.1109/TC.2020.2988251

Published: 01 March 2021

Abstract

Mission-critical embedded systems simultaneously run multiple graphics-processing-unit (GPU) computing tasks with different criticality and timeliness requirements. Considerable research effort has been dedicated to supporting the preemptive priority scheduling of GPU kernels. However, hardware-supported preemption leads to lengthy scheduling delays and complicated designs, and most software approaches depend on the voluntary yielding of GPU resources from restructured kernels. We propose a preemptive GPU kernel scheduling scheme that harnesses the idempotence property of kernels. The proposed scheme distinguishes idempotent kernels through static source code analysis. If a kernel is not idempotent, it is transactionized at the operating system (OS) level. Both idempotent and transactionized kernels can be aborted at any point during their execution and rolled back to their initial state for re-execution. Therefore, low-priority kernel instances can be preempted for high-priority kernel instances and re-executed after the GPU becomes available again. Our evaluation using the Rodinia benchmark suite showed that the proposed approach limits the preemption delay to 18 μs in the 99.9th percentile, with an average delay in execution time of less than 10 percent for high-priority tasks under a heavy load in most cases.
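
To make the abort-and-re-execute idea concrete, the following is a minimal CPU-side simulation in Python, not the authors' implementation. All names (`Preempted`, `scale_kernel`, `increment_kernel`, `run_transactionized`, `demo`) are hypothetical, and the plain buffer snapshot is only an assumed stand-in for whatever mechanism the OS-level transactionization actually uses. It shows why a kernel that never overwrites its own inputs can simply be rerun after an abort, whereas an in-place kernel must first be rolled back.

```python
# Illustrative simulation only; every name here is hypothetical and the
# snapshot-based rollback is an assumption, not the paper's mechanism.

class Preempted(Exception):
    """Models a kernel being aborted partway through its execution."""


def scale_kernel(src, dst, abort_at=None):
    # Idempotent: reads only src and writes only dst, so a rerun simply
    # overwrites any partial output with the same values.
    for i in range(len(src)):
        if i == abort_at:
            raise Preempted
        dst[i] = 2 * src[i]


def increment_kernel(buf, abort_at=None):
    # Not idempotent: each element is read and then overwritten in place,
    # so a rerun after a partial run increments some elements twice.
    for i in range(len(buf)):
        if i == abort_at:
            raise Preempted
        buf[i] = buf[i] + 1


def run_transactionized(kernel, buf, abort_at=None):
    # Snapshot the buffer before launch and restore it on abort, so the
    # kernel can be re-executed from its initial state.
    snapshot = list(buf)
    try:
        kernel(buf, abort_at)
    except Preempted:
        buf[:] = snapshot
        raise


def demo():
    # Idempotent kernel: abort at element 2, then rerun from scratch.
    src, dst = [1, 2, 3, 4], [0, 0, 0, 0]
    try:
        scale_kernel(src, dst, abort_at=2)
    except Preempted:
        pass
    scale_kernel(src, dst)

    # Non-idempotent kernel rerun without rollback: result is corrupted.
    naive = [1, 2, 3, 4]
    try:
        increment_kernel(naive, abort_at=2)
    except Preempted:
        pass
    increment_kernel(naive)

    # Same kernel, transactionized: rollback makes the rerun safe.
    safe = [1, 2, 3, 4]
    try:
        run_transactionized(increment_kernel, safe, abort_at=2)
    except Preempted:
        pass
    run_transactionized(increment_kernel, safe)

    return dst, naive, safe
```

Calling `demo()` returns the three final buffers. The idempotent kernel and the transactionized kernel both produce the result of a single uninterrupted run, while the naive rerun of the in-place kernel double-increments the elements that were processed before the abort.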

Index Terms

Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems
1. Computer systems organization
  1. Embedded and cyber-physical systems
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling
    2. Software system structures
      1. Embedded software
      2. Real-time systems software


Published In

IEEE Transactions on Computers Volume 70, Issue 3

March 2021

179 pages

ISSN: 0018-9340

0018-9340 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher

IEEE Computer Society

United States
