Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems

Published: 01 March 2021

Abstract

Mission-critical embedded systems simultaneously run multiple graphics-processing-unit (GPU) computing tasks with different criticality and timeliness requirements. Considerable research effort has been dedicated to supporting the preemptive priority scheduling of GPU kernels. However, hardware-supported preemption leads to lengthy scheduling delays and complicated designs, and most software approaches depend on the voluntary yielding of GPU resources from restructured kernels. We propose a preemptive GPU kernel scheduling scheme that harnesses the idempotence property of kernels. The proposed scheme distinguishes idempotent kernels through static source code analysis. If a kernel is not idempotent, it is transactionized at the operating system (OS) level. Both idempotent and transactionized kernels can be aborted at any point during their execution and rolled back to their initial state for reexecution. Therefore, low-priority kernel instances can be preempted for high-priority kernel instances and reexecuted after the GPU becomes available again. Our evaluation using the Rodinia benchmark suite showed that the proposed approach limits the preemption delay to 18 μs in the 99.9th percentile, with an average delay in execution time of less than 10 percent for high-priority tasks under a heavy load in most cases.
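The abort-and-reexecute idea can be illustrated with a toy model. The following is a minimal Python sketch, not the paper's implementation: the `Kernel` class, the declared read and write sets, the read/write disjointness test standing in for static analysis, and the snapshot-based rollback are all assumptions made for illustration. The abstract does not describe how the actual analysis or OS-level transactionization works internally.

```python
class Preempted(Exception):
    """Simulates the scheduler aborting a running kernel."""


class Kernel:
    """Hypothetical kernel description: a per-work-item body plus the
    names of the buffers it reads and writes (assumed to be known)."""

    def __init__(self, name, body, reads, writes):
        self.name = name
        self.body = body            # body(buffers, i) runs one work-item
        self.reads = set(reads)
        self.writes = set(writes)


def is_idempotent(kernel):
    # Conservative stand-in for static analysis: a kernel that never
    # writes a buffer it also reads cannot clobber its own inputs,
    # so running it again from the start gives the same result.
    return kernel.reads.isdisjoint(kernel.writes)


def launch(kernel, buffers, n_items, abort_at=None):
    # Runs work-items in order; raises Preempted just before work-item
    # `abort_at` to model an abort at an arbitrary point.
    for i in range(n_items):
        if abort_at is not None and i == abort_at:
            raise Preempted(kernel.name)
        kernel.body(buffers, i)


def run_with_preemption(kernel, buffers, n_items, abort_at):
    # Non-idempotent kernels get a snapshot of their output buffers
    # (an assumed form of "transactionization") so they can be rolled back.
    snapshot = None
    if not is_idempotent(kernel):
        snapshot = {b: list(buffers[b]) for b in kernel.writes}
    try:
        launch(kernel, buffers, n_items, abort_at)
    except Preempted:
        # A high-priority kernel would run here. Afterwards, restore the
        # initial state if needed and re-execute from the beginning.
        if snapshot is not None:
            for b, data in snapshot.items():
                buffers[b][:] = data
        launch(kernel, buffers, n_items)
    return buffers


def vec_add(buf, i):
    buf["out"][i] = buf["a"][i] + buf["b"][i]   # inputs never overwritten


def scale_in_place(buf, i):
    buf["x"][i] = 2 * buf["x"][i]               # overwrites its own input


ADD = Kernel("vec_add", vec_add, reads={"a", "b"}, writes={"out"})
SCALE = Kernel("scale_in_place", scale_in_place, reads={"x"}, writes={"x"})
```

In this model `ADD` can simply be restarted after an abort, while `SCALE` would double some elements twice unless its buffer is restored first. This is why only non-idempotent kernels pay the snapshot cost here.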

Published In

IEEE Transactions on Computers, Volume 70, Issue 3
March 2021
179 pages

Publisher

IEEE Computer Society

United States

Qualifiers

  • Research-article

Cited By

  • (2024) Building a Lightweight Trusted Execution Environment for Arm GPUs. IEEE Transactions on Dependable and Secure Computing 21:4 (3801-3816). DOI: 10.1109/TDSC.2023.3334277. Online publication date: 1-Jul-2024
  • (2023) Secure and Timely GPU Execution in Cyber-physical Systems. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (2591-2605). DOI: 10.1145/3576915.3623197. Online publication date: 15-Nov-2023
  • (2022) Reconciling QoS and concurrency in NVIDIA GPUs via warp-level scheduling. Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe (1275-1280). DOI: 10.5555/3539845.3540146. Online publication date: 14-Mar-2022
  • (2022) StrongBox. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (769-783). DOI: 10.1145/3548606.3560627. Online publication date: 7-Nov-2022
