Globally scheduled real-time multiprocessor systems with GPUs

Elliott, Glenn A.; Anderson, James H.

doi:10.1007/s11241-011-9140-y

Published: 21 October 2011

Real-Time Systems, Volume 48, pages 34–74 (2012)

Abstract

Graphics processing units, GPUs, are powerful processors that can offer significant performance advantages over traditional CPUs. The last decade has seen rapid advancement in GPU computational power and generality. Recent technologies make it possible to use GPUs as co-processors to CPUs. The performance advantages of GPUs can be great, often outperforming traditional CPUs by orders of magnitude. While the motivations for developing systems with GPUs are clear, little research in the real-time systems field has been done to integrate GPUs into real-time multiprocessor systems. We present two real-time analysis methods, addressing real-world platform constraints, for such an integration into a soft real-time multiprocessor system and show that a GPU can be exploited to achieve greater levels of total system performance.

Notes

Notable platforms include the Compute Unified Device Architecture (CUDA) from NVIDIA (CUDA Zone, URL http://www.nvidia.com/object/cuda_home_new.html), Stream from AMD/ATI (ATI Stream Technology, URL http://www.amd.com/US/PRODUCTS/TECHNOLOGIES/STREAM-TECHNOLOGY/Pages/stream-technology.aspx), OpenCL from Apple and the Khronos Group (OpenCL. URL http://www.khronos.org/opencl/), and DirectCompute from Microsoft (Microsoft DirectX, URL http://www.gamesforwindows.com/en-US/directx/).
China’s new nebulae supercomputer is no. 2, URL http://www.top500.org/lists/2010/06/press-release.
Parallel computing with SciFinance, URL http://www.scicomp.com/parallel_computing/SciComp_NVIDIA_CUDA_OpenMP.pdf.
GeForce graphics processors, URL http://www.nvidia.com/object/geforce_family.html.
Intel microprocessor export compliance metrics, URL http://www.intel.com/support/processors/xeon/sb/CS-020863.htm.
CUDA community showcase, URL http://www.nvidia.com/object/cuda_apps_flash_new.html.
AMD Fusion Family of APUs, URL http://sites.amd.com/us/Documents/48423B_fusion_whitepaper_WEB.pdf.
Intel details 2011 processor features, offers stunning visuals built-in, URL http://download.intel.com/newsroom/kits/idf/2010_fall/pdfs/Day1_IDF_SNB_Factsheet.pdf.
The sample NVIDIA CUDA SDK programs were modified to use pinned memory, which prevents these memory segments from being potentially paged to disk. The use of pinned memory can significantly reduce communication overheads as the system can take advantage of direct memory access (DMA) data transfers. For example, the communication-to-execution ratio for the eigenvalue program increases to about 30% without it.
NVIDIA’s Fermi architecture allows limited simultaneous execution of kernels as long as these kernels are sourced from the same host-side context/thread. In this work, we will not consider such uses.
The GTX-295 actually provides two independent GPUs on a single card, though only one GPU was used in this work.
Some have recently speculated that the earliest-deadline-zero-laxity (EDZL) algorithm may be better suited to accounting for self-suspensions (caused, for example, by using a GPU) (Lakshmanan et al. 2010), though actionable results have yet to be presented, so better suspension accounting remains an open problem.
For performance, GPU operations may be performed asynchronously by the GPU-using job. This allows several GPU operations to be batched together and treated as a single operation, reducing the number of times the job must suspend to wait for GPU results. No changes to our task model are necessary to support this type of operation.
A window-constrained scheduling algorithm prioritizes a job by a time point contained within an interval window that also contains the job’s release and deadline.
Common workload profiles were solicited from research groups at UNC that frequently make use of CUDA. A poll was also informally taken at the NVIDIA CUDA online forums. Similar timing characteristics were later confirmed in the domain of computer vision for real-time automotive applications (Muyan-Ozcelik et al. 2011).
Please note that some graphs appear to be missing data points at lower and upper system utilization ranges. This is caused by the occasional inability to generate task sets meeting particular scenario constraints. This was usually due to the inability to generate a task set with at least two GPU-using tasks under the given constraints.
Graphs for all scenarios are available at http://www.cs.unc.edu/~anderson/papers.html.
k-exclusion locks protect a resource or resource pool, allowing up to k simultaneous accesses.
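
As a rough illustration of the k-exclusion idea in the last note, the following minimal sketch wraps a counting semaphore so that at most k threads hold the lock at once. It is not the locking protocol analyzed in the article: a plain semaphore gives no real-time guarantees such as bounded blocking or priority-ordered wake-ups. The names KExclusionLock and run_demo are hypothetical, and the sleep call merely stands in for an operation on a pooled resource such as a GPU.

```python
import threading
import time


class KExclusionLock:
    """Hypothetical helper: admits at most k holders at once (k = 1 is a mutex)."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        # A counting semaphore with k slots; wake-up order is NOT priority-aware.
        self._slots = threading.BoundedSemaphore(k)

    def __enter__(self) -> "KExclusionLock":
        self._slots.acquire()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._slots.release()


def run_demo(k: int = 2, workers: int = 6) -> int:
    """Run `workers` threads through the lock; return the peak number of holders."""
    lock = KExclusionLock(k)
    guard = threading.Lock()  # protects the bookkeeping counters below
    state = {"active": 0, "peak": 0}

    def worker() -> None:
        with lock:
            with guard:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)  # stand-in for work on the protected resource
            with guard:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak simultaneous holders:", run_demo(k=2, workers=6))
```

With k equal to the number of resource instances in the pool, the peak reported by run_demo never exceeds k, which is the property the note describes.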

References

Abhijeet G, Muni TI (2009) GPU based sparse grid technique for solving multidimensional options pricing PDEs. In: Proceedings of the 2nd workshop on high performance computational finance, pp 1–9
Aila T, Laine S (2009) Understanding the efficiency of ray traversal on GPUs. In: Proceedings of the conference on high performance graphics, pp 145–149
Baruah S (2000) Scheduling periodic tasks on uniform processors. In: Proceedings of the EuroMicro conference on real-time systems, pp 7–14
Baruah S (2004) Feasibility analysis of preemptive real-time systems upon heterogeneous multiprocessor platforms. In: Proceedings of the 25th IEEE real-time systems symposium, pp 37–46
Block A, Leontyev H, Brandenburg B, Anderson J (2007) A flexible real-time locking protocol for multiprocessors. In: Proceedings of the 13th IEEE international conference on embedded and real-time computing systems and applications, pp 47–57
Brandenburg B, Anderson J (2010) Optimality results for multiprocessor real-time locking. In: Proceedings of the 31st IEEE real-time systems symposium, pp 49–60
Calandrino J, Leontyev H, Block A, Devi U, Anderson J (2006) LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE real-time systems symposium, pp 111–123
Childs S, Ingram D (2001) The Linux-SRT integrated multimedia operating system: bringing QoS to the desktop. In: Proceedings of the 7th real-time technology and applications symposium, p 135
Devi U, Anderson J (2008) Tardiness bounds under global EDF scheduling on a multiprocessor. Real-Time Syst 38:133–189
Dwarakinath A (2008) A fair-share scheduler for the graphics processing unit. Master’s thesis, Stony Brook University
Erickson J, Devi U, Baruah S (2010) Improved tardiness bounds for global EDF. In: Proceedings of the 22nd EuroMicro conference on real-time systems, pp 14–23
Funk S, Goossens J, Baruah S (2001) On-line scheduling on uniform multiprocessors. In: Proceedings of the 22nd IEEE real-time systems symposium, pp 183–202
Gai P, Abeni L, Buttazzo G (2002) Multiprocessor DSP scheduling in system-on-a-chip architectures. In: Proceedings of the 14th EuroMicro conference on real-time systems, pp 231–238
Harrison O, Waldron J (2008) Practical symmetric key cryptography on modern graphics hardware. In: Proceedings of the 17th conference on security symposium, pp 195–209
Kang W, Son SH, Stankovic JA, Amirijoo M (2007) I/O-aware deadline miss ratio management in real-time embedded databases. In: Proceedings of the 28th IEEE real-time systems symposium, pp 277–287
Kato S, Ishikawa Y (2009) Gang EDF scheduling of parallel task systems. In: Proceedings of the 30th IEEE real-time systems symposium, pp 459–468
Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y (2011a) Resource sharing in GPU-accelerated windowing systems. In: Proceedings of the 17th IEEE real-time and embedded technology and application symposium
Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y (2011b) TimeGraph: GPU scheduling for real-time multi-tasking environments. In: Proceedings of the USENIX annual technical conference
Lakshmanan K, Kato S, Rajkumar R (2010) Open problems in scheduling self-suspending tasks. In: Proceedings of the 1st international real-time scheduling open problems seminar, pp 12–13
Leontyev H, Anderson J (2009) A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. Real-Time Syst 43(1):60–92
Pellizzoni R, Lipari G (2007) Holistic analysis of asynchronous real-time transactions with earliest deadline scheduling. J Comput Syst Sci 73:186–206
Manica N, Abeni L, Palopoli L (2008) QoS support in the X11 window system. In: Proceedings of the 14th IEEE real-time and embedded technology and applications symposium, pp 103–112
Muyan-Ozcelik P, Glavtchev V, Ota JM, Owens JD (2011) Real-time speed-limit-sign recognition in an embedded system using a GPU. In: GPU Computing Gems, pp 473–496
Ong CY, Weldon M, Quiring S, Maxwell L, Hughes M, Whelan C, Okoniewski M (2010) Speed it up. IEEE Microw Mag 11(2):70–78
Pieters B, Hollemeersch CF, Lambert P, Van de Walle R (2009) Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA. In: Applications of digital image processing XXXII, vol 7443, p 74430X
Raravi G, Andersson B (2010) Calculating an upper bound on the finishing time of a group of threads executing on a GPU: a preliminary case study. In: Work-in-progress session of the 16th IEEE international conference on embedded and real-time computing systems and applications, pp 5–8
Sasinowski JE, Strosnider JK (1995) ARTIFACT: a platform for evaluating real-time window system designs. In: Proceedings of the 16th IEEE real-time systems symposium, pp 342–352
Watanabe Y, Itagaki T (2009) Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit. J Biomed Opt 14(6):060506

Acknowledgements

Work supported by NSF grants CNS 0834270, CNS 0834132, and CNS 1016954; ARO grant W911NF-09-0535; AFOSR grant FA9550-09-1-0549; and AFRL grant FA8750-11-1-0033.

Author information

Authors and Affiliations

Department of Computer Science, University of North Carolina at Chapel Hill, Campus Box 3175, Brooks Computer Science Building, 201 South Columbia Street, Chapel Hill, NC 27599-3175, USA
Glenn A. Elliott & James H. Anderson

Corresponding author

Correspondence to Glenn A. Elliott.

About this article

Cite this article

Elliott, G.A., Anderson, J.H. Globally scheduled real-time multiprocessor systems with GPUs. Real-Time Syst 48, 34–74 (2012). https://doi.org/10.1007/s11241-011-9140-y

Published: 21 October 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s11241-011-9140-y
