Globally scheduled real-time multiprocessor systems with GPUs

Real-Time Systems

Abstract

Graphics processing units (GPUs) are powerful processors that can offer significant performance advantages over traditional CPUs. The last decade has seen rapid advancement in GPU computational power and generality, and recent technologies make it possible to use GPUs as co-processors to CPUs. These advantages can be great: GPUs often outperform traditional CPUs by orders of magnitude. While the motivations for developing systems with GPUs are clear, little research in the real-time systems field has addressed integrating GPUs into real-time multiprocessor systems. We present two real-time analysis methods, each addressing real-world platform constraints, for such an integration into a soft real-time multiprocessor system, and show that a GPU can be exploited to achieve greater levels of total system performance.

Notes

  1. Notable platforms include the Compute Unified Device Architecture (CUDA) from NVIDIA (CUDA Zone, URL http://www.nvidia.com/object/cuda_home_new.html), Stream from AMD/ATI (ATI Stream Technology, URL http://www.amd.com/US/PRODUCTS/TECHNOLOGIES/STREAM-TECHNOLOGY/Pages/stream-technology.aspx), OpenCL from Apple and the Khronos Group (OpenCL, URL http://www.khronos.org/opencl/), and DirectCompute from Microsoft (Microsoft DirectX, URL http://www.gamesforwindows.com/en-US/directx/).

  2. China’s new nebulae supercomputer is no. 2, URL http://www.top500.org/lists/2010/06/press-release.

  3. Parallel computing with SciFinance, URL http://www.scicomp.com/parallel_computing/SciComp_NVIDIA_CUDA_OpenMP.pdf.

  4. GeForce graphics processors, URL http://www.nvidia.com/object/geforce_family.html.

  5. Intel microprocessor export compliance metrics, URL http://www.intel.com/support/processors/xeon/sb/CS-020863.htm.

  6. CUDA community showcase, URL http://www.nvidia.com/object/cuda_apps_flash_new.html.

  7. AMD Fusion Family of APUs, URL http://sites.amd.com/us/Documents/48423B_fusion_whitepaper_WEB.pdf.

  8. Intel details 2011 processor features, offers stunning visuals built-in, URL http://download.intel.com/newsroom/kits/idf/2010_fall/pdfs/Day1_IDF_SNB_Factsheet.pdf.

  9. The sample NVIDIA CUDA SDK programs were modified to use pinned memory, which prevents these memory segments from being potentially paged to disk. The use of pinned memory can significantly reduce communication overheads as the system can take advantage of direct memory access (DMA) data transfers. For example, the communication-to-execution ratio for the eigenvalue program increases to about 30% without it.

  10. NVIDIA’s Fermi architecture allows limited simultaneous execution of kernels as long as these kernels are sourced from the same host-side context/thread. In this work, we will not consider such uses.

  11. The GTX-295 actually provides two independent GPUs on a single card, though only one GPU was used in this work.

  12. Some have recently speculated that the earliest-deadline-zero-laxity (EDZL) algorithm may be better suited to accounting for self-suspensions (caused, for example, by using a GPU) (Lakshmanan et al. 2010), though actionable results have yet to be presented, so better suspension accounting remains an open problem.

  13. For performance, GPU operations may be performed asynchronously by the GPU-using job. This allows several GPU operations to be batched together and treated as a single operation, reducing the number of times the job must suspend to wait for GPU results. No changes to our task model are necessary to support this type of operation.

  14. A window-constrained scheduling algorithm prioritizes a job by a time point contained within an interval window that also contains the job’s release and deadline.

  15. Common workload profiles were solicited from research groups at UNC that frequently make use of CUDA. A poll was also informally taken at the NVIDIA CUDA online forums. Similar timing characteristics were later confirmed in the domain of computer vision for real-time automotive applications (Muyan-Ozcelik et al. 2011).

  16. Please note that some graphs appear to be missing data points at lower and upper system utilization ranges. This is caused by the occasional inability to generate task sets meeting particular scenario constraints. This was usually due to the inability to generate a task set with at least two GPU-using tasks under the given constraints.

  17. Graphs for all scenarios are available at http://www.cs.unc.edu/~anderson/papers.html.

  18. k-exclusion locks protect a resource or resource pool, allowing up to k simultaneous accesses.
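
The definition of a k-exclusion lock in note 18 can be illustrated with a counting semaphore. The following is only a minimal sketch, assuming ordinary Python threads: it is not a real-time locking protocol (it offers no priority inheritance and no bounded-blocking guarantee), and the names `KExclusionLock` and `run_demo` are hypothetical and do not come from the article.

```python
import threading
import time


class KExclusionLock:
    """Admit at most k simultaneous holders (illustrative only)."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        # A counting semaphore with k permits models a pool of k resources.
        self._slots = threading.Semaphore(k)

    def __enter__(self):
        self._slots.acquire()  # blocks while k holders are already inside
        return self

    def __exit__(self, exc_type, exc, tb):
        self._slots.release()
        return False


def run_demo(k=2, workers=6):
    """Run `workers` threads and return the peak number of concurrent holders."""
    lock = KExclusionLock(k)
    guard = threading.Lock()  # protects the bookkeeping counters below
    state = {"inside": 0, "peak": 0}

    def job():
        with lock:
            with guard:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(0.01)  # stand-in for work on the shared resource
            with guard:
                state["inside"] -= 1

    threads = [threading.Thread(target=job) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

With `k=1` the lock degenerates to an ordinary mutex, while a pool of k identical resources (for example, k GPUs) would use the larger value; the peak returned by `run_demo` never exceeds `k`.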

References

  • Abhijeet G, Muni TI (2009) GPU based sparse grid technique for solving multidimensional options pricing PDEs. In: Proceedings of the 2nd workshop on high performance computational finance, pp 1–9

  • Aila T, Laine S (2009) Understanding the efficiency of ray traversal on GPUs. In: Proceedings of the conference on high performance graphics, pp 145–149

  • Baruah S (2000) Scheduling periodic tasks on uniform processors. In: Proceedings of the EuroMicro conference on real-time systems, pp 7–14

  • Baruah S (2004) Feasibility analysis of preemptive real-time systems upon heterogeneous multiprocessor platforms. In: Proceedings of the 25th IEEE real-time systems symposium, pp 37–46

  • Block A, Leontyev H, Brandenburg B, Anderson J (2007) A flexible real-time locking protocol for multiprocessors. In: Proceedings of the 13th IEEE international conference on embedded and real-time computing systems and applications, pp 47–57

  • Brandenburg B, Anderson J (2010) Optimality results for multiprocessor real-time locking. In: Proceedings of the 31st IEEE real-time systems symposium, pp 49–60

  • Calandrino J, Leontyev H, Block A, Devi U, Anderson J (2006) LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE real-time systems symposium, pp 111–123

  • Childs S, Ingram D (2001) The Linux-SRT integrated multimedia operating system: bringing QoS to the desktop. In: Proceedings of the 7th real-time technology and applications symposium, p 135

  • Devi U, Anderson J (2008) Tardiness bounds under global EDF scheduling on a multiprocessor. In: Real-time systems, vol 38, pp 133–189

  • Dwarakinath A (2008) A fair-share scheduler for the graphics processing unit. Master’s thesis, Stony Brook University

  • Erickson J, Devi U, Baruah S (2010) Improved tardiness bounds for global EDF. In: Proceedings of the 22nd EuroMicro conference on real-time systems, pp 14–23

  • Funk S, Goossens J, Baruah S (2001) On-line scheduling on uniform multiprocessors. In: Proceedings of the 22nd IEEE real-time systems symposium, pp 183–202

  • Gai P, Abeni L, Buttazzo G (2002) Multiprocessor DSP scheduling in system-on-a-chip architectures. In: Proceedings of the 14th EuroMicro conference on real-time systems, pp 231–238

  • Harrison O, Waldron J (2008) Practical symmetric key cryptography on modern graphics hardware. In: Proceedings of the 17th conference on security symposium, pp 195–209

  • Kang W, Son SH, Stankovic JA, Amirijoo M (2007) I/O-aware deadline miss ratio management in real-time embedded databases. In: Proceedings of the 28th IEEE real-time systems symposium, pp 277–287

  • Kato S, Ishikawa Y (2009) Gang EDF scheduling of parallel task systems. In: Proceedings of the 30th IEEE real-time systems symposium, pp 459–468

  • Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y (2011a) Resource sharing in GPU-accelerated windowing systems. In: Proceedings of the 17th IEEE real-time and embedded technology and application symposium

  • Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y (2011b) TimeGraph: GPU scheduling for real-time multi-tasking environments. In: Proceedings of the USENIX annual technical conference

  • Lakshmanan K, Kato S, Rajkumar R (2010) Open problems in scheduling self-suspending tasks. In: Proceedings of the 1st international real-time scheduling open problems seminar, pp 12–13

  • Leontyev H, Anderson J (2009) A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. Real-Time Syst 43(1):60–92

  • Pellizzoni R, Lipari G (2007) Holistic analysis of asynchronous real-time transactions with earliest deadline scheduling. J Comput Syst Sci 73:186–206

  • Manica N, Abeni L, Palopoli L (2008) QoS support in the X11 window system. In: Proceedings of the 14th IEEE real-time and embedded technology and applications symposium, pp 103–112

  • Muyan-Ozcelik P, Glavtchev V, Ota JM, Owens JD (2011) Real-time speed-limit-sign recognition in an embedded system using a GPU. In: GPU Computing Gems, pp 473–496

  • Ong CY, Weldon M, Quiring S, Maxwell L, Hughes M, Whelan C, Okoniewski M (2010) Speed it up. IEEE Microw Mag 11(2):70–78

  • Pieters B, Hollemeersch CF, Lambert P, Van de Walle R (2009) Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA. In: Applications of digital image processing XXXII, vol 7443, p 74430X

  • Raravi G, Andersson B (2010) Calculating an upper bound on the finishing time of a group of threads executing on a GPU: a preliminary case study. In: Work-in-progress session of the 16th IEEE international conference on embedded and real-time computing systems and applications, pp 5–8

  • Sasinowski JE, Strosnider JK (1995) ARTIFACT: a platform for evaluating real-time window system designs. In: Proceedings of the 16th IEEE real-time systems symposium, pp 342–352

  • Watanabe Y, Itagaki T (2009) Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit. J Biomed Opt 14, 060506

Acknowledgements

Work supported by NSF grants CNS 0834270, CNS 0834132, and CNS 1016954; ARO grant W911NF-09-0535; AFOSR grant FA9550-09-1-0549; and AFRL grant FA8750-11-1-0033.

Author information

Corresponding author

Correspondence to Glenn A. Elliott.

About this article

Cite this article

Elliott, G.A., Anderson, J.H. Globally scheduled real-time multiprocessor systems with GPUs. Real-Time Syst 48, 34–74 (2012). https://doi.org/10.1007/s11241-011-9140-y

  • DOI: https://doi.org/10.1007/s11241-011-9140-y