DOI: 10.1145/2925426.2926265
research-article

Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems

Published: 01 June 2016
  • Abstract

    Due to the cost-effective, massive computational power of graphics processing units (GPUs), there is growing interest in using GPUs in real-time systems. For example, GPUs have been applied to automotive systems to enable new advanced and intelligent driver assistance technologies, accelerating the path to self-driving cars. In such systems, GPUs are shared among tasks with mixed timing constraints: real-time (RT) tasks that must complete before specified deadlines, and non-real-time, best-effort (BE) tasks. In this paper, (1) we propose resource-aware non-uniform slack distribution to enhance the schedulability of RT tasks (the total amount of work of RT tasks whose deadlines can be satisfied on a given amount of resources) in GPU-enabled systems; and (2) we propose deadline-aware dynamic GPU partitioning to allow RT and BE tasks to run on a GPU simultaneously, so that BE tasks are not blocked for a long time.
    We evaluate the effectiveness of the proposed approaches using both synthetic benchmarks and a real-world workload that consists of a set of emerging automotive tasks. Experimental results show that the proposed approaches significantly improve schedulability for RT tasks and reduce turnaround time for BE tasks. Moreover, the analysis of two driving scenarios shows that these improvements can significantly enhance driving safety and experience. For example, when the resource-aware non-uniform slack distribution approach is used, the distance that a car travels between the time a traffic sign (pedestrian) is "seen" and the time it is "recognized" decreases from 44.4 m to 22.2 m (from 4.4 m to 2.2 m); when the deadline-aware dynamic GPU partitioning approach is used, the distance that the car travels before a drowsy driver is woken up is reduced from 56.2 m to 29.2 m.
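
The distance figures above relate task latency to vehicle motion through the basic identity distance = speed × time. The abstract does not state the vehicle speed or the task latencies behind its numbers, so the following is only a minimal sketch with hypothetical values (the function name and all inputs are assumptions for illustration). It shows that, at constant speed, halving the time a task takes to finish halves the distance traveled in the meantime.

```python
def distance_traveled(speed_mps: float, latency_s: float) -> float:
    """Distance in meters covered at constant speed during latency_s seconds."""
    return speed_mps * latency_s


# Hypothetical inputs for illustration only; none of these come from the paper.
speed_mps = 20.0        # assumed constant vehicle speed, meters per second
slow_latency_s = 2.0    # assumed task completion time before the improvement
fast_latency_s = 1.0    # assumed task completion time after the improvement

before = distance_traveled(speed_mps, slow_latency_s)
after = distance_traveled(speed_mps, fast_latency_s)
print(f"before: {before} m, after: {after} m")
```

Because the relation is linear in latency, any scheduling change that shortens the completion time of a recognition task shortens the corresponding travel distance by the same factor, whatever the actual speed is.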

      Information

      Published In

      ICS '16: Proceedings of the 2016 International Conference on Supercomputing
      June 2016
      547 pages
      ISBN:9781450343619
      DOI:10.1145/2925426
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. GPU
      2. multitasking
      3. real-time system

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICS '16
      Acceptance Rates

      Overall Acceptance Rate 629 of 2,180 submissions, 29%

      Cited By

      • (2023) Uso de GPUs en aplicaciones de tiempo real: Una revisión de técnicas para el análisis y optimización de parámetros temporales. Revista Iberoamericana de Automática e Informática industrial 21:1 (1-16). DOI: 10.4995/riai.2023.20321. Online publication date: 7-Nov-2023
      • (2022) EAIS. Future Generation Computer Systems 130:C (253-268). DOI: 10.1016/j.future.2022.01.004. Online publication date: 1-May-2022
      • (2019) MV-Net. ACM Journal on Emerging Technologies in Computing Systems 15:4 (1-25). DOI: 10.1145/3358696. Online publication date: 3-Oct-2019
      • (2019) Real Time Automatic Andon Alerts for Android Platforms Applied in Footwear Manufacturing. Computer and Communication Engineering (43-56). DOI: 10.1007/978-3-030-12018-4_4. Online publication date: 23-Jan-2019
      • (2018) Making OpenVX Really "Real Time". 2018 IEEE Real-Time Systems Symposium (RTSS) (80-93). DOI: 10.1109/RTSS.2018.00018. Online publication date: Dec-2018
      • (2017) GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed. 2017 IEEE Real-Time Systems Symposium (RTSS) (104-115). DOI: 10.1109/RTSS.2017.00017. Online publication date: Dec-2017
      • (2017) An Evaluation of the NVIDIA TX1 for Supporting Real-Time Computer-Vision Workloads. 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) (353-364). DOI: 10.1109/RTAS.2017.3. Online publication date: Apr-2017
      • (2017) Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (1-12). DOI: 10.1109/HPCA.2017.52. Online publication date: Feb-2017
      • (2016) Towards Safe and Secure Autonomous and Cooperative Vehicle Ecosystems. Proceedings of the 2nd ACM Workshop on Cyber-Physical Systems Security and Privacy (59-70). DOI: 10.1145/2994487.2994489. Online publication date: 28-Oct-2016
      • (2016) Bridging the Semantic Gaps of GPU Acceleration for Scale-out CNN-based Big Data Processing. Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (315-326). DOI: 10.1145/2967938.2967944. Online publication date: 11-Sep-2016
