research-article

Processing data streams with hard real-time constraints on heterogeneous systems

Authors: Assaf Schuster, Mark Silberstein

ICS '11: Proceedings of the international conference on Supercomputing

Pages 120 - 129

https://doi.org/10.1145/1995896.1995915

Published: 31 May 2011

Abstract

Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency, to satisfy the milliseconds-long, hard real-time constraints of each stream, and high throughput, to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs have great potential to speed up the computations. However, their use for hard real-time stream processing is complicated by slow communication with CPUs, variable throughput that changes non-linearly with the input size, and weak consistency of their local memory with respect to CPU accesses. Furthermore, their coarse-grained hardware scheduler renders them unsuitable for unbalanced multi-stream workloads.

We present a general, efficient and practical algorithm for hard real-time stream scheduling in heterogeneous systems. The algorithm assigns incoming streams of different rates and deadlines to CPUs and accelerators. By employing novel stream schedulability criteria for accelerators, the algorithm finds an assignment that simultaneously satisfies the aggregate throughput requirements of all the streams and the deadline constraint of each individual stream.
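The abstract does not spell out the schedulability criteria, so the following is only an illustrative sketch of what a stream-to-device assignment could look like, not the authors' algorithm. It assumes a deliberately simplified cost model: the accelerator processes all of its streams in pipelined batches with a fixed per-batch overhead and a constant throughput (the abstract notes that real accelerator throughput varies non-linearly with input size), and the CPUs are limited only by aggregate throughput. All names (`Stream`, `accel_feasible`, `cpu_feasible`, `assign`) and parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    rate: float      # bytes per second arriving on this stream
    deadline: float  # seconds allowed between arrival and completion


def accel_feasible(streams, throughput, overhead):
    """Toy accelerator test (assumed model): one pipelined batch per period T."""
    if not streams:
        return True
    total_rate = sum(s.rate for s in streams)
    if total_rate >= throughput:
        return False  # aggregate throughput requirement cannot be met
    # Smallest period T such that overhead + total_rate * T / throughput <= T.
    period = overhead / (1.0 - total_rate / throughput)
    # Data waits up to T to join a batch, then up to T more while it is processed.
    return 2.0 * period <= min(s.deadline for s in streams)


def cpu_feasible(streams, throughput_per_core, cores):
    """Toy CPU test (assumed model): only aggregate throughput is checked."""
    return sum(s.rate for s in streams) <= throughput_per_core * cores


def assign(streams, accel_tp, accel_overhead, cpu_tp, cores):
    """Greedy split: offer long-deadline streams to the accelerator first."""
    accel, cpu = [], []
    for s in sorted(streams, key=lambda s: s.deadline, reverse=True):
        if accel_feasible(accel + [s], accel_tp, accel_overhead):
            accel.append(s)
        else:
            cpu.append(s)
    if not cpu_feasible(cpu, cpu_tp, cores):
        return None  # no feasible assignment under this toy model
    return accel, cpu
```

In this model a stream with a very short deadline tends to land on the CPU, because batching on the accelerator adds latency, while streams with looser deadlines can share accelerator batches. A `None` result only means the greedy heuristic found no split, not that none exists.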

Using the AES-CBC encryption kernel, we experimented extensively on thousands of streams with realistic rate and deadline distributions. Our framework outperformed the alternative methods by allowing 50% more streams to be processed with provably deadline-compliant execution even for deadlines as short as tens of milliseconds. Overall, the combined GPU-CPU execution allows for up to a 4-fold throughput increase over highly-optimized multi-threaded CPU-only implementations.
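As background on the benchmark kernel, the standard definition of CBC encryption can be written as below. The symbols ($E_K$ for the block cipher under key $K$, $P_i$ and $C_i$ for plaintext and ciphertext blocks, $\mathit{IV}$ for the initialization vector) follow common textbook notation and are not taken from the paper.

```latex
C_0 = \mathit{IV}, \qquad C_i = E_K\left(P_i \oplus C_{i-1}\right), \quad i = 1, \dots, n
```

Because $C_i$ cannot be computed before $C_{i-1}$, encryption within a single stream is inherently sequential, so a parallel device would presumably have to draw its parallelism from processing many independent streams at once.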

Index Terms

Processing data streams with hard real-time constraints on heterogeneous systems
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems

Published In

ICS '11: Proceedings of the international conference on Supercomputing

May 2011

398 pages

ISBN: 9781450301022

DOI: 10.1145/1995896

General Chair: David K. Lowenthal (University of Arizona)

Program Chairs: Bronis R. de Supinski (Lawrence Livermore National Laboratory), Sally A. McKee (Chalmers University of Technology)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS '11

Sponsor:

SIGARCH

ICS '11: International Conference on Supercomputing

May 31 - June 4, 2011

Tucson, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
