C-Stream: A Co-routine-Based Elastic Stream Processing Engine

Published: 27 April 2018

Abstract

Stream processing is a computational paradigm for on-the-fly processing of live data. This paradigm lends itself to implementations that can provide high throughput and low latency by taking advantage of various forms of parallelism that are naturally captured by the stream processing model of computation, such as pipeline, task, and data parallelism. In this article, we describe the design and implementation of C-Stream, which is an elastic stream processing engine. C-Stream encompasses three unique properties. First, in contrast to the widely adopted event-based interface for developing streaming operators, C-Stream provides an interface wherein each operator has its own driver loop and relies on data availability application programming interfaces (APIs) to decide when to perform its computations. This self-control-based model significantly simplifies the development of operators that require multiport synchronization. Second, C-Stream contains a dynamic scheduler that manages the multithreaded execution of the operators. The scheduler, which is customizable via plug-ins, enables the execution of the operators as co-routines, using any number of threads. The base scheduler implements back-pressure, provides data availability APIs, and manages preemption and termination handling. Last, C-Stream varies the degree of parallelism to resolve bottlenecks by both dynamically changing the number of threads used to execute an application and adjusting the number of replicas of data-parallel operators. We provide an experimental evaluation of C-Stream. The results show that C-Stream is scalable, highly customizable, and can resolve bottlenecks by dynamically adjusting the level of data parallelism used.
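The self-control-based operator model described in the abstract can be illustrated with a minimal sketch. It is not C-Stream's actual interface: the `Port` class, its `has_data`/`has_space`/`done` methods, and the `run` scheduler are hypothetical names chosen for illustration, Python generators stand in for co-routines, and a single thread replaces the multithreaded scheduler. Each operator has its own driver loop and yields control whenever the data or buffer space it needs is unavailable, which is how the two-port `barrier` operator synchronizes its inputs and how back-pressure reaches the sources.

```python
from collections import deque


class Port:
    """Bounded FIFO between two operators (hypothetical helper, not C-Stream's API)."""

    def __init__(self, capacity=2):
        self.buf = deque()
        self.capacity = capacity
        self.closed = False

    def has_data(self):
        return len(self.buf) > 0

    def has_space(self):
        return len(self.buf) < self.capacity

    def done(self):
        # No more tuples will ever arrive on this port.
        return self.closed and not self.buf


def source(items, out):
    """Source driver loop: emit items, yielding while downstream is full."""
    for item in items:
        while not out.has_space():  # back-pressure: wait for buffer space
            yield
        out.buf.append(item)
    out.closed = True


def barrier(left, right, out):
    """Two-port operator: pairs one tuple from each input port."""
    while True:
        while not (left.has_data() and right.has_data()):
            if left.done() or right.done():  # termination handling
                out.closed = True
                return
            yield  # wait until both ports have data
        while not out.has_space():
            yield
        out.buf.append((left.buf.popleft(), right.buf.popleft()))


def sink(inp, results):
    """Sink driver loop: drain the input port into a list."""
    while not inp.done():
        if inp.has_data():
            results.append(inp.buf.popleft())
        else:
            yield


def run(operators):
    """Round-robin scheduler: resume each co-routine until all have finished."""
    ready = deque(operators)
    while ready:
        op = ready.popleft()
        try:
            next(op)
            ready.append(op)  # operator yielded; schedule it again later
        except StopIteration:
            pass  # operator terminated


a, b, c = Port(), Port(), Port()
results = []
run([source(range(5), a), source("vwxyz", b), barrier(a, b, c), sink(c, results)])
print(results)
```

Because the ports hold at most two tuples here, the sources are suspended and resumed several times before the run completes. In an event-based interface, the same barrier would need explicit per-port buffering inside its tuple handler.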

Cited By

  • (2023) On the building of efficient self-adaptable health data science services by using dynamic patterns. Future Generation Computer Systems 145, 478-495. DOI: 10.1016/j.future.2023.03.039. Online publication date: Aug-2023
  • (2022) An integrated approach of designing functionality with security for distributed cyber-physical systems. The Journal of Supercomputing 78:13, 14813-14845. DOI: 10.1007/s11227-022-04481-9. Online publication date: 1-Sep-2022
  • (2021) Self-adaptation on parallel stream processing: A systematic review. Concurrency and Computation: Practice and Experience 34:6. DOI: 10.1002/cpe.6759. Online publication date: 7-Dec-2021

Published In

ACM Transactions on Parallel Computing, Volume 4, Issue 3
September 2017
120 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3175004
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 April 2018
Accepted: 01 January 2018
Revised: 01 May 2017
Received: 01 September 2015
Published in TOPC Volume 4, Issue 3

Author Tags

  1. C-Stream
  2. elastic stream processing engine

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • FP7 European Commission, Marie Curie Actions

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 04 Oct 2024

Cited By

  • (2020) Towards Profile-Guided Optimization for Safe and Efficient Parallel Stream Processing in Rust. 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 289-296. DOI: 10.1109/SBAC-PAD49847.2020.00047. Online publication date: Sep-2020
