DOI: 10.1145/3093742.3093929

FlowDB: Integrating Stream Processing and Consistent State Management

Published: 08 June 2017

Abstract

Recent advances in stream processing technologies have led to their adoption in many large companies, where they are becoming a core element of the data processing stack. In these settings, stream processors are often combined with various data management frameworks to build software architectures that span data storage, processing, retrieval, and mining. However, relying on separate and heterogeneous subsystems makes these architectures overly complex, which hinders the design, development, maintenance, and evolution of the overall system. We address this issue by proposing a new model that integrates data management within a distributed stream processor. The model enables individual stream processing operators to persist data and make it visible to, and queryable from, external components. It offers flexible mechanisms to control the consistency of data, including transactional updates as well as ordering and integrity constraints.
The paper contributes to research on stream processing in several ways: we introduce a new model that has the potential to simplify complex data-intensive applications by integrating data management capabilities within a stream processing system; we define data consistency guarantees and show how they are enforced within this new model; we implement the model in the FlowDB prototype and study its overhead with respect to a pure stream processing system, using real-world case studies and synthetic workloads. Finally, we further demonstrate the benefits of the proposed model by showing that FlowDB can outperform a state-of-the-art, in-memory distributed database in data management tasks.
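
As a rough illustration of the ideas named in the abstract (operator-owned state, transactional updates, integrity constraints, and external queries), the following minimal sketch may help. It is not FlowDB's API: the class `StatefulOperator`, its methods `process` and `query`, and the `non_negative` constraint are hypothetical names invented for this example, and the single-process, in-memory design is an assumption made for brevity.

```python
from typing import Callable, Dict, List, Tuple


class IntegrityError(Exception):
    """Raised when a batch of updates would violate the constraint."""


class StatefulOperator:
    """Toy stream operator that owns queryable state (hypothetical, not FlowDB)."""

    def __init__(self, constraint: Callable[[Dict[str, int]], bool]):
        self._state: Dict[str, int] = {}
        self._constraint = constraint

    def process(self, batch: List[Tuple[str, int]]) -> None:
        """Apply a batch of (key, delta) events as one all-or-nothing update."""
        draft = dict(self._state)  # work on a private copy of the state
        for key, delta in batch:
            draft[key] = draft.get(key, 0) + delta
        if not self._constraint(draft):
            # Abort: the committed state is left untouched.
            raise IntegrityError("constraint violated; batch discarded")
        self._state = draft  # commit by swapping in the new state

    def query(self, key: str) -> int:
        """Read-only access for external components; sees only committed state."""
        return self._state.get(key, 0)


def non_negative(state: Dict[str, int]) -> bool:
    """Example integrity constraint: no balance may drop below zero."""
    return all(value >= 0 for value in state.values())


accounts = StatefulOperator(non_negative)
accounts.process([("alice", 100)])
accounts.process([("alice", -30), ("bob", 30)])   # transfer commits atomically
try:
    accounts.process([("bob", -50), ("carol", 50)])  # would overdraw bob
except IntegrityError:
    pass  # the whole batch is rejected, including carol's credit

print(accounts.query("alice"), accounts.query("bob"), accounts.query("carol"))
# prints: 70 30 0
```

Because the rejected batch is discarded as a whole, an external reader never observes carol's credit without bob's debit. A distributed system would additionally need to coordinate such commits across operators, which this sketch does not attempt.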

      Published In

      DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems
      June 2017
      393 pages
      ISBN:9781450350655
      DOI:10.1145/3093742

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      DEBS '17

      Acceptance Rates

      DEBS '17 Paper Acceptance Rate: 22 of 60 submissions, 37%
      Overall Acceptance Rate: 145 of 583 submissions, 25%

      Article Metrics

      • Downloads (last 12 months): 19
      • Downloads (last 6 weeks): 3
      Reflects downloads up to 22 Sep 2024

      Cited By

      • (2024) Safe Shared State in Dataflow Systems. Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, 30-41. DOI: 10.1145/3629104.3666029. Online publication date: 24-Jun-2024
      • (2024) MorphStream: Scalable Processing of Transactions over Streams. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 5485-5488. DOI: 10.1109/ICDE60146.2024.00434. Online publication date: 13-May-2024
      • (2024) Fast Parallel Recovery for Transactional Stream Processing on Multicores. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1478-1491. DOI: 10.1109/ICDE60146.2024.00122. Online publication date: 13-May-2024
      • (2024) A survey on transactional stream processing. The VLDB Journal: The International Journal on Very Large Data Bases 33:2, 451-479. DOI: 10.1007/s00778-023-00814-z. Online publication date: 1-Mar-2024
      • (2023) MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores. Proceedings of the ACM on Management of Data 1:1, 1-26. DOI: 10.1145/3588913. Online publication date: 30-May-2023
      • (2023) Challenges in Prototyping a Cloud-Native Billing Application for 5G with Stream Processing. Proceedings of the International Workshop on Big Data in Emergent Distributed Environments, 1-7. DOI: 10.1145/3579142.3594292. Online publication date: 18-Jun-2023
      • (2022) S-QUERY: Opening the Black Box of Internal Stream Processor State. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 1314-1327. DOI: 10.1109/ICDE53745.2022.00103. Online publication date: May-2022
      • (2022) Transactions across serverless functions leveraging stateful dataflows. Information Systems 108, 102015. DOI: 10.1016/j.is.2022.102015. Online publication date: Sep-2022
      • (2021) Distributed transactions on serverless stateful functions. Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems, 31-42. DOI: 10.1145/3465480.3466920. Online publication date: 28-Jun-2021
      • (2020) Towards Concurrent Stateful Stream Processing on Multicore Processors. 2020 IEEE 36th International Conference on Data Engineering (ICDE), 1537-1548. DOI: 10.1109/ICDE48307.2020.00136. Online publication date: Apr-2020
