research-article

S-Store: streaming meets transaction processing

Published: 01 September 2015

Abstract

Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation have existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system and provides both ACID and stream-oriented guarantees. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already provides and concentrate on the additional features that are needed to support streaming. Similar implementations could be built on other main-memory OLTP platforms. We show that we can actually achieve higher throughput for streaming workloads in S-Store than in an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Esper and Apache Storm, and show how S-Store can sometimes exceed their performance while at the same time providing stronger correctness guarantees.


Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment  Volume 8, Issue 13
Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
September 2015
144 pages
ISSN:2150-8097
  • Editors: Chen Li, Volker Markl

Publisher

VLDB Endowment

Publication History

Published: 01 September 2015
Published in PVLDB Volume 8, Issue 13

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 70
  • Downloads (last 6 weeks): 4
Reflects downloads up to 03 Oct 2024

Citations

Cited By

  • (2024) Texera: A System for Collaborative and Interactive Data Analytics Using Workflows. Proceedings of the VLDB Endowment, 17:11 (3580-3588). DOI: 10.14778/3681954.3682022. Online publication date: 1-Jul-2024
  • (2024) Challenges in Empirically Testing Memory Persistency Models. Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results (82-86). DOI: 10.1145/3639476.3639765. Online publication date: 14-Apr-2024
  • (2024) A survey on the evolution of stream processing systems. The VLDB Journal, 33:2 (507-541). DOI: 10.1007/s00778-023-00819-8. Online publication date: 1-Mar-2024
  • (2024) A survey on transactional stream processing. The VLDB Journal, 33:2 (451-479). DOI: 10.1007/s00778-023-00814-z. Online publication date: 1-Mar-2024
  • (2023) Transactional Panorama: A Conceptual Framework for User Perception in Analytical Visual Interfaces. Proceedings of the VLDB Endowment, 16:6 (1494-1506). DOI: 10.14778/3583140.3583162. Online publication date: 1-Feb-2023
  • (2023) Memento: A Framework for Detectable Recoverability in Persistent Memory. Proceedings of the ACM on Programming Languages, 7:PLDI (292-317). DOI: 10.1145/3591232. Online publication date: 6-Jun-2023
  • (2023) MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores. Proceedings of the ACM on Management of Data, 1:1 (1-26). DOI: 10.1145/3588913. Online publication date: 30-May-2023
  • (2022) Fries. Proceedings of the VLDB Endowment, 16:2 (256-268). DOI: 10.14778/3565816.3565827. Online publication date: 1-Oct-2022
  • (2022) Portals: An Extension of Dataflow Streaming for Stateful Serverless. Proceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (153-171). DOI: 10.1145/3563835.3567664. Online publication date: 29-Nov-2022
  • (2022) Stream processing with dependency-guided synchronization. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (1-16). DOI: 10.1145/3503221.3508413. Online publication date: 2-Apr-2022
