MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores

Published: 30 May 2023

Abstract

Transactional stream processing engines (TSPEs) differ significantly in their designs, but all rely on non-adaptive scheduling strategies for processing concurrent state transactions. Consequently, none exploits multicore parallelism to its full potential, because of complex workload dependencies. This paper introduces MorphStream, which takes a novel approach: it decomposes scheduling strategies into three dimensions and then strives to make the right decision along each dimension by analyzing the decision trade-offs under varying workload characteristics. Compared to the state of the art, MorphStream achieves up to 3.4 times higher throughput and 69.1% lower processing latency when handling real-world use cases with complex and dynamically changing workload dependencies.

    Published In

    Proceedings of the ACM on Management of Data  Volume 1, Issue 1
    PACMMOD
    May 2023
    2807 pages
    EISSN:2836-6573
    DOI:10.1145/3603164
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multicore
    2. stream processing
    3. transaction

    Qualifiers

    • Research-article

    Funding Sources

    • DFG Priority Program
    • German Federal Ministry of Education and Research (BMBF) under grants BIFOLD - Berlin Institute for the Foundations of Learning and Data
    • German Federal Ministry of Education and Research (BMBF) under grants BBDC - Berlin Big Data Center
    • National Research Foundation, Singapore and Infocomm Media Development Authority under its Future Communications Research & Development Programme
    • SUTD Start-up Research Grant
    • National Natural Science Foundation of China

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 177
    • Downloads (Last 6 weeks): 19
    Reflects downloads up to 18 Aug 2024

    Cited By

    • (2024) MorphStream: Scalable Processing of Transactions over Streams. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 5485-5488. DOI: 10.1109/ICDE60146.2024.00434. Online publication date: 13-May-2024
    • (2024) Fast Parallel Recovery for Transactional Stream Processing on Multicores. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1478-1491. DOI: 10.1109/ICDE60146.2024.00122. Online publication date: 13-May-2024
    • (2024) General-purpose data stream processing on heterogeneous architectures with WindFlow. Journal of Parallel and Distributed Computing 184:C. DOI: 10.1016/j.jpdc.2023.104782. Online publication date: 27-Feb-2024
    • (2023) A survey on transactional stream processing. The VLDB Journal (The International Journal on Very Large Data Bases) 33:2, 451-479. DOI: 10.1007/s00778-023-00814-z. Online publication date: 27-Sep-2023
