DOI: 10.1145/3318464.3380576
research-article

Parallel Index-based Stream Join on a Multicore CPU

Published: 31 May 2020

Abstract

Indexing sliding window content to enhance the performance of streaming queries can be greatly improved by utilizing the computational capabilities of a multicore processor. Conventional indexing data structures optimized for frequent search queries on a prestored dataset do not meet the demands of indexing highly dynamic data as in streaming environments. In this paper, we introduce an index data structure, called the partitioned in-memory merge tree, to address the challenges that arise when indexing highly dynamic data, which are common in streaming settings. Utilizing the specific pattern of streaming data and the distribution of queries, we propose a low-cost and effective concurrency control mechanism to meet the demands of high-rate update queries. To complement the index, we design an algorithm to realize a parallel index-based stream join that exploits the computational power of multicore processors. Our experiments using an octa-core processor show that our parallel stream join achieves up to 5.5 times higher throughput than a single-threaded approach.
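
To make the idea of an index-based sliding-window join concrete, here is a minimal single-threaded sketch. It is not the paper's partitioned in-memory merge tree or its concurrency control: the class `WindowIndex`, the function `stream_join`, the count-based window, the `merge_threshold` parameter, and the band-join predicate are all assumptions chosen for illustration. Each window keeps a small sorted component that absorbs inserts and a larger sorted component into which the small one is merged periodically; an arriving tuple first probes the opposite window's index and is then inserted into its own.

```python
import bisect
import heapq
from collections import deque


class WindowIndex:
    """Count-based sliding window with a two-component sorted index (illustrative only)."""

    def __init__(self, window_size, merge_threshold=4):
        self.window_size = window_size
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.arrivals = deque()  # keys in arrival order, oldest first
        self.main = []           # large sorted component
        self.delta = []          # small sorted component that absorbs inserts

    def insert(self, key):
        bisect.insort(self.delta, key)
        self.arrivals.append(key)
        # Evict the oldest key once the window is over capacity.
        if len(self.arrivals) > self.window_size:
            self._expire(self.arrivals.popleft())
        # Merge the small component into the large one when it grows too big.
        if len(self.delta) >= self.merge_threshold:
            self.main = list(heapq.merge(self.main, self.delta))
            self.delta = []

    def _expire(self, key):
        # Remove one occurrence of key from whichever component holds it.
        for comp in (self.main, self.delta):
            i = bisect.bisect_left(comp, key)
            if i < len(comp) and comp[i] == key:
                del comp[i]
                return

    def range_query(self, lo, hi):
        # Collect all keys in [lo, hi] from both sorted components.
        out = []
        for comp in (self.main, self.delta):
            out.extend(comp[bisect.bisect_left(comp, lo):bisect.bisect_right(comp, hi)])
        return out


def stream_join(events, window_size, band=0):
    """Join streams R and S on |r - s| <= band; events are (side, key) pairs."""
    idx = {"R": WindowIndex(window_size), "S": WindowIndex(window_size)}
    results = []
    for side, key in events:
        other = "S" if side == "R" else "R"
        # Probe the opposite window first, then insert into the own window.
        for match in idx[other].range_query(key - band, key + band):
            results.append((key, match) if side == "R" else (match, key))
        idx[side].insert(key)
    return results


events = [("R", 1), ("S", 1), ("S", 2), ("R", 2), ("R", 5), ("S", 5)]
print(stream_join(events, window_size=2))
```

A parallel version would additionally need to split the key range or the probe work across threads and protect the components during merges; the sketch leaves that out on purpose.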

Supplementary Material

MP4 File (3318464.3380576.mp4)
Presentation Video

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data indexing
  2. data stream processing
  3. multicore processors
  4. parallel computing

Qualifiers

  • Research-article

Funding Sources

  • Alexander von Humboldt-Stiftung

Conference

SIGMOD/PODS '20
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 56
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 02 Feb 2025

Citations

Cited By

  • (2024) Low-Latency Adaptive Distributed Stream Join System Based on a Flexible Join Model. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654953. Online publication date: 30-May-2024
  • (2024) SWIX: A Memory-efficient Sliding Window Learned Index. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639296. Online publication date: 26-Mar-2024
  • (2024) PECJ: Stream Window Join on Disorder Data Streams with Proactive Error Compensation. Proceedings of the ACM on Management of Data 2:1 (1-24). DOI: 10.1145/3639268. Online publication date: 26-Mar-2024
  • (2024) Stream-aware indexing for distributed inequality join processing. Information Systems 125 (102425). DOI: 10.1016/j.is.2024.102425. Online publication date: Nov-2024
  • (2024) A survey on transactional stream processing. The VLDB Journal 33:2 (451-479). DOI: 10.1007/s00778-023-00814-z. Online publication date: 1-Mar-2024
  • (2023) Nereus: A Distributed Stream Band Join System With Adaptive Range Partitioning. IEEE Transactions on Consumer Electronics 69:4 (949-961). DOI: 10.1109/TCE.2023.3249292. Online publication date: Nov-2023
  • (2023) SepJoin: A Distributed Stream Join System with Low Latency and High Throughput. 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS) (633-640). DOI: 10.1109/ICPADS56603.2022.00088. Online publication date: Jan-2023
  • (2023) Scalable Online Interval Join on Modern Multicore Processors in OpenMLDB. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (3031-3042). DOI: 10.1109/ICDE55515.2023.00232. Online publication date: Apr-2023
  • (2022) Revisiting the Design of Parallel Stream Joins on Trusted Execution Environments. Algorithms 15:6 (183). DOI: 10.3390/a15060183. Online publication date: 25-May-2022
  • (2022) An adaptive non-migrating load-balanced distributed stream window join system. The Journal of Supercomputing 79:8 (8236-8264). DOI: 10.1007/s11227-022-04991-6. Online publication date: 15-Dec-2022
