Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/1989323.1989389acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

How soccer players would do stream joins

Published: 12 June 2011 Publication History

Abstract

In spite of the omnipresence of parallel (multi-core) systems, the predominant strategy to evaluate window-based stream joins is still strictly sequential, mostly just straightforward along the definition of the operation semantics.
In this work we present handshake join, a way of describing and executing window-based stream joins that is highly amenable to parallelized execution. Handshake join naturally leverages available hardware parallelism, which we demonstrate with an implementation on a modern multi-core system and on top of field-programmable gate arrays (FPGAs), an emerging technology that has shown distinctive advantages for high-throughput data processing.
On the practical side, we provide a join implementation that substantially outperforms CellJoin (the fastest published result) and that will directly turn any degree of parallelism into higher throughput or larger supported window sizes. On the semantic side, our work gives a new intuition of window semantics, which we believe could inspire other stream processing algorithms or ongoing standardization efforts for stream query languages.

References

[1]
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Çetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Proc. of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2005.
[2]
A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. STREAM: The Stanford Data Stream Management System. http://infolab.stanford.edu/ usriv/papers/streambook.pdf.
[3]
A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proc. of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, USA, 2009.
[4]
I. Botan, R. Derakhshan, N. Dindar, L. Haas, R. J. Miller, and N. Tatbul. SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. Proc. of the VLDB Endowment, 3(1), 2010.
[5]
M. Butts. Synchronization Through Communication in a Massively Parallel Processor Array. IEEE Micro, 27(5):32--40, 2007.
[6]
P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor. IEEE Micro, 30(2):16--29, 2010.
[7]
J. Davis, C. Thacker, and C. Chang. BEE3: Revitalizing Computer Architecture Research. Technical Report MSR-TR-2009-45, Microsoft Research, 2009.
[8]
A. DeHon. Balancing Interconnect and Computation in a Reconfigurable Computing Array (or, why you don't really want 100% LUT utilization). In Proc. of the Int'l Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 1999.
[9]
P. Frey, R. Gonçalves, M. Kersten, and J. Teubner. A Spinning Join That Does Not Get Dizzy. In Proc. of the 30th Int'l Conference on Distributed Computing Systems (ICDCS), Genoa, Italy, 2010.
[10]
B. Gedik, P. S. Yu, and R. Bordawekar. CellJoin: A Parallel Stream Join Operator for the Cell Processor. The VLDB Journal, 18(2):501--519, 2009.
[11]
T. M. Ghanem, M. A. Hammad, M. F. Mokbel, W. G. Aref, and A. K. Elmagarmid. Incremental Evaluation of Sliding-Window Queries over Data Streams. IEEE Trans. on Knowledge and Data Engineering (TKDE), 19(1):57--72, 2007.
[12]
L. Golab, S. Garg, and T. Öszu. On Indexing Sliding Windows over Online Data Streams. In Proc. of the 9th Int'l Conference on Extending Database Technology (EDBT), Crete, Greece, 2004.
[13]
S. Helmer, T. Westmann, and G. Moerkotte. Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships. In Proc. of the 24th Int'l Conference on Very Large Databases (VLDB), New York, NY, USA, 1998.
[14]
J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, F. Pailet, S. Jain, T. Jacob, S. Yada, S. Marella, P. Salihundam, V. Erraguntla, M. Konow, M. Riepen, G. Droege, J. Lindemann, M. Gries, T. Apel, K. Henriss, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, R. v. d. Wijngaart, and T. Mattson. A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS. In 2010 IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, February 2010.
[15]
Z. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld. An Adaptive Query Execution System for Data Integration. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Philadelphia, PA, USA, 1999.
[16]
N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, and S. Zdonik. Towards a Streaming SQL Standard. Proc. of the VLDB Endowment, 1(2), 2008.
[17]
J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. In Proc. of the 19th Int'l Conference on Data Engineering (ICDE), Bangalore, India, 2003.
[18]
S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, and N. Thombre. Continuous Analytics Over Discontinuous Streams. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Indidanapolis, IN, USA, 2010.
[19]
H. T. Kung and P. L. Lohman. Systolic (VLSI) Arrays for Relational Database Operations. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Santa Monica, CA, USA, 1980.
[20]
I. Kuon and J. Rose. Measuring the gap between FPGAs and ASICs. In Proc. of the 14th Int'l Symposium on Field-Programmable Gate Arrays (FPGA), Monterey, CA, USA, 2006.
[21]
J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Baltimore, MD, USA, 2005.
[22]
O. Mencer, K. H. Tsoi, S. Craimer, T. Todman, W. Luk, M. Y. Wong, and P. H. W. Leong. CUBE: A 512-FPGA Cluster. In Proc. of the Southern Programmable Logic Conference (SPL), Sao Carlos, Brazil, 2009.
[23]
A. Mitra, M. R. Vieira, P. Bakalov, V. J. Tsotras, and W. A. Najjar. Boosting XML Filtering Through a Scalable FPGA-Based Architecture. In Proc. of the 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2009.
[24]
Netezza Corp. http://www.netezza.com/.
[25]
J.-B. Qian, H.-B. Xu, Y.-S. Dong, X.-J. Liu, and Y.-L. Wang. FPGA Acceleration Window Joins Over Multiple Data Streams. Journal of Circuits, Systems, and Computers, 14(4):813--830, 2005.
[26]
K. A. Ross. Selection Conditions in Main Memory. ACM Trans. on Database Systems (TODS), 29:132--161, 2004.
[27]
N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. Load Shedding in a Data Stream Manager. In Proc. of the 29th Int'l Conference on Very Large Databases (VLDB), Berlin, Germany, 2003.
[28]
W. B. Teeuw and H. M. Blanken. Control Versus Data Flow in Parallel Database Machines. IEEE Trans. on Parallel and Distributed Systems (TPDS), 4(11):1265--1279, 1993.
[29]
J. Teubner, R. Mueller, and G. Alonso. FPGA Acceleration for the Frequent Item Problem. In Proc. of the 26th Int'l Conference on Data Engineering (ICDE), Long Beach, CA, USA, 2010.
[30]
A. Wilschut and P. Apers. Dataflow Query Execution in a Parallel Main-Memory Environment. In Proc. of the 1st Int'l Conference on Parallel and Distributed Information Systems (PDIS), Miami Beach, FL, USA, 1991.

Cited By

View all
  • (2024)Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and QualityProceedings of the ACM on Management of Data10.1145/36771342:4(1-31)Online publication date: 30-Sep-2024
  • (2024)Low-Latency Adaptive Distributed Stream Join System Based on a Flexible Join ModelProceedings of the ACM on Management of Data10.1145/36549532:3(1-27)Online publication date: 30-May-2024
  • (2024)PECJ: Stream Window Join on Disorder Data Streams with Proactive Error CompensationProceedings of the ACM on Management of Data10.1145/36392682:1(1-24)Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 June 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data flow
  2. parallelism
  3. stream joins

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)4
Reflects downloads up to 08 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and QualityProceedings of the ACM on Management of Data10.1145/36771342:4(1-31)Online publication date: 30-Sep-2024
  • (2024)Low-Latency Adaptive Distributed Stream Join System Based on a Flexible Join ModelProceedings of the ACM on Management of Data10.1145/36549532:3(1-27)Online publication date: 30-May-2024
  • (2024)PECJ: Stream Window Join on Disorder Data Streams with Proactive Error CompensationProceedings of the ACM on Management of Data10.1145/36392682:1(1-24)Online publication date: 26-Mar-2024
  • (2024)DIBA: A Re-Configurable Stream ProcessorIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338119236:9(4550-4566)Online publication date: Sep-2024
  • (2024)ABACUS: ASIP-Based Avro Schema-Customizable Parser Acceleration on FPGAs2024 27th International Symposium on Design & Diagnostics of Electronic Circuits & Systems (DDECS)10.1109/DDECS60919.2024.10508904(79-85)Online publication date: 3-Apr-2024
  • (2024)Stream-aware indexing for distributed inequality join processingInformation Systems10.1016/j.is.2024.102425125(102425)Online publication date: Nov-2024
  • (2023)SepJoin: A Distributed Stream Join System with Low Latency and High Throughput2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS)10.1109/ICPADS56603.2022.00088(633-640)Online publication date: Jan-2023
  • (2023)Scalable Online Interval Join on Modern Multicore Processors in OpenMLDB2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00232(3031-3042)Online publication date: Apr-2023
  • (2023)SPEAR-JSON: Selective Parsing of JSON to Enable Accelerated Stream Processing on FPGAs2023 33rd International Conference on Field-Programmable Logic and Applications (FPL)10.1109/FPL60245.2023.00034(189-196)Online publication date: 4-Sep-2023
  • (2022)Revisiting the Design of Parallel Stream Joins on Trusted Execution EnvironmentsAlgorithms10.3390/a1506018315:6(183)Online publication date: 25-May-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media