CellJoin: a parallel stream join operator for the cell processor

Published: 01 April 2009

Abstract

Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently developed Cell processor is a good example of a heterogeneous multi-core architecture and provides a powerful platform for executing data stream operators with high performance. On the downside, exploiting the full potential of a multi-core processor like Cell is often challenging, mainly due to the heterogeneous nature of the processing elements, the software-managed local memory on the co-processor side, and the unconventional programming model in general. In this paper, we study the problem of scalable execution of windowed stream join operators on multi-core processors, and specifically on the Cell processor. By examining various aspects of the join execution flow, we determine the right set of techniques to apply in order to minimize the sequential segments and maximize parallelism. Concretely, we show that basic windows coupled with low-overhead pointer-shifting techniques can be used to achieve efficient join window partitioning, column-oriented join window organization can be used to minimize scattered data transfers, delay-optimized double buffering can be used for effective pipelining, rate-aware batching can be used to balance join throughput and tuple delay, and finally single-instruction multiple-data (SIMD) optimized operator code can be used to exploit data parallelism.
Our experimental results show that, following the design guidelines and implementation techniques outlined in this paper, windowed stream joins can achieve high scalability (linear in the number of co-processors) by making efficient use of the extensive hardware parallelism provided by the Cell processor (reaching data processing rates of approximately 13 GB/s) and significantly surpass the performance obtained from conventional high-end processors (supporting a combined input stream rate of 2,000 tuples/s using 15 min windows and without dropping any tuples, resulting in an approximately 8.3 times higher output rate compared to an SSE implementation on dual 3.2 GHz Intel Xeon).
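
To make the notion of a windowed stream join over basic windows more concrete, the following is a minimal single-threaded sketch in Python. It is not the CellJoin implementation and uses no Cell-specific features (no DMA transfers, double buffering, batching, or SIMD). It only illustrates two of the ideas named in the abstract: each join window is split into fixed-size basic windows so that expiration drops a whole basic window at once, and tuples inside a basic window are stored column-wise. All names (`BasicWindowSide`, `window_join`, the `(timestamp, stream_id, key)` event layout) are hypothetical and chosen for illustration; timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class BasicWindowSide:
    """One side of a sliding-window join, split into fixed-size basic windows.

    Each basic window stores its tuples column-wise (a list of timestamps and
    a list of join keys), so a probe scans contiguous key data.
    """

    def __init__(self, window, basic_window):
        self.window = window              # join window length (time units)
        self.basic_window = basic_window  # basic window length (time units)
        self.parts = deque()              # entries: (start_time, timestamps, keys)

    def insert(self, ts, key):
        # Tuples arrive in timestamp order, so they always go to the newest part.
        start = ts - ts % self.basic_window
        if not self.parts or self.parts[-1][0] != start:
            self.parts.append((start, [], []))
        _, timestamps, keys = self.parts[-1]
        timestamps.append(ts)
        keys.append(key)

    def expire(self, now):
        # Drop whole basic windows once every tuple they can hold is too old.
        while self.parts and self.parts[0][0] + self.basic_window <= now - self.window:
            self.parts.popleft()

    def probe(self, now, matches):
        # Nested-loop scan; the oldest surviving part may still hold a few
        # expired tuples, so each timestamp is checked individually.
        found = []
        for _, timestamps, keys in self.parts:
            for ts, key in zip(timestamps, keys):
                if ts > now - self.window and matches(key):
                    found.append((ts, key))
        return found


def window_join(events, window, basic_window, predicate):
    """Join two streams given as (timestamp, stream_id, key) events.

    stream_id is 0 (left) or 1 (right); predicate(left_key, right_key)
    decides whether two tuples join. Returns a list of (left, right) pairs.
    """
    sides = [BasicWindowSide(window, basic_window),
             BasicWindowSide(window, basic_window)]
    output = []
    for ts, sid, key in events:
        other = sides[1 - sid]
        other.expire(ts)
        if sid == 0:
            hits = other.probe(ts, lambda k: predicate(key, k))
            output.extend(((ts, key), hit) for hit in hits)
        else:
            hits = other.probe(ts, lambda k: predicate(k, key))
            output.extend((hit, (ts, key)) for hit in hits)
        sides[sid].insert(ts, key)
    return output
```

A possible usage, with a 10-unit join window, 5-unit basic windows, and an equality predicate:

```python
events = [(0, 0, 5), (1, 1, 5), (2, 1, 7), (12, 0, 7), (20, 1, 7)]
pairs = window_join(events, window=10, basic_window=5,
                    predicate=lambda left, right: left == right)
```

In this sketch the right-stream tuple at time 2 does not join with the left-stream tuple at time 12, because it falls exactly on the window boundary and is treated as expired; that boundary convention is an assumption of the example, not something stated in the abstract.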

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 18, Issue 2
April 2009
212 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

  • Article
