DOI: 10.1145/2882903.2882906
research-article

SABER: Window-Based Hybrid Stream Processing for Heterogeneous Architectures

Published: 14 June 2016

Editorial Notes

Computationally Replicable. The experimental results of this paper were replicated by a SIGMOD Review Committee and were found to support the central results reported in the paper.

Abstract

Modern servers have become heterogeneous, often combining multi-core CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous architecture, it must execute streaming SQL queries with sufficient data-parallelism to fully utilise all available heterogeneous processors, and decide how to use each in the most effective way. It must do this while respecting the semantics of streaming SQL queries, in particular with regard to window handling.
We describe Saber, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. Saber executes window-based streaming SQL queries in a data-parallel fashion using all available CPU and GPGPU cores. Instead of statically assigning query operators to heterogeneous processors, Saber employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. To hide data movement costs, Saber pipelines the transfer of stream data between CPU and GPGPU memory. Our experimental comparison against state-of-the-art engines shows that Saber increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with both small and large window sizes.
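
The abstract names the lookahead scheduling strategy but does not spell out the algorithm, so what follows is only an illustrative sketch of the general idea, not Saber's implementation. It assumes a shared queue of query tasks and a table of observed throughput per query and processor; the function name `pick_task`, the table layout, and all numbers are hypothetical. An idle processor scans the queue, takes the first task of a query it runs fastest, and takes a task it is slower at only when the estimated backlog on the faster processor is at least as long as running the task itself.

```python
def pick_task(queue, worker, throughput):
    """Return the queue index of the task `worker` should run next, or None.

    queue      -- list of query ids, oldest task first (assumed layout)
    worker     -- processor name, e.g. "cpu" or "gpu"
    throughput -- throughput[query][processor] in tasks per second,
                  assumed to be refreshed from runtime measurements
    """
    backlog = 0.0  # estimated seconds of work queued for other processors
    for index, query in enumerate(queue):
        rates = throughput[query]
        preferred = max(rates, key=rates.get)
        if preferred == worker:
            # This worker is the fastest processor for the query: take it.
            return index
        if 1.0 / rates[worker] <= backlog:
            # The preferred processor is busy for at least as long as this
            # worker would need, so running the task here is no slower.
            return index
        # Skip the task and charge its cost to the preferred processor.
        backlog += 1.0 / rates[preferred]
    return None  # nothing worth taking; the worker stays idle for now


# Hypothetical throughput table: q1 favours the GPGPU, q2 favours the CPU.
rates = {
    "q1": {"cpu": 8.0, "gpu": 32.0},
    "q2": {"cpu": 16.0, "gpu": 4.0},
}

gpu_choice = pick_task(["q1", "q1", "q2"], "gpu", rates)
cpu_choice = pick_task(["q1", "q1", "q2"], "cpu", rates)
cpu_overflow = pick_task(["q1"] * 6, "cpu", rates)
```

With these made-up rates the GPGPU worker takes the task at the head of the queue, while the CPU worker looks past the two `q1` tasks to the `q2` task it runs faster. When the queue holds only `q1` tasks, the CPU worker helps out once four of them are already waiting for the GPGPU. A real engine would also update the throughput table as tasks complete, which is where the adaptive part of the strategy would come in.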

Supplementary Material

ReadMe (readme.md)
Rights information
Reproducibility (saber-sigmod16-reproducibility.zip)
Scripts, Datasets

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. data parallelism
  2. gpgpus
  3. heterogeneous hardware
  4. hybrid scheduling
  5. stream processing
  6. windowing support

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
