DOI: 10.1145/2882903.2915224
research-article

GPL: A GPU-based Pipelined Query Processing Engine

Published: 26 June 2016

Abstract

Graphics Processing Units (GPUs) have evolved into powerful query co-processors for main-memory On-Line Analytical Processing (OLAP) databases. However, existing GPU-based query processors adopt a kernel-based execution approach, which optimizes individual kernels for resource utilization and executes the GPU kernels in the query plan one by one. Such an approach cannot use all GPU resources efficiently, because individual kernels underutilize resources and because of memory ping-pong across kernel executions. In this paper, we propose GPL, a novel pipelined query execution engine that improves the resource utilization of query co-processing on the GPU. Unlike existing kernel-based execution, GPL takes advantage of hardware features of new-generation GPUs, including concurrent kernel execution and efficient data communication channels between kernels. We further develop an analytical model to guide the generation of the optimal pipelined query plan, so that the tile size of the pipelined query execution can be adapted in a cost-based manner. We evaluate GPL with TPC-H queries on both AMD and NVIDIA GPUs. The experimental results show that 1) the analytical model can guide the choice of suitable parameter values in the pipelined query execution plan, and 2) GPL significantly outperforms state-of-the-art kernel-based query processing approaches, with improvements of up to 48%.
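The contrast between kernel-based and tiled, pipelined execution can be illustrated with a small CPU-only sketch. This is not GPL's implementation: it uses no GPU, no concurrent kernels and no inter-kernel channels, and the row layout, the `select` and `project` operators, and all function names are assumptions made for illustration. It only shows how processing one tile at a time shrinks the intermediate results that must be materialized between operators.

```python
from typing import Iterator, List, Tuple

# Hypothetical row layout: (quantity, price).
Row = Tuple[int, float]


def select(rows: List[Row], max_qty: int) -> List[Row]:
    """Keep rows whose quantity is below max_qty."""
    return [r for r in rows if r[0] < max_qty]


def project(rows: List[Row]) -> List[float]:
    """Compute quantity * price for each row."""
    return [q * p for q, p in rows]


def kernel_based(rows: List[Row], max_qty: int) -> Tuple[float, int]:
    """Run operators one by one; each materializes its full output.

    Returns the aggregate and the largest intermediate result size.
    """
    selected = select(rows, max_qty)
    revenue = project(selected)
    peak = max(len(selected), len(revenue))
    return sum(revenue), peak


def tiles(rows: List[Row], tile_size: int) -> Iterator[List[Row]]:
    """Split the input into consecutive tiles of at most tile_size rows."""
    for start in range(0, len(rows), tile_size):
        yield rows[start:start + tile_size]


def pipelined(rows: List[Row], max_qty: int, tile_size: int) -> Tuple[float, int]:
    """Push each tile through all operators before fetching the next tile.

    Returns the aggregate and the largest per-tile intermediate size.
    """
    total = 0.0
    peak = 0
    for tile in tiles(rows, tile_size):
        selected = select(tile, max_qty)
        revenue = project(selected)
        peak = max(peak, len(selected), len(revenue))
        total += sum(revenue)
    return total, peak


if __name__ == "__main__":
    data = [(i % 10, 2.0) for i in range(1000)]
    print(kernel_based(data, 5))    # same aggregate, large intermediates
    print(pipelined(data, 5, 100))  # same aggregate, tile-sized intermediates
```

Both functions return the same aggregate; only the peak intermediate size differs. In this toy, smaller tiles always reduce that peak. On a real GPU the tile size would also affect launch overhead and utilization, which is presumably why the abstract describes choosing it with a cost model.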

Information & Contributors

Information

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. KBE
  2. channel
  3. pipelined execution
  4. tiling

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) GPU Database Systems Characterization and Optimization. Proceedings of the VLDB Endowment 17:3 (441-454). DOI: 10.14778/3632093.3632107. Online publication date: 20-Jan-2024
  • (2024) Heterogeneous Intra-Pipeline Device-Parallel Aggregations. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-10). DOI: 10.1145/3662010.3663441. Online publication date: 10-Jun-2024
  • (2024) CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639076. Online publication date: 20-May-2024
  • (2024) UltraPrecise: A GPU-Based Framework for Arbitrary-Precision Arithmetic in Database Systems. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (3837-3850). DOI: 10.1109/ICDE60146.2024.00294. Online publication date: 13-May-2024
  • (2023) Testing Database Engines via Query Plan Guidance. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (2060-2071). DOI: 10.1109/ICSE48619.2023.00174. Online publication date: May-2023
  • (2023) SQL2FPGA: Automatic Acceleration of SQL Query Processing on Modern CPU-FPGA Platforms. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (184-194). DOI: 10.1109/FCCM57271.2023.00028. Online publication date: May-2023
  • (2022) Query processing on tensor computation runtimes. Proceedings of the VLDB Endowment 15:11 (2811-2825). DOI: 10.14778/3551793.3551833. Online publication date: 29-Sep-2022
  • (2022) A Comprehensive Empirical Study of Query Performance Across GPU DBMSes. ACM SIGMETRICS Performance Evaluation Review 50:1 (51-52). DOI: 10.1145/3547353.3522644. Online publication date: 7-Jul-2022
  • (2022) Dynamic memory management in massively parallel systems. Proceedings of the 36th ACM International Conference on Supercomputing (1-13). DOI: 10.1145/3524059.3532387. Online publication date: 28-Jun-2022
  • (2022) Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data (1017-1032). DOI: 10.1145/3514221.3517911. Online publication date: 10-Jun-2022
