DOI: 10.1145/2882903.2882936
research-article

Robust Query Processing in Co-Processor-accelerated Databases

Published: 26 June 2016

Abstract

Technology limitations are making the use of heterogeneous computing devices much more than an academic curiosity. In fact, the use of such devices is widely acknowledged to be the only promising way to achieve the application speedups that users urgently need and expect. However, building a robust and efficient query engine for heterogeneous co-processor environments is still a significant challenge.
In this paper, we identify two effects that limit performance when co-processor resources become scarce. Cache thrashing occurs when the working set of queries does not fit into the co-processor's data cache, resulting in performance degradation of up to a factor of 24. Heap contention occurs when multiple operators run in parallel on a co-processor and their accumulated memory footprint exceeds the main memory capacity of the co-processor, slowing down query execution by up to a factor of six.
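The cache-thrashing effect can be illustrated with a small simulation. This is a minimal sketch, not CoGaDB code: it assumes a least-recently-used (LRU) replacement policy and a cache capacity counted in whole columns, neither of which is stated in the abstract. The function `lru_hit_rate` and both workloads are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, accesses):
    """Replay column accesses against an LRU cache and return the hit rate.

    Assumption: every miss stands for a column that would have to be
    copied to the co-processor again.
    """
    cache = OrderedDict()
    hits = 0
    for col in accesses:
        if col in cache:
            hits += 1
            cache.move_to_end(col)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[col] = True
    return hits / len(accesses)


# Hypothetical workloads: queries scan the same columns in the same order,
# ten times in a row.
fits = [c for _ in range(10) for c in range(4)]     # working set: 4 columns
too_big = [c for _ in range(10) for c in range(5)]  # working set: 5 columns

rate_fits = lru_hit_rate(4, fits)
rate_thrash = lru_hit_rate(4, too_big)
print(rate_fits, rate_thrash)
```

Under these assumptions, a working set of four columns in a four-column cache hits on every access after the first pass, while adding a fifth column makes every access miss, because LRU evicts each column just before it is needed again.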
We propose solutions for both effects. Data-driven operator placement avoids data movements when they might be harmful; query chopping limits co-processor memory usage and thus avoids contention. The combined approach, data-driven query chopping, achieves robust and scalable performance on co-processors. We validate our proposal with our open-source GPU-accelerated database engine CoGaDB and the popular star schema and TPC-H benchmarks.
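The two proposed techniques can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes that data-driven placement means "run on the co-processor only if all input columns are already cached there", and that query chopping bounds memory usage by bounding how many operators run on the co-processor at once. The abstract spells out neither mechanism, and the names `place_operator`, `ChoppedExecutor`, and the column names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def place_operator(input_columns, cached_on_coprocessor):
    """Data-driven placement (sketch): pick the co-processor only if every
    input column is already resident there, so no transfer is triggered."""
    if all(col in cached_on_coprocessor for col in input_columns):
        return "coprocessor"
    return "cpu"


class ChoppedExecutor:
    """Query chopping (sketch): operators are submitted one by one, and a
    fixed-size worker pool bounds how many of them run at the same time."""

    def __init__(self, max_concurrent_ops):
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent_ops)
        self._lock = threading.Lock()
        self._running = 0
        self.peak_running = 0  # highest concurrency observed so far

    def _run(self, op):
        with self._lock:
            self._running += 1
            self.peak_running = max(self.peak_running, self._running)
        try:
            return op()
        finally:
            with self._lock:
                self._running -= 1

    def submit(self, op):
        return self._pool.submit(self._run, op)

    def shutdown(self):
        self._pool.shutdown(wait=True)


# Hypothetical inputs: two columns are cached on the co-processor.
cached = {"lo_revenue", "lo_discount"}
on_device = place_operator(["lo_revenue", "lo_discount"], cached)
on_host = place_operator(["lo_revenue", "lo_quantity"], cached)

# Eight stand-in operators, at most two running concurrently.
executor = ChoppedExecutor(max_concurrent_ops=2)
futures = [executor.submit(lambda i=i: i * i) for i in range(8)]
results = [f.result() for f in futures]
executor.shutdown()
print(on_device, on_host, results, executor.peak_running)
```

In this sketch the concurrency bound is a fixed constant. A real engine would more plausibly derive it from the co-processor's memory capacity and the operators' estimated footprints.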

Information & Contributors

Information

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. co-processing
  2. co-processor
  3. databases
  4. query optimization
  5. query processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions (20%)

Cited By

  • (2024) Workload Placement on Heterogeneous CPU-GPU Systems. Proceedings of the VLDB Endowment 17:12 (4241-4244). DOI: 10.14778/3685800.3685845. Online publication date: 8-Nov-2024
  • (2024) Heterogeneous Intra-Pipeline Device-Parallel Aggregations. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-10). DOI: 10.1145/3662010.3663441. Online publication date: 10-Jun-2024
  • (2024) SIMDified Data Processing - Foundations, Abstraction, and Advanced Techniques. Companion of the 2024 International Conference on Management of Data (613-621). DOI: 10.1145/3626246.3654694. Online publication date: 9-Jun-2024
  • (2024) On-The-Fly Data Distribution to Accelerate Query Processing in Heterogeneous Memory Systems. Advances in Databases and Information Systems (170-183). DOI: 10.1007/978-3-031-70626-4_12. Online publication date: 1-Sep-2024
  • (2023) Distributed GPU Joins on Fast RDMA-capable Networks. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588709. Online publication date: 30-May-2023
  • (2023) Novel insights on atomic synchronization for sort-based group-by on GPUs. Distributed and Parallel Databases 41:3 (387-409). DOI: 10.1007/s10619-023-07424-2. Online publication date: 24-Apr-2023
  • (2022) Orchestrating data placement and query execution in heterogeneous CPU-GPU DBMS. Proceedings of the VLDB Endowment 15:11 (2491-2503). DOI: 10.14778/3551793.3551809. Online publication date: 29-Sep-2022
  • (2022) Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. Proceedings of the 2022 International Conference on Management of Data (1017-1032). DOI: 10.1145/3514221.3517911. Online publication date: 10-Jun-2022
  • (2022) Databases on Modern Hardware. Online publication date: 25-Feb-2022
  • (2021) Speculative Dynamic Reconfiguration and Table Prefetching Using Query Look-Ahead in the ReProVide Near-Data-Processing System. Datenbank-Spektrum 21:1 (55-64). DOI: 10.1007/s13222-020-00363-7. Online publication date: 4-Jan-2021
