DOI: 10.1145/3149457.3149480
research-article

Performing External Join Operator on PostgreSQL with Data Transfer Approach

Published: 28 January 2018

Abstract

With the development of sensing devices, the volume of data that people manage has been increasing rapidly. Relational database management systems (RDBMSs) play a key role in managing such data. An RDBMS models real-world data as n-ary relational tables. The join operator is one of the most important relational operators. How can an RDBMS provide an efficient join operator? Join acceleration has been studied widely and in depth for a decade, and many techniques have already been proposed. The problem we face is how to actually use these techniques in real RDBMSs. We propose implementing an efficient join technique with a data transfer approach. The approach adds a hook point inside the RDBMS internals, pulls data streams from the operator pipeline in the RDBMS, applies our original join operator to the data, and finally returns the result to the operator pipeline. In our experiment, the proposed method achieved a 1.42x speedup over PostgreSQL. Our code is available on GitHub.
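The data transfer flow described in the abstract (pull tuples out of the operator pipeline, join them externally, hand the result back) can be illustrated with a minimal Python sketch. This is not the authors' PostgreSQL extension: `scan`, `external_hash_join`, and `join_hook` are hypothetical names, Python generators stand in for the RDBMS operator pipeline, and a plain single-threaded in-memory hash join stands in for the paper's join operator.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # one tuple of column values


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical upstream operator: yields rows one at a time."""
    for row in rows:
        yield row


def external_hash_join(build: List[Row], probe: List[Row],
                       build_key: int, probe_key: int) -> List[Row]:
    """Stand-in for the external join: a simple equi hash join.

    Assumption: both inputs fit in memory and are joined single-threaded.
    """
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in build:                      # build phase
        table[row[build_key]].append(row)
    joined: List[Row] = []
    for row in probe:                      # probe phase
        for match in table.get(row[probe_key], ()):
            joined.append(match + row)
    return joined


def join_hook(outer: Iterator[Row], inner: Iterator[Row],
              outer_key: int, inner_key: int,
              join_impl: Callable[..., List[Row]] = external_hash_join
              ) -> Iterator[Row]:
    """Hypothetical hook point sitting where the built-in join would be.

    1. Pull both input streams out of the pipeline.
    2. Hand them to the external join implementation.
    3. Stream the joined rows back to the downstream operator.
    """
    build = list(inner)
    probe = list(outer)
    for row in join_impl(build, probe, inner_key, outer_key):
        yield row


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob"), (3, "cy")]
    orders = [(1, "pen"), (2, "ink"), (1, "pad")]
    for row in join_hook(scan(orders), scan(users), 0, 0):
        print(row)
```

Downstream operators only see an iterator, so in this sketch the join implementation behind `join_hook` can be swapped (for example, for a parallel hash join) without changing the rest of the pipeline.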


Published In

HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
January 2018
322 pages
ISBN: 9781450353724
DOI: 10.1145/3149457
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Parallel Hash Join
  2. PostgreSQL
  3. Relational Database

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPC Asia 2018

Acceptance Rates

HPCAsia '18 Paper Acceptance Rate 30 of 67 submissions, 45%;
Overall Acceptance Rate 69 of 143 submissions, 48%
