DOI: 10.1145/2523616.2523626
research-article

Memory footprint matters: efficient equi-join algorithms for main memory data processing

Published: 01 October 2013
    Abstract

    High-performance analytical data processing systems often run on servers with large amounts of main memory. A common operation in such environments is combining data from two or more sources using some "join" algorithm. The focus of this paper is on studying hash-based and sort-based equi-join algorithms when the data sets being joined fully reside in main memory. We only consider a single node setting, which is an important building block for larger high-performance distributed data processing systems. A critical contribution of this work is in pointing out that in addition to query response time, one must also consider the memory footprint of each join algorithm, as it impacts the number of concurrent queries that can be serviced. Memory footprint becomes an important deployment consideration when running analytical data processing services on hardware that is shared by other concurrent services. We also consider the impact of particular physical properties of the input and the output of each join algorithm. This information is essential for optimizing complex query pipelines with multiple joins. Our key contribution is in characterizing the properties of hash-based and sort-based equi-join algorithms, thereby allowing system implementers and query optimizers to make a more informed choice about which join algorithm to use.
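The two algorithm families named in the abstract can be illustrated with a minimal single-threaded sketch. This is not the paper's implementation: the function names `hash_join` and `sort_merge_join`, and the representation of tuples as `(key, payload)` pairs, are assumptions made for illustration only. The sketch shows where the extra memory goes in each case (a hash table over one input versus sorted copies of both inputs) and that the sort-based variant emits its output ordered by join key, which is one example of a physical property of the output.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join on the key field using a hash table built over `build`.

    Extra memory: one hash table holding every tuple of `build`.
    Output order follows the order of `probe`, not the join key.
    """
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)

    out = []
    for key, payload in probe:
        # .get() avoids inserting empty entries for keys without a match.
        for match in table.get(key, ()):
            out.append((key, match, payload))
    return out


def sort_merge_join(left, right):
    """Equi-join on the key field by sorting both inputs and merging.

    Extra memory here: sorted copies of both inputs (an in-place sort
    would avoid the copies). Output is ordered by join key.
    """
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit the
            # cross product of the two runs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


# Hypothetical sample inputs for illustration.
R = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
S = [(2, "x"), (3, "y"), (4, "z"), (2, "w")]

hj = hash_join(R, S)
smj = sort_merge_join(R, S)
```

Both functions return the same set of joined tuples. Only `sort_merge_join` guarantees key order in its result, which a downstream operator that needs sorted input could exploit.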

    Published In

    SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing
    October 2013
    427 pages
    ISBN:9781450324281
    DOI:10.1145/2523616

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    SOCC '13: ACM Symposium on Cloud Computing
    October 1 - 3, 2013
    Santa Clara, California

    Acceptance Rates

    SOCC '13 Paper Acceptance Rate 23 of 114 submissions, 20%;
    Overall Acceptance Rate 169 of 722 submissions, 23%

    Cited By

    • (2020) ReSQM: Accelerating Database Operations Using ReRAM-Based Content Addressable Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39:11 (4030-4041). DOI: 10.1109/TCAD.2020.3012860. Online publication date: Nov-2020
    • (2016) Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform. 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (212-219). DOI: 10.1109/FCCM.2016.62. Online publication date: May-2016
    • (2016) FB+-tree for Big Data Management. Big Data Research, 4:C (25-36). DOI: 10.1016/j.bdr.2015.11.003. Online publication date: 1-Jun-2016
    • (2015) Forecasting the cost of processing multi-join queries via hashing for main-memory databases. Proceedings of the Sixth ACM Symposium on Cloud Computing (153-166). DOI: 10.1145/2806777.2806944. Online publication date: 27-Aug-2015
    • (2015) ByteSlice. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (31-46). DOI: 10.1145/2723372.2747642. Online publication date: 27-May-2015
    • (2015) In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge and Data Engineering, 27:7 (1920-1948). DOI: 10.1109/TKDE.2015.2427795. Online publication date: 1-Jul-2015
    • (2015) Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. Online publication date: 27-May-2015
    • (2014) Parallel data analysis directly on scientific file formats. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (385-396). DOI: 10.1145/2588555.2612185. Online publication date: 18-Jun-2014
