
Cache-oblivious hashing

Published: 06 June 2010

Abstract

The hash table, especially its external-memory version, is one of the most important index structures in large databases. Assuming a truly random hash function, it is known that in a standard external hash table with block size $b$, searching for a particular key takes only an expected average of $t_q = 1 + 1/2^{\Omega(b)}$ disk accesses for any load factor $\alpha$ bounded away from $1$. However, such near-perfect performance is achieved only when $b$ is known and the hash table is tuned specifically for that block size. In this paper we study whether it is possible to build a cache-oblivious hash table that works well with any blocking. Such a hash table automatically performs well across all levels of the memory hierarchy and needs no hardware-specific tuning, an important feature in autonomous databases.
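To give a feel for why a block-size-aware table needs so few extra accesses, here is a minimal simulation sketch. It is not taken from the paper. It assumes a hypothetical table in which each bucket is one block holding up to b keys, uses Python's `random.Random` as a stand-in for a truly random hash function, and measures only the fraction of keys that do not fit in their home bucket (each such key would cost at least one extra access). The function name `overflow_fraction` and all parameter values are illustrative assumptions.

```python
import random


def overflow_fraction(num_buckets, b, alpha, seed=0):
    """Fraction of keys that do not fit in their home bucket.

    Hypothetical model: num_buckets buckets, each one block of capacity b,
    filled to load factor alpha with uniformly random bucket choices.
    """
    rng = random.Random(seed)  # stand-in for a truly random hash function
    n = int(alpha * b * num_buckets)
    load = [0] * num_buckets
    for _ in range(n):
        load[rng.randrange(num_buckets)] += 1
    # Keys beyond the capacity of a bucket overflow and need extra accesses.
    overflow = sum(max(0, x - b) for x in load)
    return overflow / n


if __name__ == "__main__":
    for b in (2, 4, 8, 16, 32):
        print(b, overflow_fraction(2000, b, alpha=0.5))
```

In runs of this sketch the overflow fraction drops quickly as b grows, which is the qualitative behavior the bound describes. How overflowing keys are actually stored is not modeled here.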
We first show that linear probing, a classical collision resolution strategy for hash tables, can easily be made cache-oblivious, but that it achieves only $t_q = 1 + O(\alpha/b)$. We then demonstrate that it is possible to obtain $t_q = 1 + 1/2^{\Omega(b)}$, thus matching the cache-aware bound, if the following two conditions hold: (a) $b$ is a power of 2; and (b) every block starts at a memory address divisible by $b$. Both conditions hold on a real machine, although they are not stated in the cache-oblivious model. Interestingly, we also show that neither condition is dispensable: if either of them is removed, the best obtainable bound is $t_q = 1 + O(\alpha/b)$, which is exactly what linear probing achieves.
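The following sketch illustrates what "cache-oblivious" means for linear probing: the table is built once, with no knowledge of any block size, and the same layout is then measured against several block sizes. It is a simulation under stated assumptions, not the paper's construction or analysis. The helper names `build_table` and `avg_blocks_touched` are hypothetical, `random.Random` stands in for a truly random hash function, and blocks are assumed to be aligned at multiples of b with b a power of 2.

```python
import random


def build_table(size, alpha, rng):
    """Insert int(alpha * size) keys using linear probing.

    Returns the table and a dict mapping each key to its home slot.
    The layout never looks at a block size.
    """
    table = [None] * size
    homes = {}
    for key in range(int(alpha * size)):
        home = rng.randrange(size)  # stand-in for a truly random hash value
        pos = home
        while table[pos] is not None:
            pos = (pos + 1) % size  # probe the next slot, wrapping around
        table[pos] = key
        homes[key] = home
    return table, homes


def avg_blocks_touched(table, homes, b):
    """Average number of distinct aligned blocks of size b read per
    successful search, over all stored keys."""
    size = len(table)
    total = 0
    for key, home in homes.items():
        pos = home
        blocks = {pos // b}
        while table[pos] != key:
            pos = (pos + 1) % size
            blocks.add(pos // b)
        total += len(blocks)
    return total / len(homes)


if __name__ == "__main__":
    table, homes = build_table(1 << 14, 0.5, random.Random(1))
    for b in (4, 16, 64, 256):
        print(b, avg_blocks_touched(table, homes, b))
```

Every search reads at least one block. The amount above 1 reflects how often a probe sequence crosses a block boundary for the chosen b.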

Cited By

  • (2018) The New Hardware Development Trend and the Challenges in Data Management and Analysis. Data Science and Engineering 3:3 (263-276). DOI: 10.1007/s41019-018-0072-6. Online publication date: 24-Sep-2018
  • (2014) Optimal hierarchical layouts for cache-oblivious search trees. 2014 IEEE 30th International Conference on Data Engineering (616-627). DOI: 10.1109/ICDE.2014.6816686. Online publication date: Mar-2014
  • (2012) Cache-Oblivious dictionaries and multimaps with negligible failure probability. Proceedings of the First Mediterranean conference on Design and Analysis of Algorithms (203-218). DOI: 10.1007/978-3-642-34862-4_15. Online publication date: 3-Dec-2012

Published In

PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2010
350 pages
ISBN:9781450300339
DOI:10.1145/1807085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache-oblivious algorithms
  2. hashing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 11, 2010
Indianapolis, Indiana, USA

Acceptance Rates

PODS '10 Paper Acceptance Rate 27 of 113 submissions, 24%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%
