research-article

Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once

Published: 30 November 2023

Abstract

Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different trade-offs have been proposed.
This article introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to \(1 - o(1)\) while being stable, meaning that the position where an element is stored only ever changes when resizes occur. In fact, in the setting where keys are \(\Theta(\log n)\) bits, the space guarantee that Iceberg hashing offers, namely that it uses at most \(\log \binom{|U|}{n} + O(n \log \log n)\) bits to store \(n\) items from a universe \(U\), matches a lower bound by Demaine et al. that applies to any stable hash table.
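To give a feel for the size of the two terms in the space bound, the following sketch evaluates them for one small, arbitrarily chosen setting. The values of `n` and `universe_size`, the helper names, and the constant `c` are all assumptions made for illustration; the \(O(\cdot)\) in the bound hides the true constant, so the second figure is only indicative.

```python
import math


def info_theoretic_bits(universe_size: int, n: int) -> int:
    """Bits needed to name one n-element subset: ceil(log2 C(|U|, n))."""
    # (x - 1).bit_length() equals ceil(log2 x) for any integer x >= 1.
    return (math.comb(universe_size, n) - 1).bit_length()


def overhead_bits(n: int, c: float = 1.0) -> float:
    """The n * log log n term, scaled by an ASSUMED constant c."""
    return c * n * math.log2(math.log2(n))


# Assumed example setting: n = 2^10 items, keys of 3 * log2(n) = 30 bits.
n = 1024
universe_size = n ** 3

base = info_theoretic_bits(universe_size, n)
extra = overhead_bits(n)
naive = n * math.ceil(math.log2(universe_size))  # every key stored in full

print(f"information-theoretic term: {base} bits")
print(f"n log log n term (c = 1):   {extra:.0f} bits")
print(f"naive full-key storage:     {naive} bits")
```

The comparison with `naive` shows why the first term is smaller than storing every key in full: a set carries no ordering information, which saves roughly \(\log n!\) bits.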
Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash-table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high-probability guarantees can be applied to any hash table that makes use of the front-yard/backyard paradigm for hash-table design.
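As a rough illustration of the front-yard/backyard paradigm and of stability, here is a toy sketch. It is not the Iceberg hashing construction and omits resizing entirely: the class `ToyYardTable`, its parameters, and the use of a Python `dict` as the backyard are all assumptions made for this example. Each key hashes to one fixed-capacity front-yard bin and overflows to the backyard when that bin is full. Once placed, an item is never moved between slots or between yards.

```python
class ToyYardTable:
    """Toy front-yard/backyard table. Illustrative only; not Iceberg hashing."""

    def __init__(self, num_bins=8, bin_capacity=4):
        self.num_bins = num_bins
        # Front yard: fixed-capacity bins; None marks a free slot.
        self.front = [[None] * bin_capacity for _ in range(num_bins)]
        # Backyard: a dict stands in for the secondary overflow structure.
        self.back = {}

    def location(self, key):
        """Return ('front', bin, slot), ('back',), or None if absent."""
        b = hash(key) % self.num_bins
        for slot, entry in enumerate(self.front[b]):
            if entry is not None and entry[0] == key:
                return ("front", b, slot)
        return ("back",) if key in self.back else None

    def insert(self, key, value):
        loc = self.location(key)
        if loc is None:
            bin_ = self.front[hash(key) % self.num_bins]
            if None in bin_:
                # Take the first free slot; no existing item is displaced.
                bin_[bin_.index(None)] = (key, value)
            else:
                self.back[key] = value  # bin full: overflow to the backyard
        elif loc[0] == "front":
            self.front[loc[1]][loc[2]] = (key, value)  # update in place
        else:
            self.back[key] = value

    def get(self, key, default=None):
        loc = self.location(key)
        if loc is None:
            return default
        if loc[0] == "front":
            return self.front[loc[1]][loc[2]][1]
        return self.back[key]

    def delete(self, key):
        loc = self.location(key)
        if loc is None:
            return False
        if loc[0] == "front":
            self.front[loc[1]][loc[2]] = None  # free the slot, move nothing
        else:
            del self.back[key]
        return True


table = ToyYardTable(num_bins=8, bin_capacity=4)
# Small non-negative ints hash to themselves in CPython, so these share bin 0.
for k in (0, 8, 16, 24, 32):
    table.insert(k, str(k))
print(table.location(8), table.location(32))
```

Deleting a front-yard item frees its slot but does not pull an overflowed item back from the backyard. That is what keeps positions fixed in this toy, at the cost of every lookup having to consult both yards.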

References

[1]
Abseil. 2020. Google’s Abseil C++ Library. Retrieved June 11, 2020 from https://abseil.io/. Accessed: 2020-11-06.
[2]
Alok Aggarwal and Jeffrey S. Vitter. 1988. The input/output complexity of sorting and related problems. Communications of the ACM 31, 9 (1988), 1116–1127.
[3]
Yuriy Arbitman, Moni Naor, and Gil Segev. 2009. De-amortized cuckoo hashing: Provable worst-case performance and experimental results. In Proceedings of the Automata, Languages and Programming. Springer, Berlin, 107–118.
[4]
Yuriy Arbitman, Moni Naor, and Gil Segev. 2010. Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 787–796.
[5]
Michael A. Bender, Alex Conway, Martín Farach-Colton, William Kuszmaul, and Guido Tagliavini. 2023. Tiny pointers. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 477–508.
[6]
Michael A. Bender, Abhishek Bhattacharjee, Alex Conway, Martín Farach-Colton, Rob Johnson, Sudarsun Kannan, William Kuszmaul, Nirjhar Mukherjee, Don Porter, Guido Tagliavini, Janet Vorobyeva, and Evan West. 2021. Paging and the address-translation problem. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures. 105–117.
[7]
Michael A. Bender, Alex Conway, Martín Farach-Colton, William Kuszmaul, and Guido Tagliavini. 2023. Tiny pointers. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 477–508.
[8]
Michael A. Bender, Martin Farach-Colton, Mayank Goswami, Rob Johnson, Samuel McCauley, and Shikha Singh. 2018. Bloom filters, adaptivity, and the dictionary problem. In Proceedings of the 59th Annual IEEE Symposium on Foundations of Computer Science. Paris, France, 182–193.
[9]
Michael A. Bender, Bradley C. Kuszmaul, and William Kuszmaul. 2022. Linear probing revisited: Tombstones mark the demise of primary clustering. In Proceedings of the 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science. IEEE, 1171–1182.
[10]
Ioana O. Bercea and Guy Even. 2020. A dynamic space-efficient filter with constant time operations. In Proceedings of the 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT) (Leibniz International Proceedings in Informatics (LIPIcs)), Susanne Albers (Ed.). Vol. 62, Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, 11:1–11:17.
[11]
Ioana Oriana Bercea and Guy Even. 2020. A Space-Efficient Dynamic Dictionary for Multisets with Constant Time Operations. arXiv:2005.02143. Retrieved from https://arxiv.org/abs/2005.02143
[12]
Alex Conway, Martín Farach-Colton, and Philip Shilane. 2018. Optimal hashing in external memory. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP) (Leibniz International Proceedings in Informatics (LIPIcs)), Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, 39:1–39:14.
[13]
cppreference. 2020. std::unordered_map. Retrieved June 11, 2020 from https://en.cppreference.com/w/cpp/container/unordered_map
[14]
GCC. 2020. gcc-mirror/gcc libstdc++-v3 unordered_map.h. Retrieved June 11, 2020 from https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h
[15]
Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Pǎtraşcu. 2006. De dictionariis dynamicis pauco spatio utentibus. In Proceedings of the Latin American Symposium on Theoretical Informatics. Springer, 349–361.
[16]
Martin Dietzfelbinger and Friedhelm Meyer auf der Heide. 1990. A new universal class of hash functions and dynamic hashing in real time. In Proceedings of the International Colloquium on Automata, Languages, and Programming. Springer, 6–19.
[17]
M. Dietzfelbinger, A. Karlin, K. Mehlhorn, F. M. auf der Heide, H. Rohnert, and R. E. Tarjan. 1988. Dynamic perfect hashing: Upper and lower bounds. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science. 524–531.
[18]
Martin Dietzfelbinger and Christoph Weidling. 2007. Balanced allocation and dictionaries with tightly packed constant size bins. Theoretical Computer Science 380, 1-2 (2007), 47–68.
[19]
Martin Dietzfelbinger and Philipp Woelfel. 2003. Almost random graphs with simple hash functions. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing. 629–638.
[20]
F14. 2020. Facebook’s F14 Hash Table. Retrieved June 11, 2020 from https://engineering.fb.com/2019/04/25/developer-tools/f14/. Accessed: 2020-11-06.
[21]
Dimitris Fotakis, Rasmus Pagh, Peter Sanders, and Paul Spirakis. 2003. Space efficient hash tables with worst case constant access time. In Proceedings of the STACS 2003. Springer, Berlin, 271–282.
[22]
Michael L. Fredman, Janos Komlos, and Endre Szemeredi. 1982. Storing a sparse table with O(1) worst case access time. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science. IEEE Computer Society, 165–169.
[23]
Michael L. Fredman and Dan E. Willard. 1990. BLASTING through the information theoretic barrier with FUSION TREES. In Proceedings of the 22nd Annual ACM symposium on Theory of Computing. ACM, 1–7.
[24]
Michael T. Goodrich, Daniel S. Hirschberg, Michael Mitzenmacher, and Justin Thaler. 2011. Fully de-amortized cuckoo hashing for cache-oblivious dictionaries and multimaps. arXiv:1107.4378. Retrieved from https://arxiv.org/abs/1107.4378
[25]
Michael T. Goodrich, Daniel S. Hirschberg, Michael Mitzenmacher, and Justin Thaler. 2012. Cache-oblivious dictionaries and multimaps with negligible failure probability. In Proceedings of the Mediterranean Conference on Algorithms. Springer, 203–218.
[26]
Krishnan Gosakan, Jaehyun Han, William Kuszmaul, Ibrahim N. Mubarek, Nirjhar Mukherjee, Karthik Sriram, Guido Tagliavini, Evan West, Michael A. Bender, Abhishek Bhattacharjee, Alex Conway, Martin Farach-Colton, Jayneel Gandhi, Rob Johnson, Sudarsun Kannan, and Donald E. Porter. 2023. Mosaic pages: Big TLB reach with small pages. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 433–448.
[27]
Takao Gunji and Eiichi Goto. 1980. Studies on hashing part-1: A comparison of hashing algorithms with key deletion. J. Information Processing 3, 1 (1980), 1–12.
[28]
Torben Hagerup, Peter Bro Miltersen, and Rasmus Pagh. 2001. Deterministic dictionaries. Journal of Algorithms 41, 1 (2001), 69–85.
[29]
John Iacono and Mihai Pătraşcu. 2012. Using hashing to solve the dictionary problem. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms. 570–582.
[30]
Morten Skaarup Jensen and Rasmus Pagh. 2008. Optimality in external memory hashing. Algorithmica 52, 3 (2008), 403–411.
[31]
Eyal Kaplan, Moni Naor, and Omer Reingold. 2009. Derandomized constructions of k-wise (almost) independent permutations. Algorithmica 55, 1 (2009), 113–133.
[32]
Donald E. Knuth. 1973. The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition. Addison-Wesley. Retrieved from https://www.worldcat.org/oclc/310903895
[33]
Donald E. Knuth. 1973. The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley.
[34]
Donald E. Knuth. 2011. The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1. Pearson Education India.
[35]
William Kuszmaul. 2022. A hash table without hash functions, and how to get the most out of your random bits. In Proceedings of the 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science. IEEE, 991–1001.
[36]
Per-Åke Larson. 1983. Analysis of uniform hashing. Journal of the ACM 30, 4 (1983), 805–819.
[37]
Per-Åke Larson. 1980. Linear hashing with partial expansions. In Proceedings of the VLDB. 224–232.
[38]
Witold Litwin. 1980. Linear Hashing: A new tool for file and table addressing. In Proceedings of the VLDB, Vol. 80. 1–3.
[39]
Mingmou Liu, Yitong Yin, and Huacheng Yu. 2020. Succinct filters for sets of unknown sizes. In Proceedings of the 47th International Colloquium on Automata, Languages, and Programming (ICALP) (LIPIcs), Artur Czumaj, Anuj Dawar, and Emanuela Merelli (Eds.). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Vol. 168, 79:1–79:19.
[40]
Michael Luby and Charles Rackoff. 1988. How to construct pseudorandom permutations from pseudorandom functions. SIAM Journal on Computing 17, 2 (1988), 373–386.
[41]
Tobias Maier, Peter Sanders, and Stefan Walzer. 2019. Dynamic space efficient hashing. Algorithmica 81, 8 (2019), 3162–3185.
[42]
Colin McDiarmid. 1989. On the method of bounded differences. Surveys in Combinatorics 141, 1 (1989), 148–188.
[43]
Moni Naor and Omer Reingold. 1999. On the construction of pseudorandom permutations: Luby-Rackoff revisited. Journal of Cryptology 12, 1 (1999), 29–66.
[44]
Anna Pagh and Rasmus Pagh. 2008. Uniform hashing in constant time and optimal space. SIAM Journal on Computing 38, 1 (2008), 85–96.
[45]
Rasmus Pagh. 2000. Faster deterministic dictionaries. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms. 487–493.
[46]
Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. Journal of Algorithms 51, 2 (2004), 122–144.
[47]
Rasmus Pagh, Zhewei Wei, Ke Yi, and Qin Zhang. 2014. Cache-oblivious hashing. Algorithmica 69, 4 (2014), 864–883.
[48]
Prashant Pandey, Michael A. Bender, Alex Conway, Martin Farach-Colton, William Kuszmaul, Guido Tagliavini, and Rob Johnson. 2023. IcebergHT: High performance PMEM hash tables through stability and low associativity. In Proceedings of the 2023 ACM International Conference on Management of Data.
[49]
Mihai Pǎtraşcu and Mikkel Thorup. 2014. Dynamic integer sets with optimal rank, select, and predecessor search. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science. 166–175.
[50]
W. Wesley Peterson. 1957. Addressing for random-access storage. IBM Journal of Research and Development 1, 2 (1957), 130–146.
[51]
Mihai Pǎtraşcu and Mikkel Thorup. 2012. The power of simple tabulation hashing. Journal of the ACM 59, 3 (2012), 1–50.
[52]
Rajeev Raman and Satti Srinivasa Rao. 2003. Succinct dynamic dictionaries and trees. In Proceedings of the Automata, Languages and Programming. Springer, Berlin, 357–368.
[53]
Milan Ružić. 2008. Uniform deterministic dictionaries. ACM Transactions on Algorithms 4, 1 (2008), 23 pages.
[54]
Peter Sanders. 2018. Hashing with linear probing and referential integrity. arXiv:1808.04602. Retrieved from https://arxiv.org/abs/1808.04602
[55]
Alan Siegel. 2004. On universal classes of extremely random constant-time hash functions. SIAM Journal on Computing 33, 3 (2004), 505–543.
[56]
Rajamani Sundar. 1991. A lower bound for the dictionary problem under a hashing model. In Proceedings 32nd Annual Symposium of Foundations of Computer Science. 612–621.
[57]
Elad Verbin and Qin Zhang. 2013. The limits of buffering: A tight lower bound for dynamic membership in the external memory model. SIAM Journal on Computing 42, 1 (2013), 212–229.
[58]
Jeffrey S. Vitter and Wen-Chin Chen. 1987. The Design and Analysis of Coalesced Hashing. Oxford University Press, Inc.

Information & Contributors

Information

Published In

Journal of the ACM, Volume 70, Issue 6
December 2023
314 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/3633310

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 November 2023
Online AM: 02 October 2023
Accepted: 27 August 2023
Revised: 11 May 2023
Received: 16 September 2021
Published in JACM Volume 70, Issue 6

Author Tags

  1. Hashing
  2. stability
  3. dynamic resizing

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • NSF GRFP fellowship and a Fannie and John Hertz Fellowship
  • United States Air Force Research Laboratory

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 685
  • Downloads (Last 6 weeks): 35
Reflects downloads up to 25 Dec 2024

Cited By

  • (2024) Approaches to Conflict-free Replicated Data Types. ACM Computing Surveys 57:2 (1-36). DOI: 10.1145/3695249. Online publication date: 9-Sep-2024
  • (2024) Mosaic Pages: Big TLB Reach With Small Pages. IEEE Micro 44:4 (52-59). DOI: 10.1109/MM.2024.3409181. Online publication date: 6-Jun-2024
  • (2024) Tight Bounds for Classical Open Addressing. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS) (636-657). DOI: 10.1109/FOCS61266.2024.00047. Online publication date: 27-Oct-2024
  • (2024) Optimal Bounds for Open Addressing Without Reordering. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS) (594-605). DOI: 10.1109/FOCS61266.2024.00045. Online publication date: 27-Oct-2024
  • (2024) Robust Left-Right Hashing Scheme for Ubiquitous Computing. Engineering Research Express. DOI: 10.1088/2631-8695/ad6d2a. Online publication date: 8-Aug-2024
  • (2024) A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queries. Microprocessors & Microsystems 105:C. DOI: 10.1016/j.micpro.2023.104992. Online publication date: 25-Jun-2024
  • (2024) Revisiting Cuckoo Hashing: re-addressing the challenges of Cuckoo Hashing. International Journal of Information Technology. DOI: 10.1007/s41870-024-02274-2. Online publication date: 20-Nov-2024
