research-article
Open access

IcebergHT: High Performance Hash Tables Through Stability and Low Associativity

Published: 30 May 2023

Abstract

Modern hash table designs for DRAM and PMEM strive to minimize space while maximizing speed. The most important factor in speed is the number of cache lines accessed during updates and queries. On PMEM, there is an additional consideration, which is to minimize the number of writes, because on PMEM writes are more expensive than reads. This paper proposes two design objectives, stability and low associativity, that enable us to build hash tables that minimize cache-line accesses for all operations. A hash table is stable if it does not move items around, and a hash table has low associativity if there are only a few locations where an item can be stored. Low associativity ensures that queries need to examine only a few memory locations, and stability ensures that insertions write to very few cache lines. Stability also simplifies concurrency and, on PMEM, crash safety. We present IcebergHT, a fast, concurrent, space-efficient, and crash-safe (for PMEM) hash table based on the design principles of stability and low associativity. IcebergHT combines in-memory metadata with a new hashing technique, iceberg hashing, that is (1) space efficient, (2) stable, and (3) supports low associativity. In contrast, existing hash tables either modify numerous cache lines during insertions (e.g., cuckoo hashing), access numerous cache lines during queries (e.g., linear probing), or waste space (e.g., chaining). Moreover, the combination of (1)-(3) yields several emergent benefits: IcebergHT scales better than other hash tables, has excellent performance, and supports crash safety on PMEM. Our benchmarks show that IcebergHT has excellent performance both in DRAM and PMEM. In PMEM, IcebergHT insertions are 50% to 3× faster than state-of-the-art PMEM hash tables, such as Dash and CLHT, and queries are 20% to 2× faster. IcebergHT space overhead is 17%, whereas Dash and CLHT have space overheads of 2× and 3×, respectively. IcebergHT also scaled linearly throughout our experiments and is crash safe. In DRAM, IcebergHT outperforms state-of-the-art hash tables libcuckoo and CLHT by almost 2× on insertions while offering good query throughput and much better space efficiency.
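
The two design objectives named in the abstract can be illustrated with a small sketch. The code below is not IcebergHT or iceberg hashing; it is a toy Python table written under stated assumptions to show what "stable" and "low associativity" mean in practice. The class name `StableLowAssocTable`, the parameters `num_buckets`, `bucket_size`, and `choices`, and the `overflow` dictionary used as a fallback are all hypothetical and do not come from the paper.

```python
import hashlib


class StableLowAssocTable:
    """Toy table: a key has only `choices` candidate buckets and never moves."""

    def __init__(self, num_buckets=64, bucket_size=8, choices=2):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.choices = choices
        # Fixed-size slot arrays; None marks an empty slot.
        self.buckets = [[None] * bucket_size for _ in range(num_buckets)]
        # Hypothetical fallback for keys whose candidate buckets are all full.
        self.overflow = {}

    def _candidates(self, key):
        # Low associativity: a small, fixed list of bucket indices per key.
        out = []
        for i in range(self.choices):
            data = f"{i}:{key!r}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            out.append(int.from_bytes(digest, "little") % self.num_buckets)
        return out

    def locate(self, key):
        """Return (bucket, slot) if the key sits in a bucket, else None."""
        for b in self._candidates(key):
            for s, entry in enumerate(self.buckets[b]):
                if entry is not None and entry[0] == key:
                    return (b, s)
        return None

    def get(self, key):
        pos = self.locate(key)
        if pos is not None:
            return self.buckets[pos[0]][pos[1]][1]
        return self.overflow.get(key)

    def insert(self, key, value):
        pos = self.locate(key)
        if pos is not None:  # update in place, same slot
            self.buckets[pos[0]][pos[1]] = (key, value)
            return
        if key in self.overflow:
            self.overflow[key] = value
            return

        def free(b):
            return self.buckets[b].count(None)

        # Pick the emptiest candidate bucket and write exactly one slot.
        target = max(self._candidates(key), key=free)
        if free(target) == 0:
            self.overflow[key] = value
            return
        slot = self.buckets[target].index(None)
        self.buckets[target][slot] = (key, value)

    def delete(self, key):
        pos = self.locate(key)
        if pos is not None:
            self.buckets[pos[0]][pos[1]] = None  # other items stay put
        else:
            self.overflow.pop(key, None)


table = StableLowAssocTable(num_buckets=16, bucket_size=4)
for k in range(40):
    table.insert(k, k * k)
before = {k: table.locate(k) for k in range(40)}
for k in range(40, 80):
    table.insert(k, k * k)
# Stability: earlier keys are still exactly where they were first placed.
assert all(table.locate(k) == before[k] for k in range(40))
```

In this sketch a query inspects at most `choices` buckets plus the fallback, and an insertion writes a single slot without relocating any existing item. By contrast, a cuckoo-style insertion may displace resident items, which is the kind of movement the abstract's notion of stability rules out.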

Supplemental Material

MP4 File
Presentation video for "IcebergHT: High Performance Hash Tables Through Stability and Low Associativity".

Published In

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1
May 2023
2807 pages
EISSN: 2836-6573
DOI: 10.1145/3603164
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. dictionary data structure
2. hash tables

Qualifiers

• Research-article
