Theory and practice of monotone minimal perfect hashing

Published: 16 November 2008
  • Abstract

    Minimal perfect hash functions have been shown to be useful for compressing data in several data management tasks. In particular, order-preserving minimal perfect hash functions (Fox et al. 1991) have been used to retrieve the position of a key in a given list of keys; however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed (Belazzougui et al. 2009) that the keys to be hashed are very frequently sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for dictionaries of search engines, lists of URLs of Web graphs, and so on. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyze experimentally the data structures proposed in Belazzougui et al. [2009], and along the way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practice and provide a balance between access speed, ease of construction, and space usage.
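
To make the definition concrete, the following minimal Python sketch shows only what a monotone minimal perfect hash function computes: the 0-based position of each key in the sorted key list. It is a reference baseline, not one of the data structures evaluated in the article; it keeps every key in memory and uses binary search, which is exactly the cost those structures aim to avoid. The names `build_rank_function` and `arbitrary_order_bits` and the sample URLs are hypothetical, chosen for illustration. The second helper evaluates log2(n!), the number of bits needed to tell apart all n! orderings of n keys, which by Stirling's approximation grows as n log n and is where the lower bound for arbitrary orders comes from.

```python
import math
from bisect import bisect_left


def build_rank_function(sorted_keys):
    """Return h such that h(key) is the 0-based position of key in sorted_keys.

    Reference semantics only (an assumption for illustration): all keys are
    stored and looked up by binary search.
    """
    keys = list(sorted_keys)
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError("keys must be distinct and in increasing order")

    def h(key):
        i = bisect_left(keys, key)
        if i == len(keys) or keys[i] != key:
            # This baseline rejects keys outside the set; a space-efficient
            # hash function is generally free to return an arbitrary value.
            raise KeyError(key)
        return i

    return h


def arbitrary_order_bits(n):
    """log2(n!): bits needed to distinguish all n! orderings of n keys."""
    return math.lgamma(n + 1) / math.log(2)


# Hypothetical key set, already in lexicographical order.
urls = ["a.example/", "a.example/x", "b.example/", "c.example/y"]
h = build_rank_function(urls)
ranks = [h(u) for u in urls]  # each key maps to its own position
```

Because the keys arrive in their intrinsic order, the function does not have to encode a permutation, which is why the monotone variant can use less space than the order-preserving bound for arbitrary orders.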

    References

    [1]
    Belazzougui, D., Boldi, P., Pagh, R., and Vigna, S. 2009b. Monotone minimal perfect hashing: Searching a sorted table with O(1) accesses. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'09). ACM, New York, 785--794.
    [2]
    Belazzougui, D., Boldi, P., Pagh, R., and Vigna, S. 2009a. Theory and practise of monotone minimal perfect hashing. In Proceedings of the 10th Workshop on Algorithm Engineering and Experiments (ALENEX'09). SIAM, Philadelphia, PA.
    [3]
    Boldi, P., Codenotti, B., Santini, M., and Vigna, S. 2004. Ubicrawler: A scalable fully distributed web crawler. Softw. Pract. Exp. 34, 8, 711--726.
    [4]
    Boldi, P., Santini, M., and Vigna, S. 2008. A large time-aware graph. SIGIR Forum 42, 1, 33--38.
    [5]
    Botelho, F. C., Pagh, R., and Ziviani, N. 2007. Simple and space-efficient minimal perfect hash functions. In Proceedings of the 10th International Workshop on Algorithms and Data Structures (WADS'07). Springer, Berlin, 139--150.
    [6]
    Botelho, F. C. and Ziviani, N. 2007. External perfect hashing for very large key sets. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM'07). ACM, New York, 653--662.
    [7]
    Charles, D. X. and Chellapilla, K. 2008. Bloomier filters: A second look. In Proceedings of the 16th Annual European Symposium on Algorithms (ESA'08). Springer, Berlin, 259--270.
    [8]
    Chazelle, B., Kilian, J., Rubinfeld, R., and Tal, A. 2004. The Bloomier filter: An efficient data structure for static support lookup tables. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'04). SIAM, Philadelphia, PA, 30--39.
    [9]
    Clark, D. R. and Munro, J. I. 1996. Efficient suffix trees on secondary storage. In Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, 383--391.
    [10]
    Dietzfelbinger, M. and Pagh, R. 2008. Succinct data structures for retrieval and approximate membership (extended abstract). In Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP'08). Springer, Berlin, 385--396.
    [11]
    Elias, P. 1974. Efficient storage and retrieval by content and address of static files. J. Assoc. Comput. Mach. 21, 2, 246--260.
    [12]
    Fano, R. M. 1971. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, MA.
    [13]
    Fox, E. A., Chen, Q. F., Daoud, A. M., and Heath, L. S. 1991. Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Sys. 9, 3, 281--308.
    [14]
    Fredman, M. L. and Komlós, J. 1984. On the size of separating systems and families of perfect hash functions. SIAM J. Algebraic Discrete Methods 5, 1, 61--68.
    [15]
    Fredman, M. L., Komlós, J., and Szemerédi, E. 1984. Storing a sparse table with O(1) worst case access time. J. Assoc. Comput. Mach. 31, 3, 538--544.
    [16]
    Gamma, E., Helm, R., Johnson, R., and Vlissides, J. 1995. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Boston, MA.
    [17]
    Geary, R. F., Rahman, N., Raman, R., and Raman, V. 2006. A simple optimal representation for balanced parentheses. Theor. Comput. Sci. 368, 3, 231--246.
    [18]
    Golynski, A. 2006. Optimal lower bounds for rank and select indexes. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP'06). Springer, Berlin, 370--381.
    [19]
    González, R., Grabowski, S., Mäkinen, V., and Navarro, G. 2005. Practical implementation of rank and select queries. In Proceedings of the 4th Workshop on Efficient and Experimental Algorithms (WEA'05). CTI Press and Ellinika Grammata, 27--38.
    [20]
    Gupta, A., Hon, W.-K., Shah, R., and Vitter, J. S. 2007. Compressed data structures: Dictionaries and data-aware measures. Theor. Comput. Sci. 387, 3, 313--331.
    [21]
    Hagerup, T. and Tholey, T. 2001. Efficient minimal perfect hashing in nearly minimal space. In Proceedings of the 18th Symposium on Theoretical Aspects of Computer Science (STACS'01). Springer-Verlag, Berlin, 317--326.
    [22]
    Hirai, J., Raghavan, S., Garcia-Molina, H., and Paepcke, A. 2000. WebBase: a repository of Web pages. Comput. Networks 33, 1--6, 277--293.
    [23]
    Hu, T. C. and Tucker, A. C. 1971. Optimal computer search trees and variable-length alphabetical codes. SIAM J. Appl. Math. 21, 4, 514--532.
    [24]
    Jacobson, G. 1989. Space-efficient static trees and graphs. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science (FOCS'89). IEEE, Los Alamitos, CA, 549--554.
    [25]
    Jenkins, B. 1997. Algorithm alley: Hash functions. Dr. Dobb's J. Softw. Tools 22, 9, 107--109, 115--116.
    [26]
    Kim, D. K., Na, J. C., Kim, J. E., and Park, K. 2005. Efficient implementation of rank and select functions for succinct representation. In Proceedings of the 4th International Workshop on Experimental and Efficient Algorithms, Springer, Berlin, 315--327.
    [27]
    Knuth, D. E. 1997. Sorting and Searching, 2nd ed. The Art of Computer Programming Series, vol. 3. Addison-Wesley, Boston, MA.
    [28]
    Majewski, B. S., Wormald, N. C., Havas, G., and Czech, Z. J. 1996. A family of perfect hashing methods. Comput. J. 39, 6, 547--554.
    [29]
    Molloy, M. 2005. Cores in random hypergraphs and Boolean formulas. Random Struct. Algorithms 27, 1, 124--135.
    [30]
    Morrison, D. R. 1968. PATRICIA—practical algorithm to retrieve information coded in alphanumeric. J. Assoc. Comput. Mach. 15, 4, 514--534.
    [31]
    Munro, J. I. and Raman, V. 2001. Succinct representation of balanced parentheses and static trees. SIAM J. Comput. 31, 3, 762--776.
    [32]
    Vigna, S. 2008. Broadword implementation of rank/select queries. In Proceedings of the 7th International Workshop on Experimental Algorithms (WEA'08). Springer-Verlag, Berlin, 154--168.

    Published In

    ACM Journal of Experimental Algorithmics  Volume 16, Issue
    2011
    411 pages
    ISSN:1084-6654
    EISSN:1084-6654
    DOI:10.1145/1963190
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Monotone minimal perfect hashing
    2. succinct data structures
    3. very large databases

    Qualifiers

    • Article

    Cited By

    • (2024) CoCo-trie. Information Systems 120:C. DOI: 10.1016/j.is.2023.102316. Online publication date: 1-Feb-2024
    • (2022) A Learned Approach to Design Compressed Rank/Select Data Structures. ACM Transactions on Algorithms 18:3 (1-28). DOI: 10.1145/3524060. Online publication date: 17-Mar-2022
    • (2022) Compressing and Querying Integer Dictionaries Under Linearities and Repetitions. IEEE Access 10 (118831-118848). DOI: 10.1109/ACCESS.2022.3221520. Online publication date: 2022
    • (2022) A Dynamic Repository Approach for Small File Management With Fast Access Time on Hadoop Cluster: Hash Based Extended Hadoop Archive. IEEE Access 10 (36856-36867). DOI: 10.1109/ACCESS.2022.3163433. Online publication date: 2022
    • (2022) On Representing the Degree Sequences of Sublogarithmic-Degree Wheeler Graphs. String Processing and Information Retrieval (250-256). DOI: 10.1007/978-3-031-20643-6_18. Online publication date: 8-Nov-2022
    • (2022) A Learned Prefix Bloom Filter for Spatial Data. Database and Expert Systems Applications (336-350). DOI: 10.1007/978-3-031-12423-5_26. Online publication date: 29-Jul-2022
    • (2020) Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. Journal of the ACM 67:1 (1-54). DOI: 10.1145/3375890. Online publication date: 15-Jan-2020
    • (2018) Engineering Compressed Static Functions. 2018 Data Compression Conference (52-61). DOI: 10.1109/DCC.2018.00013. Online publication date: Mar-2018
    • (2016) Prime Box Parallel Search Algorithm: Searching Dynamic Dictionary in O(lg m) Time. Journal of Computer and Communications 04:04 (134-145). DOI: 10.4236/jcc.2016.44012. Online publication date: 2016
    • (2016) Bitvectors. Compact Data Structures (64-102). DOI: 10.1017/CBO9781316588284.005. Online publication date: 5-Sep-2016